rivicore/module-semantiq
Composer 安装命令:
composer require rivicore/module-semantiq
包简介
Semantic vector search for Adobe Commerce
README 文档
README
SemantiQ replaces Adobe Commerce's default keyword search with semantic vector search. Instead of matching keywords, it understands the meaning of a search query and returns products and CMS pages that are conceptually relevant — even when the exact words do not appear in the product name or description.
When a shopper types "something warm for winter camping", SemantiQ finds sleeping bags, thermal jackets, and base layers rather than only items literally containing those words.
Screenshots
Search results — "carry things to the gym"
Search results — "something for my daily commute"
Table of Contents
- Requirements
- Installation
- How It Works
- Configuration Reference
- External Service Setup
- Indexing
- Excluding Entities from the Index
- RAG Context Block (Frontend)
- Recommended Combinations
- Troubleshooting
Requirements
| Requirement | Version |
|---|---|
| Adobe Commerce / Magento Open Source | 2.4.6+ |
| PHP | 8.2, 8.3, or 8.4 |
| OpenSearch or Elasticsearch | 2.x (already required by Magento) |
| Guzzle HTTP | 7.x (bundled with Magento) |
Optional (depending on chosen backend):
- AWS account with Bedrock and/or OpenSearch Service
- OpenAI API account
- Anthropic API account
- ChromaDB server
- Ollama installation
Installation
Via Composer (recommended)
composer require rivicore/module-semantiq
If the package is not yet on Packagist, add the VCS repository first:
composer config repositories.rivicore-semantiq vcs https://github.com/bgkavinga/rivicore-semantiq composer require rivicore/module-semantiq
Enable and set up
# 1. Enable the module bin/magento module:enable Rivicore_SemantiQ # 2. Apply database schema and data patches bin/magento setup:upgrade # 3. Regenerate DI container bin/magento setup:di:compile # 4. Flush cache bin/magento cache:flush
After installation, go to Stores → Configuration → Rivicore → SemantiQ Semantic Search to configure the module.
How It Works
Product/CMS text ──▶ Embedding Provider ──▶ float[] vector ──▶ Vector Store
│
Shopper query ────▶ Embedding Provider ──▶ float[] vector ──┐ │
└──▶ kNN search ──▶ Ranked product IDs
│
(optional RAG) │
LLM Provider ◀────────────┘
│
Context summary ──▶ Frontend event
- Indexing — product and CMS page text is converted to numeric vectors by the embedding provider and stored in the vector store.
- Search — when a shopper submits a query, it is embedded using the same provider, and the vector store returns the nearest neighbours by cosine similarity.
- RAG (optional) — the top results are passed to an LLM, which generates a short contextual summary dispatched as a Magento event for display in a frontend block.
The existing Elasticsearch/OpenSearch search is preserved as an automatic fallback if semantic search is disabled or encounters an error.
Configuration Reference
Navigate to Stores → Configuration → Rivicore → SemantiQ Semantic Search.
General Settings
Enable SemantiQ
- Type: Yes/No
- Default: No
- What it does: Master switch. When set to Yes, all storefront search queries are routed through the vector search pipeline. When set to No, the standard Magento Elasticsearch search is used.
- Note: Set up the vector store and run a full reindex before enabling this. Enabling with an empty index returns no results.
Index Products
- Type: Yes/No
- Default: Yes
- What it does: Controls whether products are included in the vector index. Disable this if you only want to use SemantiQ for CMS page search.
Index CMS Pages
- Type: Yes/No
- Default: Yes
- What it does: Controls whether CMS pages are included in the vector index. CMS pages appear alongside products in search results.
Enable RAG (LLM Context)
- Type: Yes/No
- Default: No
- What it does: When enabled, the top vector search results are sent to the configured LLM provider, which produces a short contextual summary. The summary is dispatched via the Magento event
rivicore_semantiq_rag_context_readyfor use in a frontend block. This does not affect the ranked product results — it only generates an optional explanatory text. - Cost warning: Each search query incurs one additional LLM API call. At high traffic volumes this can be expensive. Consider enabling only for specific pages via a custom observer.
Product Attributes to Index
- Type: Multi-select
- Default:
name,description - What it does: The selected product attribute values are concatenated into a single text string, which is then embedded. Adding more attributes gives the model richer context but also increases the text length sent to the embedding provider (and therefore cost/latency for token-based APIs like OpenAI).
- Recommended attributes:
name,description,short_description,manufacturer,color,size - Scope: Global (per Magento installation, not per store view)
Max Search Results
- Type: Integer
- Default: 20
- What it does: The maximum number of products returned by a vector search. This is the value sent to the vector store as the
k(number of nearest neighbours) parameter.
Vector Store Backend
The vector store holds the numeric embeddings and performs the k-nearest-neighbour (kNN) search at query time.
Navigate to the Vector Store Backend group.
Backend
- Type: Select
- Options:
OpenSearch (built-in),ChromaDB,AWS Bedrock Knowledge Base - Default:
OpenSearch (built-in) - What it does: Selects where vectors are stored and searched. See External Service Setup for how to configure each backend.
OpenSearch fields (shown when Backend = OpenSearch)
OpenSearch Index Name
- Type: Text
- Default:
semantiq_vectors - What it does: The name of the dedicated OpenSearch index used to store vectors. This is separate from Magento's own catalog search index (
magento2_product_1, etc.) to avoid conflicts. You can change this if the default name conflicts with an existing index.
ChromaDB fields (shown when Backend = ChromaDB)
ChromaDB URL
- Type: Text
- Example:
http://localhost:8000 - What it does: The base URL of your ChromaDB HTTP server. Must be reachable from the PHP/Magento server. See ChromaDB setup for installation instructions.
ChromaDB Collection Name
- Type: Text
- Default:
semantiq - What it does: The name of the ChromaDB collection used to store vectors. Created automatically on first index run if it does not already exist.
AWS Bedrock Knowledge Base fields (shown when Backend = AWS Bedrock Knowledge Base)
AWS Region
- Type: Text
- Example:
us-east-1 - What it does: The AWS region where the Bedrock Knowledge Base is provisioned. Must match the region used when creating the Knowledge Base in the AWS console.
Knowledge Base ID
- Type: Text
- Example:
ABCDEF1234 - What it does: The unique identifier of the Bedrock Knowledge Base. Found in the AWS console under Amazon Bedrock → Knowledge Bases.
AWS Access Key ID / AWS Secret Access Key
- Type: Password (encrypted at rest)
- What it does: IAM credentials with permission to call
bedrock:Retrieve,bedrock:IngestKnowledgeBaseDocuments, andbedrock:DeleteKnowledgeBaseDocumentson the target Knowledge Base. See AWS Bedrock Knowledge Base setup for the required IAM policy.
Embedding Provider
The embedding provider converts text into numeric vectors. The same provider must be used for both indexing and search — changing the provider after indexing requires a full reindex.
Navigate to the Embedding Provider group.
Provider
- Type: Select
- Options:
OpenSearch ML Commons (built-in),OpenAI,AWS Bedrock (Titan),Ollama (local),Anthropic Claude (experimental) - Default:
OpenAI - Important: All documents in the vector store must use vectors from the same provider and model. If you switch providers, you must run
bin/magento indexer:reindex rivicore_semantiqto rebuild the entire index with the new vectors before re-enabling the module.
OpenSearch ML Commons fields (shown when Provider = OpenSearch ML Commons)
ML Model ID
- Type: Text
- Example:
bQ1J8ooBpBj3wT4HVemV - What it does: The ID of the text-embedding model deployed in your OpenSearch cluster via ML Commons. The model must be in
DEPLOYEDstatus. See OpenSearch ML Commons setup for how to deploy a model and find its ID.
Embedding Dimension
- Type: Integer
- Default:
768 - What it does: Must match the output dimension of the deployed model exactly. Common values:
- SBERT / all-MiniLM / BGE-small:
384 - most SBERT-large / BGE-base:
768 - OpenAI-compatible / BGE-large:
1024or1536
- SBERT / all-MiniLM / BGE-small:
- Note: If this does not match the actual model output, the OpenSearch index mapping will reject vectors and indexing will fail.
OpenAI fields (shown when Provider = OpenAI)
OpenAI API Key
- Type: Password (encrypted at rest)
- What it does: Your OpenAI API secret key (
sk-...). See OpenAI setup.
OpenAI Embedding Model
- Type: Text
- Default:
text-embedding-3-small - Options:
Model Dimension Best for text-embedding-3-small1536 Best price/quality ratio for most use cases text-embedding-3-large3072 Highest accuracy, higher cost text-embedding-ada-0021536 Legacy, lower performance than v3
AWS Bedrock (Titan) fields (shown when Provider = AWS Bedrock)
AWS Region
- Type: Text
- Example:
us-east-1
Bedrock Embedding Model ID
- Type: Text
- Default:
amazon.titan-embed-text-v2:0 - Options:
Model ID Dimension Notes amazon.titan-embed-text-v2:01024 Recommended — supports up to 8,192 tokens amazon.titan-embed-text-v11536 Older model, max 8,192 tokens cohere.embed-english-v31024 English-only, strong performance cohere.embed-multilingual-v31024 100+ languages
AWS Access Key ID / AWS Secret Access Key
- Type: Password (encrypted at rest)
- Required IAM permissions:
bedrock:InvokeModelon the target model ARN.
Ollama fields (shown when Provider = Ollama)
Ollama Base URL
- Type: Text
- Default:
http://localhost:11434 - What it does: Base URL of the Ollama HTTP server. The PHP/Magento server must be able to reach this URL. If Ollama runs on a different machine, replace
localhostwith the server's IP or hostname.
Ollama Embedding Model
- Type: Text
- Default:
nomic-embed-text - Options:
Model Dimension Notes nomic-embed-text768 Good general-purpose model, small footprint mxbai-embed-large1024 Higher accuracy, requires more RAM bge-m31024 Multilingual, strong cross-lingual retrieval all-minilm384 Very fast, lower quality
Anthropic fields (shown when Provider = Anthropic Claude) — Experimental
Anthropic API Key
- Type: Password (encrypted at rest)
Anthropic Model
- Type: Text
- Default:
claude-haiku-4-5-20251001 - Warning: Anthropic does not provide a dedicated embedding API. This provider sends the text to a Claude model with a structured prompt asking for a JSON float array. This is significantly slower, more expensive, and less accurate than a dedicated embedding model. Use OpenAI, Bedrock, or Ollama for production deployments.
LLM Provider (RAG)
The LLM provider is only used when Enable RAG is set to Yes. It takes the top vector search results and the original query and produces a short contextual summary.
All configuration fields mirror the Embedding Provider section — provider selection plus credentials and model name for the chosen provider.
Shared RAG Options
Max Context Documents
- Type: Integer
- Default:
5 - What it does: How many of the top-ranked search results are included in the LLM context window. Higher values give the LLM more information but increase token usage and latency.
RAG Prompt Template
- Type: Textarea
- Placeholders:
{{query}}(the shopper's search query),{{context}}(numbered list of top products/pages) - Default:
You are a helpful shopping assistant. Based on the following product information, provide a brief and helpful recommendation for the search query "{{query}}". Products: {{context}} Provide a 2-3 sentence summary highlighting the most relevant products for this search. - What it does: The full prompt sent to the LLM. Customise it to match your brand voice or to focus on specific aspects of the products.
External Service Setup
1. OpenSearch ML Commons (built-in embeddings)
OpenSearch ML Commons allows you to deploy a text-embedding model directly inside your OpenSearch cluster — no external API calls required at search time.
Prerequisites
- OpenSearch 2.4 or later with the ML Commons plugin enabled (it is enabled by default on AWS OpenSearch Service and on the OpenSearch Docker images).
- The cluster must have at least one data node with enough RAM for the model (~1–4 GB depending on model size).
Step 1 — Enable ML on all nodes (self-hosted only)
Add this to opensearch.yml on every node, then restart:
plugins.ml_commons.only_run_on_ml_node: false plugins.ml_commons.native_memory_threshold: 99
For AWS OpenSearch Service, these settings are already configured.
Step 2 — Register the model
SemantiQ works with any text-embedding model that can be deployed via ML Commons. The recommended starting point is Hugging Face sentence-transformers/all-MiniLM-L6-v2 (768 dimensions), which is pre-registered in OpenSearch's model registry.
Option A: Use the pre-built model registry (recommended)
# Register a pre-built sentence-transformers model POST /_plugins/_ml/models/_register { "name": "huggingface/sentence-transformers/all-MiniLM-L6-v2", "version": "1.0.1", "model_format": "TORCH_SCRIPT" }
The response contains a task_id. Check the task status:
GET /_plugins/_ml/tasks/<task_id>
When state is COMPLETED, the response includes the model_id. Copy this value — you will need it in the SemantiQ admin config.
Option B: Register a custom model from a URL
POST /_plugins/_ml/models/_register
{
"name": "my-custom-embedding-model",
"version": "1.0.0",
"model_format": "TORCH_SCRIPT",
"model_task_type": "TEXT_EMBEDDING",
"model_config": {
"model_type": "bert",
"embedding_dimension": 768,
"framework_type": "sentence_transformers"
},
"url": "https://your-server.com/model.zip"
}
Step 3 — Deploy the model
POST /_plugins/_ml/models/<model_id>/_deploy
Check deployment status:
GET /_plugins/_ml/tasks/<deploy_task_id>
Wait until state is COMPLETED.
Step 4 — Verify the model is working
POST /_plugins/_ml/models/<model_id>/predict { "text_docs": ["test product description"] }
The response should contain an inference_results array with a data field of floats. Count the floats to confirm the embedding dimension.
Step 5 — Configure SemantiQ
- Embedding Provider →
OpenSearch ML Commons (built-in) - ML Model ID → the
model_idfrom step 2 - Embedding Dimension → the dimension confirmed in step 4 (e.g.
768)
Finding an existing model ID
GET /_plugins/_ml/models/_search
{
"query": { "match_all": {} },
"size": 20
}
Look for models with "model_state": "DEPLOYED".
2. OpenSearch as Vector Store
When Backend = OpenSearch (built-in), SemantiQ creates a dedicated index in the same OpenSearch cluster that Magento already uses. No additional server is required.
Prerequisites
- The k-NN plugin must be installed. It is included by default in OpenSearch 1.0+ and AWS OpenSearch Service.
Verify the plugin is present:
GET /_cat/plugins?v&h=name,component,version
Look for a line containing opensearch-knn.
What SemantiQ creates automatically
When you run the indexer for the first time, SemantiQ creates an index named semantiq_vectors (or whatever you configured in OpenSearch Index Name) with the following mapping:
{
"settings": {
"index.knn": true
},
"mappings": {
"properties": {
"vector": { "type": "knn_vector", "dimension": <your_dimension> },
"entity_type": { "type": "keyword" },
"entity_id": { "type": "integer" },
"store_id": { "type": "integer" },
"payload": { "type": "object", "enabled": false }
}
}
}
No manual index creation is needed. If the index already exists with a different mapping, drop it first:
DELETE /semantiq_vectors
Then rerun the Magento indexer.
3. ChromaDB
ChromaDB is an open-source, self-hosted vector database. It is the simplest option if you prefer not to use the OpenSearch cluster for vector storage.
Option A — Docker (recommended)
docker run -d \ --name chromadb \ -p 8000:8000 \ -v chroma-data:/chroma/chroma \ chromadb/chroma:latest
Verify it is running:
curl http://localhost:8000/api/v1/heartbeat
# Expected: {"nanosecond heartbeat": <timestamp>}
Option B — pip
pip install chromadb chroma run --host 0.0.0.0 --port 8000 --path ./chroma-data
Option C — Docker Compose
version: '3.8' services: chromadb: image: chromadb/chroma:latest ports: - "8000:8000" volumes: - chroma-data:/chroma/chroma environment: - CHROMA_SERVER_HTTP_PORT=8000 volumes: chroma-data:
docker compose up -d
Networking
The ChromaDB server must be reachable from the machine running PHP/Magento. If Magento runs in Docker, use the container name as the hostname, or use the host machine's IP address.
Configure SemantiQ
- Backend →
ChromaDB - ChromaDB URL →
http://chromadb:8000(Docker) orhttp://192.168.x.x:8000(remote server) orhttp://localhost:8000(same machine) - ChromaDB Collection Name →
semantiq(or any name — created automatically)
Persistence
Ensure the Docker volume (chroma-data) persists across container restarts. Without a volume, the vector index is lost every time the container is restarted.
4. AWS Bedrock Knowledge Base
AWS Bedrock Knowledge Bases is a fully managed RAG and vector search service. Bedrock handles chunking, embedding, and vector storage automatically. SemantiQ uses it in "ingest + retrieve" mode, bypassing Bedrock's built-in RAG generation in favour of its own LLM provider.
Step 1 — Create a Knowledge Base in the AWS Console
- Open the AWS Console → Amazon Bedrock → Knowledge Bases → Create knowledge base.
- Name: e.g.
semantiq-products - IAM role: Let AWS create a new role, or use an existing one.
- Data source: Choose Inline (SemantiQ pushes documents directly via the API, no S3 bucket needed).
- Embedding model: Choose
Amazon Titan Text Embeddings V2(or any supported model — note the dimension). - Vector store: Choose
Amazon OpenSearch Serverless(AWS-managed) or bring your own OpenSearch/Pinecone/RDS Aurora cluster. - Complete creation and note the Knowledge Base ID (format:
ABCDEF1234).
Step 2 — Create an IAM User with the required permissions
In IAM → Users → Create user:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"bedrock:Retrieve",
"bedrock:IngestKnowledgeBaseDocuments",
"bedrock:DeleteKnowledgeBaseDocuments",
"bedrock:GetKnowledgeBase"
],
"Resource": "arn:aws:bedrock:<region>:<account-id>:knowledge-base/<knowledge-base-id>"
}
]
}
Create an Access Key for this user and note the Access Key ID and Secret Access Key.
Step 3 — Configure SemantiQ
- Backend →
AWS Bedrock Knowledge Base - AWS Region → the region where the Knowledge Base was created (e.g.
us-east-1) - Knowledge Base ID → the ID from step 1 (e.g.
ABCDEF1234) - AWS Access Key ID / AWS Secret Access Key → from step 2
Step 4 — Install the AWS SDK
The AWS SDK is not bundled with Magento. Install it:
composer require aws/aws-sdk-php
5. OpenAI
Step 1 — Create an API key
- Go to platform.openai.com/api-keys.
- Click Create new secret key.
- Give it a name (e.g.
semantiq-magento) and click Create secret key. - Copy the key immediately — it is only shown once.
Step 2 — Set billing limits (recommended)
- In the OpenAI dashboard, go to Settings → Billing → Usage limits.
- Set a monthly budget to prevent unexpected charges. Embedding calls are inexpensive ($0.02–$0.13 per million tokens depending on model), but a large catalog with frequent reindexing can add up.
Step 3 — Configure SemantiQ
- Embedding Provider →
OpenAI - OpenAI API Key → the key from step 1
- OpenAI Embedding Model →
text-embedding-3-small(recommended) ortext-embedding-3-large
Cost estimate
text-embedding-3-small costs $0.02 per million tokens. A typical product description with name is ~100 tokens. A catalog of 10,000 products ≈ 1 million tokens ≈ $0.02 per full reindex. Daily incremental updates cost a fraction of this.
6. AWS Bedrock Embeddings (Titan)
Use this when you prefer to keep data within AWS and do not want to send product information to a third party.
Step 1 — Enable model access in AWS Bedrock
- Open the AWS Console → Amazon Bedrock → Model access.
- Click Manage model access.
- Enable Amazon Titan Text Embeddings V2 (and any other models you plan to use).
- Click Save changes. Access is granted immediately for Titan models.
Step 2 — Create an IAM User
In IAM → Users → Create user:
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": ["bedrock:InvokeModel"],
"Resource": [
"arn:aws:bedrock:*::foundation-model/amazon.titan-embed-text-v2:0",
"arn:aws:bedrock:*::foundation-model/amazon.titan-embed-text-v1",
"arn:aws:bedrock:*::foundation-model/cohere.embed-english-v3",
"arn:aws:bedrock:*::foundation-model/cohere.embed-multilingual-v3"
]
}
]
}
Create an Access Key for the user.
Step 3 — Install the AWS SDK
composer require aws/aws-sdk-php
Step 4 — Configure SemantiQ
- Embedding Provider →
AWS Bedrock (Titan) - AWS Region → e.g.
us-east-1(must be a region where Bedrock is available) - Bedrock Embedding Model ID →
amazon.titan-embed-text-v2:0(recommended) - AWS Access Key ID / AWS Secret Access Key → from step 2
Input limits
| Model | Max input tokens |
|---|---|
amazon.titan-embed-text-v2:0 |
8,192 |
amazon.titan-embed-text-v1 |
8,192 |
cohere.embed-english-v3 |
512 |
If product descriptions exceed the token limit, the text is silently truncated by the API. Configure Product Attributes to Index to keep concatenated text under the limit.
7. Ollama (local)
Ollama runs open-source embedding models completely locally. No internet connection, no API keys, no per-token cost. Suitable for development, on-premise deployments, or privacy-sensitive catalogs.
Step 1 — Install Ollama
macOS:
brew install ollama
# or download from https://ollama.com/download
Linux:
curl -fsSL https://ollama.com/install.sh | sh
Windows: Download the installer from ollama.com/download.
Step 2 — Pull an embedding model
# Recommended: nomic-embed-text (768 dimensions, ~274 MB) ollama pull nomic-embed-text # Higher quality (1024 dimensions, ~670 MB) ollama pull mxbai-embed-large # Multilingual (1024 dimensions, ~1.2 GB) ollama pull bge-m3
Step 3 — Start the Ollama server
ollama serve
The server starts on http://localhost:11434 by default. To listen on all interfaces (so other machines can reach it):
OLLAMA_HOST=0.0.0.0 ollama serve
Verify it is running:
curl http://localhost:11434/api/embeddings \
-d '{"model": "nomic-embed-text", "prompt": "test"}'
The response should contain an embedding field with 768 floats.
Step 4 — Run as a system service (Linux)
sudo systemctl enable ollama
sudo systemctl start ollama
Step 5 — Configure SemantiQ
- Embedding Provider →
Ollama (local) - Ollama Base URL →
http://localhost:11434(or the server's IP if running remotely) - Ollama Embedding Model →
nomic-embed-text
RAM requirements
| Model | Disk | RAM needed |
|---|---|---|
all-minilm |
~46 MB | ~256 MB |
nomic-embed-text |
~274 MB | ~512 MB |
mxbai-embed-large |
~670 MB | ~1 GB |
bge-m3 |
~1.2 GB | ~2 GB |
8. Anthropic Claude
Step 1 — Create an API key
- Go to console.anthropic.com.
- Navigate to API Keys → Create Key.
- Name it (e.g.
semantiq-magento) and copy the key.
Step 2 — Add credits
Anthropic requires prepaid credits. Go to Billing → Add credits and add at least $5 to start.
Step 3 — Configure SemantiQ
For LLM / RAG:
- LLM Provider →
Anthropic Claude - Anthropic API Key → key from step 1
- Anthropic Model →
claude-haiku-4-5-20251001(fastest/cheapest) orclaude-sonnet-4-6(higher quality)
For Embedding (experimental only — not recommended for production):
- Embedding Provider →
Anthropic Claude (experimental) - Anthropic API Key → same key
- Anthropic Model →
claude-haiku-4-5-20251001
Why embedding via Anthropic is experimental: Anthropic does not offer a dedicated vector embedding API. SemantiQ works around this by prompting Claude to output a JSON float array, but this approach is slow (~1–3 seconds per product), expensive (chat token pricing vs. embedding pricing), and produces inconsistent vector quality. Use a dedicated embedding provider instead.
Indexing
The SemantiQ indexer converts product and CMS page text into vectors and stores them in the configured vector store.
Full reindex
Run this after initial setup or after changing the embedding provider or indexed attributes:
bin/magento indexer:reindex rivicore_semantiq
This re-embeds every enabled, visible, non-excluded product across all store views, and every active non-excluded CMS page. For a catalog of 10,000 products, expect 5–30 minutes depending on the embedding provider's latency.
Incremental reindex (automatic)
SemantiQ subscribes to changes in the following database tables via Magento's MView (materialised view) system:
catalog_product_entityand EAV attribute tables — triggers on product savescms_page— triggers on CMS page saves
When a product or CMS page is saved in the admin, it is added to the reindex backlog and processed by the next cron run.
To process the backlog immediately:
bin/magento indexer:reindex rivicore_semantiq
Indexer status
bin/magento indexer:status rivicore_semantiq
Check indexed document count
# MySQL SELECT entity_type, store_id, COUNT(*) as count FROM rivicore_semantiq_index GROUP BY entity_type, store_id;
Excluding Entities from the Index
Excluding a product
- Open the product in Catalog → Products → Edit.
- Scroll to the Search Engine Optimization section.
- Set Exclude from SemantiQ Index to Yes.
- Save the product.
The product is removed from the vector store on the next reindex.
Excluding a CMS page
- Open the page in Content → Pages → Edit.
- Scroll to the Design section.
- Set Exclude from SemantiQ Index to Yes.
- Save the page.
Disabling indexing for all products or all CMS pages
Use the global toggles in General Settings:
- Index Products →
No— no products will be indexed (or reindexed) - Index CMS Pages →
No— no CMS pages will be indexed (or reindexed)
RAG Context Block (Frontend)
When Enable RAG is set to Yes, SemantiQ dispatches a Magento event after each successful search:
rivicore_semantiq_rag_context_ready
Event data:
query— the original search query stringcontext— the LLM-generated summary paragraphresults— array ofVectorSearchResultInterfaceobjects
To display the context on the search results page, create an observer in your theme or custom module:
<!-- etc/events.xml --> <event name="rivicore_semantiq_rag_context_ready"> <observer name="my_theme_semantiq_context" instance="MyVendor\MyTheme\Observer\SemantiQContextObserver"/> </event>
// Observer/SemantiQContextObserver.php class SemantiQContextObserver implements ObserverInterface { public function execute(Observer $observer): void { $context = $observer->getData('context'); // Store in registry, session, or cache for the block to read $this->registry->register('semantiq_rag_context', $context); } }
Recommended Combinations
| Use case | Vector Store | Embedding | LLM (RAG) |
|---|---|---|---|
| Quickest setup, no extra infra | OpenSearch (built-in) | OpenSearch ML Commons | OpenAI |
| Best accuracy, affordable | OpenSearch | OpenAI text-embedding-3-small |
OpenAI gpt-4o-mini |
| All data stays in AWS | Bedrock KB | Bedrock Titan V2 | Bedrock Claude Haiku |
| Fully on-premise, no external calls | OpenSearch | Ollama nomic-embed-text |
Ollama llama3 |
| Development / local testing | ChromaDB | Ollama nomic-embed-text |
Ollama llama3 |
Troubleshooting
Search returns no results after enabling
- Check the indexer ran successfully:
bin/magento indexer:status rivicore_semantiq - Check the index has documents:
SELECT COUNT(*) FROM rivicore_semantiq_index; - Check
var/log/system.logandvar/log/exception.logfor embedding or vector store errors. - Verify the embedding provider credentials are correct by checking
var/log/system.logforSemantiQ:prefixed messages.
knn_vector mapping error on OpenSearch
The k-NN plugin is not installed or enabled. Verify:
GET /_cat/plugins?v
If opensearch-knn is missing, install it (self-hosted) or use AWS OpenSearch Service which includes it by default.
Dimension mismatch error
The Embedding Dimension config value does not match the model's actual output. Check the model's documentation or run a test predict call (see OpenSearch ML Commons step 4). Drop the OpenSearch index and reindex:
# Drop the vector index curl -X DELETE http://localhost:9200/semantiq_vectors # Reindex bin/magento indexer:reindex rivicore_semantiq
Switching embedding providers
After changing the provider (or model), the existing vectors are incompatible. You must:
- Set the correct Embedding Dimension for the new provider.
- Disable SemantiQ (
General Settings → Enable SemantiQ → No). - Delete the existing vector index (OpenSearch index, ChromaDB collection, or Bedrock KB documents).
- Run a full reindex:
bin/magento indexer:reindex rivicore_semantiq - Re-enable SemantiQ.
AWS SDK not found (Bedrock)
composer require aws/aws-sdk-php bin/magento setup:di:compile bin/magento cache:flush
Ollama connection refused
Ensure Ollama is running (ollama serve) and the model is pulled (ollama pull nomic-embed-text). If Magento runs in Docker, localhost in the config refers to the container — use the host machine's IP or Docker network hostname instead.
Search falls back to Elasticsearch silently
All errors in the vector search pipeline are caught and logged, and the request falls through to the standard Elasticsearch search. Check var/log/system.log for SemantiQ: prefixed error lines to identify the root cause.
统计信息
- 总下载量: 0
- 月度下载量: 0
- 日度下载量: 0
- 收藏数: 0
- 点击次数: 1
- 依赖项目数: 0
- 推荐数: 0
其他信息
- 授权协议: Unknown
- 更新时间: 2026-06-16




