Hybrid RAG¶
Hybrid RAG (Retrieval-Augmented Generation) combines the semantic reach of Vector Search with the exact-match precision of BM25 Keyword Search. This approach reduces "semantic drift" and makes it far more likely that specific terms (like error codes or product names) are retrieved reliably.
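The core idea can be sketched as weighted score fusion. This is illustrative only: the helper names and the min-max normalization scheme are our assumptions, not Deeplake's internal scoring, which the `deeplake_hybrid_record` operator encapsulates for you.

```python
def normalize(scores):
    """Min-max normalize a list of scores into [0, 1]."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_scores(vector_scores, keyword_scores, vector_weight=0.5):
    """Combine per-document vector and keyword scores with tunable weights."""
    text_weight = 1.0 - vector_weight
    v = normalize(vector_scores)
    k = normalize(keyword_scores)
    return [vector_weight * vi + text_weight * ki for vi, ki in zip(v, k)]
```

With a high text weight, a document with a strong exact-keyword match can outrank one that is merely semantically close.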
Objective¶
Build a technical knowledge base and implement a context retrieval function that balances keyword and semantic signals for an LLM prompt.
Prerequisites¶
- Deeplake SDK: pip install deeplake (Python SDK tab)
- curl, jq, and a terminal (REST API tab)
- An OpenRouter API key for embeddings.
- A Deeplake API token.

Set credentials first
Complete Code¶
import os
import requests
from deeplake import Client
# --- Embedding via OpenRouter (qwen/qwen3-embedding-8b, ctx 32K) ---
OPENROUTER_API_KEY = os.environ["OPENROUTER_API_KEY"]
def embed(texts):
    """Generate embeddings using Qwen3 Embedding 8B via OpenRouter."""
    res = requests.post(
        "https://openrouter.ai/api/v1/embeddings",
        headers={"Authorization": f"Bearer {OPENROUTER_API_KEY}"},
        json={"model": "qwen/qwen3-embedding-8b", "input": texts},
        timeout=30,
    )
    res.raise_for_status()
    return [item["embedding"] for item in res.json()["data"]]
# 1. Setup Client
client = Client()
# 2. Ingest structured knowledge with embeddings
docs = [
    {"title": "Auth guide", "content": "JWT tokens expire after 24h.", "src": "auth-docs"},
    {"title": "Rate limits", "content": "Limit is 1000 req/min.", "src": "api-docs"},
    {"title": "Security", "content": "Always use TLS 1.3 for API calls.", "src": "sec-docs"},
]
# Pre-compute vectors using Qwen3 Embedding via OpenRouter
embeddings = embed([d["content"] for d in docs])
print(f"Ingesting {len(docs)} knowledge segments...")
client.ingest("knowledge_base", {
    "title": [d["title"] for d in docs],
    "content": [d["content"] for d in docs],
    "source": [d["src"] for d in docs],
    "embedding": embeddings,
})
# 3. Hybrid RAG Context Retrieval
# We combine precision (Keywords/BM25) and recall (Semantic/Vector)
def retrieve_context(question, top_k=3, vector_weight=0.5):
    """Return the top_k rows ranked by a weighted hybrid (vector + BM25) score."""
    query_emb = embed([question])[0]
    # Postgres float4[] array literal: {0.1,0.2,...}
    emb_pg = "{" + ",".join(str(x) for x in query_emb) + "}"
    text_weight = 1.0 - vector_weight
    # Balance keywords and semantics with tunable weights
    return client.query(f"""
        SELECT content, source,
               (embedding, content)::deeplake_hybrid_record <#>
               deeplake_hybrid_record($1::float4[], $2, {vector_weight}, {text_weight}) AS score
        FROM knowledge_base
        ORDER BY score DESC
        LIMIT {top_k}
    """, (emb_pg, question))
# 4. Usage: Building the prompt context
question = "how long do JWT tokens last?"
results = retrieve_context(question, vector_weight=0.7)
prompt_context = "\n".join([f"- {r['content']} (Source: {r['source']})" for r in results])
print(f"Context retrieved:\n{prompt_context}")
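The retrieved context is typically wrapped into chat messages for the LLM call. A minimal sketch, assuming the result rows are dicts with `content` and `source` keys as above; the helper name and system-prompt wording are ours:

```python
def build_rag_messages(question, results):
    """Assemble chat messages that ground the LLM's answer in retrieved context."""
    context = "\n".join(f"- {r['content']} (Source: {r['source']})" for r in results)
    return [
        {"role": "system",
         "content": "Answer using only the provided context. Cite sources.\n\n"
                    f"Context:\n{context}"},
        {"role": "user", "content": question},
    ]
```

The messages list can then be sent to any chat-completion endpoint alongside the model of your choice.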
# Requires: export DEEPLAKE_API_KEY="..." (see quickstart)
# Requires: export DEEPLAKE_ORG_ID="your-org-id"
# Requires: export DEEPLAKE_WORKSPACE="your-workspace"
API_URL="https://api.deeplake.ai"
# Requires: export OPENROUTER_API_KEY="..."
# --- Helper: get embeddings via OpenRouter (qwen/qwen3-embedding-8b) ---
embed() {
curl -s "https://openrouter.ai/api/v1/embeddings" \
-H "Authorization: Bearer $OPENROUTER_API_KEY" \
-H "Content-Type: application/json" \
-d "{\"model\": \"qwen/qwen3-embedding-8b\", \"input\": $1}"
}
# 1. Create knowledge base table
curl -s -X POST "$API_URL/workspaces/$DEEPLAKE_WORKSPACE/tables/query" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $DEEPLAKE_API_KEY" \
-H "X-Activeloop-Org-Id: $DEEPLAKE_ORG_ID" \
-d '{
"query": "CREATE TABLE IF NOT EXISTS \"'$DEEPLAKE_WORKSPACE'\".\"knowledge_base\" (id BIGSERIAL PRIMARY KEY, title TEXT, content TEXT, source TEXT, embedding FLOAT4[]) USING deeplake"
}'
# 2. Get embeddings for all documents
EMBEDDINGS=$(embed '["JWT tokens expire after 24h.", "Limit is 1000 req/min.", "Always use TLS 1.3 for API calls."]')
EMB_0=$(echo "$EMBEDDINGS" | jq -c '.data[0].embedding' | tr '[]' '{}')
EMB_1=$(echo "$EMBEDDINGS" | jq -c '.data[1].embedding' | tr '[]' '{}')
EMB_2=$(echo "$EMBEDDINGS" | jq -c '.data[2].embedding' | tr '[]' '{}')
# 3. Insert documents with embeddings
curl -s -X POST "$API_URL/workspaces/$DEEPLAKE_WORKSPACE/tables/query" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $DEEPLAKE_API_KEY" \
-H "X-Activeloop-Org-Id: $DEEPLAKE_ORG_ID" \
-d '{
"query": "INSERT INTO \"'$DEEPLAKE_WORKSPACE'\".\"knowledge_base\" (title, content, source, embedding) VALUES ($1, $2, $3, $4::float4[]), ($5, $6, $7, $8::float4[]), ($9, $10, $11, $12::float4[])",
"params": [
"Auth guide", "JWT tokens expire after 24h.", "auth-docs", "'"$EMB_0"'",
"Rate limits", "Limit is 1000 req/min.", "api-docs", "'"$EMB_1"'",
"Security", "Always use TLS 1.3 for API calls.", "sec-docs", "'"$EMB_2"'"
]
}'
# 4. Hybrid RAG Search
QUESTION="how long do JWT tokens last?"
Q_EMB=$(embed "[\"$QUESTION\"]" | jq -c '.data[0].embedding' | tr '[]' '{}')
curl -s -X POST "$API_URL/workspaces/$DEEPLAKE_WORKSPACE/tables/query" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $DEEPLAKE_API_KEY" \
-H "X-Activeloop-Org-Id: $DEEPLAKE_ORG_ID" \
-d '{
"query": "SELECT content, source, (embedding, content)::deeplake_hybrid_record <#> deeplake_hybrid_record($1::float4[], $2, 0.7, 0.3) AS score FROM \"'$DEEPLAKE_WORKSPACE'\".\"knowledge_base\" ORDER BY score DESC LIMIT 3",
  "params": [
    "'"$Q_EMB"'",
    "'"$QUESTION"'"
  ]
}'
Step-by-Step Breakdown¶
1. The Power of Hybrid Search¶
Standard vector search can return documents that are "semantically similar" yet irrelevant to the specific fact being asked for. By using the deeplake_hybrid_record type, Deeplake lets you weight the importance of exact keyword matches against overall semantic meaning.
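The intuition can be seen with a toy exact-match score (a stand-in for BM25, which additionally weights term rarity and document length; the function below is our simplification, not Deeplake's scorer):

```python
def keyword_score(query, document):
    """Toy keyword score: fraction of query tokens found verbatim in the document."""
    q_tokens = query.lower().split()
    d_tokens = set(document.lower().split())
    return sum(1 for t in q_tokens if t in d_tokens) / len(q_tokens)

# "429" is an exact token: a keyword signal guarantees the matching doc
# scores higher, where a purely semantic score might rank a generic
# "errors" document almost as close.
```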
2. Tunable Weights¶
The vector_weight and text_weight parameters (summing to 1.0) let you tune your RAG system based on the query type:
- Conceptual questions (e.g., "How does auth work?") → Higher vector weight (0.7+).
- Exact technical lookups (e.g., "Error 429") → Higher BM25 weight (0.7+).
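One simple heuristic (our assumption, not a Deeplake feature) is to pick the weight from the query's surface form before calling retrieve_context:

```python
import re

def choose_vector_weight(question):
    """Heuristic: queries containing digits, quoted strings, ALL-CAPS tokens,
    or underscores look like exact lookups and lean on BM25; everything else
    is treated as conceptual and leans on the vector signal."""
    looks_exact = bool(re.search(r'\d|"[^"]+"|[A-Z]{2,}|_', question))
    return 0.3 if looks_exact else 0.7
```

Logging the chosen weight alongside retrieval quality makes it easy to tune these thresholds for your own corpus.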
3. Native Postgres Compatibility¶
Deeplake's hybrid search is implemented as a native Postgres operator class. This means you can combine similarity ranking with standard WHERE clauses, grouping, or complex joins within the same query.
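For example, hybrid ranking composes with an ordinary WHERE clause. A sketch assuming the same schema and query shape as above; the builder helper and source filter are ours, and in production the filter value should be passed as a bound parameter rather than interpolated:

```python
def hybrid_query(top_k=3, vector_weight=0.7, source=None):
    """Build a hybrid-search SQL string, optionally filtered to one source."""
    text_weight = 1.0 - vector_weight
    # NOTE: interpolating `source` directly is for illustration only;
    # use a query parameter to avoid SQL injection in real code.
    where = f"WHERE source = '{source}'" if source else ""
    return f"""
        SELECT content, source,
               (embedding, content)::deeplake_hybrid_record <#>
               deeplake_hybrid_record($1::float4[], $2, {vector_weight}, {text_weight}) AS score
        FROM knowledge_base
        {where}
        ORDER BY score DESC
        LIMIT {top_k}
    """
```

The resulting string can be passed to client.query together with the embedding and question parameters, exactly as in retrieve_context above.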
What to try next¶
- Advanced Multimodal RAG: include images in your retrieval pipeline.
- Agent Memory: use hybrid search for long-term agent persistence.
- Search Guide: detailed weighting strategies.