Search

Deep Lake supports four search modes: vector, BM25, hybrid, and multi-vector. All four use the <#> operator and go through the same REST SQL endpoint.

Setup

import requests

API_URL = "https://api.deeplake.ai"
TOKEN = "YOUR_TOKEN"
WORKSPACE = "YOUR_WORKSPACE"
TABLE = "documents"

headers = {
    "Authorization": f"Bearer {TOKEN}",
    "Content-Type": "application/json",
}

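def query(sql):
    """POST a SQL string to the query endpoint and return the decoded JSON body."""
    res = requests.post(
        f"{API_URL}/workspaces/{WORKSPACE}/tables/query",
        headers=headers,
        json={"query": sql},
        timeout=30,  # seconds; avoids hanging forever on a stalled connection
    )
    res.raise_for_status()  # raise on HTTP 4xx/5xx instead of parsing an error body
    return res.json()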

Vector search

Matches meaning, not keywords. Use when the query and the data don't share exact words.

Good for: "same bug, different error message", "scenes like this clip", "documents that imply this requirement."

Requires: a vector index on a FLOAT4[] column.

# Your embedding vector (from any encoder)
embedding = [0.1, 0.2, 0.3, ...]  # list of floats

emb_literal = "ARRAY[" + ",".join(str(v) for v in embedding) + "]::float4[]"

result = query(f"""
    SELECT title, content,
           embedding <#> {emb_literal} AS score
    FROM "{WORKSPACE}"."{TABLE}"
    ORDER BY score DESC
    LIMIT 10
""")
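The literal-building line above can be wrapped in a small helper that also rejects values that would produce invalid SQL. This is a minimal sketch: the name to_float4_literal is hypothetical and not part of any Deep Lake client; it only reproduces the ARRAY[...]::float4[] format shown above.

```python
import math


def to_float4_literal(embedding):
    """Format a flat sequence of numbers as an ARRAY[...]::float4[] SQL literal.

    Hypothetical helper: it mirrors the string format used in the example
    above and adds basic validation before the value reaches the SQL text.
    """
    # float() raises TypeError/ValueError for non-numeric entries,
    # including a leftover `...` placeholder.
    values = [float(v) for v in embedding]
    if not values:
        raise ValueError("embedding must not be empty")
    if not all(math.isfinite(v) for v in values):
        raise ValueError("embedding contains NaN or infinity")
    return "ARRAY[" + ",".join(repr(v) for v in values) + "]::float4[]"


print(to_float4_literal([0.1, 0.2, 0.3]))
# ARRAY[0.1,0.2,0.3]::float4[]
```

Validating before formatting means a malformed vector fails in Python with a clear message rather than as a SQL parse error on the server.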

BM25 search

Matches exact words. Use when precise terms matter.

Good for: stack traces, function names, ticket IDs, error codes.

Requires: a BM25 index on a TEXT column.

result = query(f"""
    SELECT title, content,
           content <#> 'authentication timeout error' AS score
    FROM "{WORKSPACE}"."{TABLE}"
    ORDER BY score ASC
    LIMIT 10
""")

Note: BM25 results are sorted ASC because a lower score is a better match, the opposite of the vector query's DESC order.
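When the search text comes from user input rather than a hard-coded string, a single quote inside it would end the SQL literal early. The sketch below builds the same BM25 query for an arbitrary string. It assumes the endpoint accepts standard SQL escaping (a single quote written as two single quotes), which this page does not confirm; sql_string_literal and bm25_sql are hypothetical names.

```python
def sql_string_literal(text):
    """Wrap text in single quotes, doubling any embedded single quote.

    Assumption: the SQL endpoint follows standard SQL string-literal rules.
    """
    return "'" + text.replace("'", "''") + "'"


def bm25_sql(workspace, table, search_text, limit=10):
    """Build the BM25 query shown above for an arbitrary search string."""
    return f"""
    SELECT title, content,
           content <#> {sql_string_literal(search_text)} AS score
    FROM "{workspace}"."{table}"
    ORDER BY score ASC
    LIMIT {int(limit)}
"""


print(bm25_sql("YOUR_WORKSPACE", "documents", "can't connect: auth timeout"))
```

The returned string can be passed to query() from the setup section.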

Hybrid search

Combines vector and BM25 scoring in a single query. Best default for most use cases.

Reduces both "semantic drift", a vector-only problem, and "keyword brittleness", a BM25-only problem.

Requires: both a vector index and a BM25 index on the same table.

search_text = "fix authentication timeout"
embedding = [0.1, 0.2, 0.3, ...]  # embed the same text

emb_literal = "ARRAY[" + ",".join(str(v) for v in embedding) + "]::float4[]"

result = query(f"""
    SELECT title, content,
           (embedding, content) <#> deeplake_hybrid_record(
               {emb_literal},
               '{search_text}',
               0.5, 0.5
           ) AS score
    FROM "{WORKSPACE}"."{TABLE}"
    ORDER BY score ASC
    LIMIT 10
""")

The last two arguments (0.5, 0.5) are weights for vector and BM25 respectively. Adjust them based on your use case:

Weights     Best for
0.7, 0.3    Conceptual queries where meaning matters more
0.5, 0.5    Balanced default
0.3, 0.7    Precise queries where exact keywords matter
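To try different weightings without editing the SQL by hand, the hybrid query can be generated from parameters. This is a sketch under assumptions: hybrid_sql and the preset names are hypothetical, the weight pairs are the ones listed above, and the search text is escaped by doubling single quotes on the assumption that the endpoint follows standard SQL string rules.

```python
# Weight pairs from the table above, as (vector_weight, bm25_weight).
# The preset names are illustrative only.
PRESETS = {
    "conceptual": (0.7, 0.3),
    "balanced": (0.5, 0.5),
    "precise": (0.3, 0.7),
}


def hybrid_sql(workspace, table, embedding, search_text,
               vector_weight=0.5, bm25_weight=0.5, limit=10):
    """Build the hybrid query shown above with configurable weights."""
    emb_literal = (
        "ARRAY[" + ",".join(str(float(v)) for v in embedding) + "]::float4[]"
    )
    # Assumed escaping rule: a single quote is written as two single quotes.
    text_literal = "'" + search_text.replace("'", "''") + "'"
    return f"""
    SELECT title, content,
           (embedding, content) <#> deeplake_hybrid_record(
               {emb_literal},
               {text_literal},
               {float(vector_weight)}, {float(bm25_weight)}
           ) AS score
    FROM "{workspace}"."{table}"
    ORDER BY score ASC
    LIMIT {int(limit)}
"""


vector_w, bm25_w = PRESETS["conceptual"]
print(hybrid_sql("YOUR_WORKSPACE", "documents", [0.1, 0.2, 0.3],
                 "fix authentication timeout", vector_w, bm25_w))
```

Running the same search text through each preset and comparing the top results is a quick way to pick weights for a given dataset.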

Multi-vector search (MaxSim)

Uses a bag of embeddings per item instead of a single vector. Catches fine details.

Good for: a specific action inside a video, a small object inside an image, a key sentence inside a long document.

Requires: a vector index on a FLOAT4[][] column.

# Multi-vector embedding: list of lists (tokens x dim)
multi_embedding = [[0.1, 0.2, ...], [0.3, 0.4, ...], ...]

# Format as SQL literal
inner = ", ".join(
    "ARRAY[" + ",".join(str(v) for v in row) + "]"
    for row in multi_embedding
)
emb_literal = f"ARRAY[{inner}]::float4[][]"

VIDEO_TABLE = "video_chunks"

result = query(f"""
    SELECT title, file_id,
           embedding <#> {emb_literal} AS score
    FROM "{WORKSPACE}"."{VIDEO_TABLE}"
    ORDER BY score DESC
    LIMIT 10
""")

Multi-vector search uses MaxSim-style late interaction scoring: each query token is matched to the best-matching part of the item.
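The scoring idea can be illustrated in a few lines of plain Python. This sketch follows the usual late-interaction definition: for each query vector take the best dot product against any item vector, then sum those maxima. It is not Deep Lake's implementation, and the server's exact similarity and normalization may differ; dot and maxsim are hypothetical names.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vectors, item_vectors):
    """Sum, over query vectors, of the best dot product against any item vector."""
    return sum(
        max(dot(q, d) for d in item_vectors)
        for q in query_vectors
    )


query_vecs = [[1.0, 0.0], [0.0, 1.0]]

# item_a contains a close match for each query vector.
item_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
# item_b has only one "averaged" vector.
item_b = [[0.5, 0.5]]

print(maxsim(query_vecs, item_a))  # 2.0
print(maxsim(query_vecs, item_b))  # 1.0
```

item_a ranks higher because each query vector finds its own strong match, which is the detail a single averaged vector like item_b loses. A higher score meaning a better match is consistent with the ORDER BY score DESC in the query above.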

Filtering

You can combine search with SQL WHERE clauses. Filtering happens before ranking, so fewer rows have to be scored.

result = query(f"""
    SELECT title, content,
           embedding <#> {emb_literal} AS score
    FROM "{WORKSPACE}"."{TABLE}"
    WHERE metadata->>'source' = 'production'
      AND created_at > '2025-01-01'
    ORDER BY score DESC
    LIMIT 10
""")

This is a key advantage of SQL-first retrieval: structured filters and similarity search in one query.
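The order of filtering and ranking also affects which rows come back, not only how much work is done. The toy example below uses made-up rows and scores in plain Python (no Deep Lake calls) to compare filtering before ranking with filtering an already-truncated top-k list, the pattern needed when the filter runs outside the search engine.

```python
# Made-up rows for illustration; higher score = better match here.
rows = [
    {"title": "a", "source": "staging", "score": 0.95},
    {"title": "b", "source": "staging", "score": 0.90},
    {"title": "c", "source": "production", "score": 0.80},
    {"title": "d", "source": "production", "score": 0.70},
]


def top_k(items, k):
    """Return the k highest-scoring items."""
    return sorted(items, key=lambda r: r["score"], reverse=True)[:k]


# Filter first, then rank: what WHERE + ORDER BY + LIMIT expresses.
pre_filtered = top_k([r for r in rows if r["source"] == "production"], k=2)

# Rank first, then filter the truncated list.
post_filtered = [r for r in top_k(rows, k=2) if r["source"] == "production"]

print([r["title"] for r in pre_filtered])   # ['c', 'd']
print([r["title"] for r in post_filtered])  # []
```

Post-filtering returns nothing here because both top-2 rows are from staging, while filtering first still fills the requested two results.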

Choosing the right mode

Mode          Use when
Vector        Query is conceptual, no shared keywords
BM25          Query contains exact identifiers or terms
Hybrid        Default choice; covers both cases
Multi-vector  Items are complex (long text, images, video) and details matter