
Indexes

Indexes make search fast. You create them once on a column. Queries use them automatically.

Deeplake supports three index types, all created via USING deeplake_index.

Prerequisite: USING deeplake

Tables must be created with USING deeplake for indexes to work. Without the USING deeplake engine clause on the table, index creation will fail.
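The sketch below shows a table that the index examples on this page could run against. It makes three assumptions. First, it uses the columns referenced below, with content and category as TEXT and embedding as FLOAT4[]. Second, it assumes the standard PostgreSQL placement of the USING clause after the column list. Third, the helper name create_documents_table_sql and the exact column list are hypothetical. You would pass the returned string to client.query (see Setup).

```python
def create_documents_table_sql(workspace: str, table: str = "documents") -> str:
    """Build a CREATE TABLE statement that uses the deeplake engine.

    Hypothetical helper: the column list is an assumption chosen to match
    the index examples on this page (TEXT for BM25 / exact text,
    FLOAT4[] for a single-vector embedding).
    """
    return (
        f'CREATE TABLE IF NOT EXISTS "{workspace}"."{table}" (\n'
        "    content TEXT,\n"
        "    category TEXT,\n"
        "    embedding FLOAT4[]\n"
        ")\n"
        # The engine clause is what makes USING deeplake_index work later.
        "USING deeplake"
    )


if __name__ == "__main__":
    print(create_documents_table_sql("my_workspace"))
    # With the Client from Setup: client.query(create_documents_table_sql(WORKSPACE))
```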

Setup

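Set DEEPLAKE_API_KEY and DEEPLAKE_WORKSPACE as environment variables (see Quickstart). The curl examples also need DEEPLAKE_ORG_ID, which they send in the X-Activeloop-Org-Id header.

Set credentials first

export DEEPLAKE_API_KEY="your-token-here"
export DEEPLAKE_WORKSPACE="your-workspace"  # optional, defaults to "default"
export DEEPLAKE_ORG_ID="your-org-id"        # used by the curl examples

Python

import os
from deeplake import Client

client = Client()
# Fall back to "default" so an unset DEEPLAKE_WORKSPACE does not raise KeyError
WORKSPACE = os.environ.get("DEEPLAKE_WORKSPACE", "default")
TABLE = "documents"

curl

API_URL="https://api.deeplake.ai"
TABLE="documents"
DEEPLAKE_WORKSPACE="${DEEPLAKE_WORKSPACE:-default}"  # same fallback as the Python client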
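Every curl example on this page has the same shape. It sends a POST request to $API_URL/workspaces/<workspace>/tables/query with three headers and a JSON body of the form {"query": "<SQL>"}. If you would rather stay in Python without the client, the sketch below builds the same request with only the standard library. It assumes that this request shape is the whole contract, since this page does not describe the response format. The helper name build_query_request is hypothetical. Because the body is built with json.dumps, you do not need the '\'' quote escaping that the curl bodies use.

```python
import json
import os
import urllib.request

API_URL = "https://api.deeplake.ai"


def build_query_request(sql: str) -> urllib.request.Request:
    """Hypothetical helper: build the same POST as the curl examples, without sending it."""
    workspace = os.environ.get("DEEPLAKE_WORKSPACE", "default")
    # json.dumps handles quoting inside the SQL, so no shell escaping is needed.
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        f"{API_URL}/workspaces/{workspace}/tables/query",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ.get('DEEPLAKE_API_KEY', '')}",
            "X-Activeloop-Org-Id": os.environ.get("DEEPLAKE_ORG_ID", ""),
        },
    )


if __name__ == "__main__":
    req = build_query_request(
        "CREATE INDEX IF NOT EXISTS idx_docs_bm25 "
        "ON \"my_workspace\".\"documents\" USING deeplake_index (content) "
        "WITH (index_type = 'bm25')"
    )
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would actually send it.
```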

Vector index

For similarity search on embedding columns. Deeplake supports two embedding formats:

| Column type | Algorithm | How it works |
|---|---|---|
| FLOAT4[] | Cosine similarity | Single vector per row. Computes the cosine similarity between the query vector and each row's embedding. Best for text embeddings, image embeddings, or any model that produces one vector per item. |
| FLOAT4[][] | MaxSim | Bag of vectors per row (e.g. one vector per token or patch). For each query vector, finds the best-matching vector in the row, then sums those scores. Used by ColBERT-style late-interaction models for higher-quality retrieval. |
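The plain-Python sketch below shows how each algorithm in the table scores a single row. It assumes that MaxSim uses cosine similarity for every pair of query and row vectors. The table does not say which per-pair similarity Deeplake uses, and the server-side implementation will differ.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """dot(a, b) / (|a| * |b|); 1.0 = same direction, 0.0 = orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # undefined for a zero vector; treated here as "no match"
    return dot / (norm_a * norm_b)


def maxsim(query_vecs: Sequence[Vector], row_vecs: Sequence[Vector]) -> float:
    """For each query vector, keep its best match in the row, then sum those maxima."""
    return sum(
        max((cosine_similarity(q, r) for r in row_vecs), default=0.0)
        for q in query_vecs
    )


if __name__ == "__main__":
    print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
    print(maxsim([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))  # 2.0
```

In the MaxSim example, each of the two query vectors finds an identical vector in the row. Each contributes 1.0, so the row scores 2.0.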
client.query(f"""
    CREATE INDEX IF NOT EXISTS idx_docs_vec
    ON "{WORKSPACE}"."{TABLE}" USING deeplake_index (embedding DESC)
""")
curl -s -X POST "$API_URL/workspaces/$DEEPLAKE_WORKSPACE/tables/query" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPLAKE_API_KEY" \
  -H "X-Activeloop-Org-Id: $DEEPLAKE_ORG_ID" \
  -d '{
    "query": "CREATE INDEX IF NOT EXISTS idx_docs_vec ON \"YOUR_WORKSPACE\".\"documents\" USING deeplake_index (embedding DESC)"
  }'

Enables the <#> operator for vector similarity:

SELECT *, embedding <#> ARRAY[0.1, 0.2, ...]::float4[] AS score
FROM "my_workspace"."documents" ORDER BY score DESC LIMIT 10

BM25 index

For keyword search on text columns.

client.query(f"""
    CREATE INDEX IF NOT EXISTS idx_docs_bm25
    ON "{WORKSPACE}"."{TABLE}" USING deeplake_index (content)
    WITH (index_type = 'bm25')
""")
curl -s -X POST "$API_URL/workspaces/$DEEPLAKE_WORKSPACE/tables/query" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPLAKE_API_KEY" \
  -H "X-Activeloop-Org-Id: $DEEPLAKE_ORG_ID" \
  -d '{
    "query": "CREATE INDEX IF NOT EXISTS idx_docs_bm25 ON \"YOUR_WORKSPACE\".\"documents\" USING deeplake_index (content) WITH (index_type = '\''bm25'\'')"
  }'

Enables the <#> operator for text ranking:

SELECT *, content <#> 'authentication error' AS score
FROM "my_workspace"."documents" ORDER BY score DESC LIMIT 10

Exact text index

For fast exact string filtering.

client.query(f"""
    CREATE INDEX IF NOT EXISTS idx_docs_category
    ON "{WORKSPACE}"."{TABLE}" USING deeplake_index (category)
    WITH (index_type = 'exact_text')
""")
curl -s -X POST "$API_URL/workspaces/$DEEPLAKE_WORKSPACE/tables/query" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPLAKE_API_KEY" \
  -H "X-Activeloop-Org-Id: $DEEPLAKE_ORG_ID" \
  -d '{
    "query": "CREATE INDEX IF NOT EXISTS idx_docs_category ON \"YOUR_WORKSPACE\".\"documents\" USING deeplake_index (category) WITH (index_type = '\''exact_text'\'')"
  }'
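BM25 matches individual tokens and ranks by relevance. An exact text index instead answers equality checks on the whole value. The toy model below illustrates why that is fast: a map from each distinct value to the rows that hold it, so a filter becomes one lookup instead of a comparison against every row. It is illustrative only and says nothing about Deeplake's actual index structure. The class name is hypothetical, and this toy is case-sensitive.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


class ExactTextIndex:
    """Toy exact-match index: value -> list of row ids (illustrative only)."""

    def __init__(self, values: Iterable[str]) -> None:
        self._rows: Dict[str, List[int]] = defaultdict(list)
        for row_id, value in enumerate(values):
            self._rows[value].append(row_id)

    def lookup(self, value: str) -> List[int]:
        # One dict lookup instead of scanning every row; .get() does not
        # insert missing keys into the defaultdict.
        return list(self._rows.get(value, []))


if __name__ == "__main__":
    idx = ExactTextIndex(["billing", "auth", "billing", "shipping"])
    print(idx.lookup("billing"))  # [0, 2]
```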

When to use each

| Index type | Column type | Use case |
|---|---|---|
| Vector (cosine) | FLOAT4[] | "Find similar items": semantic search with single-vector embeddings |
| Vector (MaxSim) | FLOAT4[][] | "Find similar items": late-interaction retrieval (ColBERT-style) with multi-vector embeddings |
| BM25 | TEXT | "Find exact keywords": error codes, function names, IDs |
| Exact text | TEXT | "Filter by category": fast equality checks |
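All three index types share one statement shape. They differ only in the indexed column, the DESC on vector columns, and the optional WITH (index_type = ...) clause. The sketch below renders the CREATE INDEX statements used on this page from those few parameters. The helper create_index_sql is hypothetical, and only the three index types documented here are accepted.

```python
from typing import Optional


def create_index_sql(workspace: str, table: str, name: str, column: str,
                     index_type: Optional[str] = None, if_not_exists: bool = True) -> str:
    """Hypothetical helper that renders the CREATE INDEX statements on this page.

    index_type=None -> vector index on `column DESC`;
    'bm25' or 'exact_text' -> text index with a WITH (index_type = ...) clause.
    """
    if index_type not in (None, "bm25", "exact_text"):
        raise ValueError(f"unknown index_type: {index_type!r}")
    # Vector indexes use DESC because higher scores are better matches.
    target = f"{column} DESC" if index_type is None else column
    sql = "CREATE INDEX "
    if if_not_exists:
        sql += "IF NOT EXISTS "
    sql += f'{name} ON "{workspace}"."{table}" USING deeplake_index ({target})'
    if index_type is not None:
        sql += f" WITH (index_type = '{index_type}')"
    return sql


if __name__ == "__main__":
    specs = [
        ("idx_vec", "embedding", None),
        ("idx_bm25", "description", "bm25"),
        ("idx_status", "status", "exact_text"),
    ]
    for name, column, kind in specs:
        print(create_index_sql("my_workspace", "tickets", name, column, kind))
        # With the Client from Setup: client.query(create_index_sql(WORKSPACE, "tickets", name, column, kind))
```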

Multiple indexes on one table

You can have all three on the same table:

TICKETS_TABLE = "tickets"

# Vector index for semantic search
client.query(f"""
    CREATE INDEX idx_vec ON "{WORKSPACE}"."{TICKETS_TABLE}"
    USING deeplake_index (embedding DESC)
""")

# BM25 index for keyword search
client.query(f"""
    CREATE INDEX idx_bm25 ON "{WORKSPACE}"."{TICKETS_TABLE}"
    USING deeplake_index (description)
    WITH (index_type = 'bm25')
""")

# Exact text index for filtering
client.query(f"""
    CREATE INDEX idx_status ON "{WORKSPACE}"."{TICKETS_TABLE}"
    USING deeplake_index (status)
    WITH (index_type = 'exact_text')
""")
# Vector index for semantic search
curl -s -X POST "$API_URL/workspaces/$DEEPLAKE_WORKSPACE/tables/query" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPLAKE_API_KEY" \
  -H "X-Activeloop-Org-Id: $DEEPLAKE_ORG_ID" \
  -d '{
    "query": "CREATE INDEX idx_vec ON \"YOUR_WORKSPACE\".\"tickets\" USING deeplake_index (embedding DESC)"
  }'

# BM25 index for keyword search
curl -s -X POST "$API_URL/workspaces/$DEEPLAKE_WORKSPACE/tables/query" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPLAKE_API_KEY" \
  -H "X-Activeloop-Org-Id: $DEEPLAKE_ORG_ID" \
  -d '{
    "query": "CREATE INDEX idx_bm25 ON \"YOUR_WORKSPACE\".\"tickets\" USING deeplake_index (description) WITH (index_type = '\''bm25'\'')"
  }'

# Exact text index for filtering
curl -s -X POST "$API_URL/workspaces/$DEEPLAKE_WORKSPACE/tables/query" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPLAKE_API_KEY" \
  -H "X-Activeloop-Org-Id: $DEEPLAKE_ORG_ID" \
  -d '{
    "query": "CREATE INDEX idx_status ON \"YOUR_WORKSPACE\".\"tickets\" USING deeplake_index (status) WITH (index_type = '\''exact_text'\'')"
  }'

This combination enables hybrid search. See Search.
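The Search page describes how Deeplake itself ranks hybrid results. As an illustration of merging the two signals, the sketch below applies reciprocal rank fusion (RRF) on the client side. RRF is a common rank-merging technique, but this page does not say Deeplake uses it. The function name and the conventional constant k = 60 are assumptions. The input id lists would hypothetically come from a vector query and a BM25 query, each ordered by score DESC.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[Hashable]], k: int = 60) -> List[Hashable]:
    """Merge ranked id lists: each id scores sum(1 / (k + rank)), rank starting at 1."""
    scores: Dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; sorted() is stable, so ties keep first-seen order.
    return sorted(scores, key=lambda d: scores[d], reverse=True)


if __name__ == "__main__":
    vector_ids = ["a", "b", "c"]  # e.g. ids from the <#> query on embedding
    bm25_ids = ["c", "a", "d"]    # e.g. ids from the <#> query on content
    print(reciprocal_rank_fusion([vector_ids, bm25_ids]))  # ['a', 'c', 'b', 'd']
```

Because RRF uses only ranks, it sidesteps the fact that cosine and BM25 scores live on different scales.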

Notes

  • Index creation is a one-time cost that pays off on every subsequent query.
  • DESC on vector indexes means higher scores are better matches.
  • For large tables, indexes are built asynchronously: the CREATE INDEX statement returns immediately while the index continues to build in the background.

Next steps

  • Search: use your indexes with vector, BM25, hybrid, and multi-vector search
  • Semantic Search: end-to-end semantic search example
  • Image Search: visual similarity search with embeddings
  • Hybrid RAG: combine vector + BM25 indexes for RAG