Indexes

Indexes make search fast. You create them once on a column. Queries use them automatically.

Deeplake supports three index types, all created via USING deeplake_index.

Prerequisite: USING deeplake

Create the table with the USING deeplake engine clause before adding any index. Without that clause on the table, index creation fails.

Setup

Set DEEPLAKE_API_KEY and DEEPLAKE_WORKSPACE as environment variables (see Quickstart).

Set credentials first

export DEEPLAKE_API_KEY="your-token-here"
export DEEPLAKE_WORKSPACE="your-workspace"  # optional, defaults to "default"

Python SDK

import os
from deeplake import Client

client = Client()
# DEEPLAKE_WORKSPACE is optional; fall back to "default" when it is unset
WORKSPACE = os.environ.get("DEEPLAKE_WORKSPACE", "default")
TABLE = "documents"

REST API

API_URL="https://api.deeplake.ai"
TABLE="documents"
export DEEPLAKE_ORG_ID="your-org-id"

Vector index

For similarity search on embedding columns. Deeplake supports two embedding formats:

Column type	Algorithm	How it works
`FLOAT4[]`	Cosine similarity	Single vector per row. Computes the cosine similarity between the query vector and each row's embedding. Best for text embeddings, image embeddings, or any model that produces one vector per item.
`FLOAT4[][]`	MaxSim	Bag of vectors per row (e.g. one vector per token/patch). For each query vector, finds the best-matching vector in the row, then sums those scores. Used by ColBERT-style late-interaction models for higher-quality retrieval.
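
To make the two scoring schemes in the table concrete, here is a minimal pure-Python sketch. It only illustrates the arithmetic and is not Deeplake's implementation. The function names are hypothetical, and using cosine similarity as the per-pair score inside MaxSim is an assumption made for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def maxsim(query_vectors, row_vectors):
    """For each query vector, take its best match in the row, then sum."""
    return sum(
        max(cosine_similarity(q, r) for r in row_vectors)
        for q in query_vectors
    )


# Single-vector case: one score per row.
print(cosine_similarity([1.0, 0.0], [1.0, 1.0]))

# Multi-vector case: two query vectors scored against a row of three vectors.
print(maxsim([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]))
```

In both cases a larger score means a closer match, which is consistent with ordering results by score in descending order.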

Python SDK

client.query(f"""
    CREATE INDEX IF NOT EXISTS idx_docs_vec
    ON "{WORKSPACE}"."{TABLE}" USING deeplake_index (embedding DESC)
""")

REST API

curl -s -X POST "$API_URL/workspaces/$DEEPLAKE_WORKSPACE/tables/query" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPLAKE_API_KEY" \
  -H "X-Activeloop-Org-Id: $DEEPLAKE_ORG_ID" \
  -d '{
    "query": "CREATE INDEX IF NOT EXISTS idx_docs_vec ON \"YOUR_WORKSPACE\".\"documents\" USING deeplake_index (embedding DESC)"
  }'

The vector index enables the <#> operator for vector similarity:

SELECT *, embedding <#> ARRAY[0.1, 0.2, ...]::float4[] AS score
FROM "my_workspace"."documents" ORDER BY score DESC LIMIT 10
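
In practice the query vector comes from an embedding model rather than being typed by hand. The sketch below shows one way to turn a Python list of floats into the query above. The helper names (`quote_ident`, `vector_search_sql`) are hypothetical, and quoting the column name with double quotes is an assumption carried over from how the workspace and table are quoted in this page.

```python
import math


def quote_ident(name):
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def vector_search_sql(workspace, table, query_vector, limit=10, column="embedding"):
    """Build the similarity query for a concrete query vector."""
    values = [float(v) for v in query_vector]
    if not values or not all(math.isfinite(v) for v in values):
        raise ValueError("query_vector must be non-empty and contain only finite numbers")
    array_literal = ", ".join(repr(v) for v in values)
    return (
        f"SELECT *, {quote_ident(column)} <#> ARRAY[{array_literal}]::float4[] AS score\n"
        f"FROM {quote_ident(workspace)}.{quote_ident(table)} "
        f"ORDER BY score DESC LIMIT {int(limit)}"
    )


sql = vector_search_sql("my_workspace", "documents", [0.1, 0.2, 0.3])
print(sql)
# rows = client.query(sql)  # assuming the `client` created in Setup
```

Rejecting non-finite values up front keeps `nan` or `inf` from ending up in the SQL text, where they would not be valid array elements.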

BM25 index

For keyword search on text columns.

Python SDK

client.query(f"""
    CREATE INDEX IF NOT EXISTS idx_docs_bm25
    ON "{WORKSPACE}"."{TABLE}" USING deeplake_index (content)
    WITH (index_type = 'bm25')
""")

REST API

curl -s -X POST "$API_URL/workspaces/$DEEPLAKE_WORKSPACE/tables/query" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPLAKE_API_KEY" \
  -H "X-Activeloop-Org-Id: $DEEPLAKE_ORG_ID" \
  -d '{
    "query": "CREATE INDEX IF NOT EXISTS idx_docs_bm25 ON \"YOUR_WORKSPACE\".\"documents\" USING deeplake_index (content) WITH (index_type = '\''bm25'\'')"
  }'

The BM25 index enables the <#> operator for text ranking:

SELECT *, content <#> 'authentication error' AS score
FROM "my_workspace"."documents" ORDER BY score DESC LIMIT 10
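
Search text usually comes from a user, so it may contain single quotes that would break the string literal in the query above. The sketch below builds the same query and escapes the text by doubling single quotes, which is the standard SQL rule for string literals. The helper names are hypothetical, and backslash handling is assumed to follow standard SQL.

```python
def quote_ident(name):
    """Double-quote an identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(text):
    """Single-quote a string literal, doubling any embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def bm25_search_sql(workspace, table, search_text, limit=10, column="content"):
    """Build the BM25 ranking query for arbitrary search text."""
    return (
        f"SELECT *, {quote_ident(column)} <#> {quote_literal(search_text)} AS score\n"
        f"FROM {quote_ident(workspace)}.{quote_ident(table)} "
        f"ORDER BY score DESC LIMIT {int(limit)}"
    )


print(bm25_search_sql("my_workspace", "documents", "can't log in"))
# rows = client.query(...)  # assuming the `client` created in Setup
```

If the client you use offers parameterized queries, those are generally a safer choice than building literals by hand.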

Exact text index

For fast exact string filtering.

Python SDK

client.query(f"""
    CREATE INDEX IF NOT EXISTS idx_docs_category
    ON "{WORKSPACE}"."{TABLE}" USING deeplake_index (category)
    WITH (index_type = 'exact_text')
""")

REST API

curl -s -X POST "$API_URL/workspaces/$DEEPLAKE_WORKSPACE/tables/query" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPLAKE_API_KEY" \
  -H "X-Activeloop-Org-Id: $DEEPLAKE_ORG_ID" \
  -d '{
    "query": "CREATE INDEX IF NOT EXISTS idx_docs_category ON \"YOUR_WORKSPACE\".\"documents\" USING deeplake_index (category) WITH (index_type = '\''exact_text'\'')"
  }'
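
This page does not show a query for the exact text index, so the following is only a sketch of the kind of statement it is meant to speed up. It assumes that a plain equality predicate in a WHERE clause is what benefits from the index. The helper name `exact_filter_sql` and the value `billing` are hypothetical.

```python
def exact_filter_sql(workspace, table, column, value):
    """Build an equality filter on a text column."""

    def ident(name):
        # Double-quote an identifier, doubling embedded double quotes.
        return '"' + name.replace('"', '""') + '"'

    # Single-quote the value, doubling embedded single quotes.
    literal = "'" + value.replace("'", "''") + "'"
    return (
        f"SELECT * FROM {ident(workspace)}.{ident(table)} "
        f"WHERE {ident(column)} = {literal}"
    )


print(exact_filter_sql("my_workspace", "documents", "category", "billing"))
# rows = client.query(...)  # assuming the `client` created in Setup
```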

When to use each

Index type	Column type	Use case
Vector (cosine)	`FLOAT4[]`	"Find similar items": semantic search with single-vector embeddings
Vector (MaxSim)	`FLOAT4[][]`	"Find similar items": late-interaction retrieval (ColBERT-style) with multi-vector embeddings
BM25	`TEXT`	"Find exact keywords": error codes, function names, IDs
Exact text	`TEXT`	"Filter by category": fast equality checks
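
The three index types differ only in the column list and the WITH clause of the CREATE INDEX statement. A small helper can make that mapping explicit. This is a sketch assembled from the statements shown on this page. The function name, the `kind` labels, and the identifier check are hypothetical.

```python
INDEX_KINDS = ("vector", "bm25", "exact_text")


def create_index_sql(index_name, workspace, table, column, kind):
    """Return the CREATE INDEX statement for one of the three index types."""
    if kind not in INDEX_KINDS:
        raise ValueError(f"unknown index kind: {kind!r}")
    # Index and column names are emitted unquoted, so keep them simple.
    for name in (index_name, column):
        if not name.isidentifier():
            raise ValueError(f"not a simple identifier: {name!r}")
    head = (
        f'CREATE INDEX IF NOT EXISTS {index_name} '
        f'ON "{workspace}"."{table}" USING deeplake_index'
    )
    if kind == "vector":
        # Vector indexes take DESC on the column and no WITH clause.
        return f"{head} ({column} DESC)"
    # BM25 and exact text indexes are selected through index_type.
    return f"{head} ({column}) WITH (index_type = '{kind}')"


for args in [
    ("idx_docs_vec", "my_workspace", "documents", "embedding", "vector"),
    ("idx_docs_bm25", "my_workspace", "documents", "content", "bm25"),
    ("idx_docs_category", "my_workspace", "documents", "category", "exact_text"),
]:
    print(create_index_sql(*args))
```

Each returned string can be passed to `client.query` in the same way as the hand-written statements in the sections above.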

Multiple indexes on one table

You can have all three on the same table:

Python SDK

TICKETS_TABLE = "tickets"

# Vector index for semantic search
client.query(f"""
    CREATE INDEX idx_vec ON "{WORKSPACE}"."{TICKETS_TABLE}"
    USING deeplake_index (embedding DESC)
""")

# BM25 index for keyword search
client.query(f"""
    CREATE INDEX idx_bm25 ON "{WORKSPACE}"."{TICKETS_TABLE}"
    USING deeplake_index (description)
    WITH (index_type = 'bm25')
""")

# Exact text index for filtering
client.query(f"""
    CREATE INDEX idx_status ON "{WORKSPACE}"."{TICKETS_TABLE}"
    USING deeplake_index (status)
    WITH (index_type = 'exact_text')
""")

REST API

# Vector index for semantic search
curl -s -X POST "$API_URL/workspaces/$DEEPLAKE_WORKSPACE/tables/query" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPLAKE_API_KEY" \
  -H "X-Activeloop-Org-Id: $DEEPLAKE_ORG_ID" \
  -d '{
    "query": "CREATE INDEX idx_vec ON \"YOUR_WORKSPACE\".\"tickets\" USING deeplake_index (embedding DESC)"
  }'

# BM25 index for keyword search
curl -s -X POST "$API_URL/workspaces/$DEEPLAKE_WORKSPACE/tables/query" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPLAKE_API_KEY" \
  -H "X-Activeloop-Org-Id: $DEEPLAKE_ORG_ID" \
  -d '{
    "query": "CREATE INDEX idx_bm25 ON \"YOUR_WORKSPACE\".\"tickets\" USING deeplake_index (description) WITH (index_type = '\''bm25'\'')"
  }'

# Exact text index for filtering
curl -s -X POST "$API_URL/workspaces/$DEEPLAKE_WORKSPACE/tables/query" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPLAKE_API_KEY" \
  -H "X-Activeloop-Org-Id: $DEEPLAKE_ORG_ID" \
  -d '{
    "query": "CREATE INDEX idx_status ON \"YOUR_WORKSPACE\".\"tickets\" USING deeplake_index (status) WITH (index_type = '\''exact_text'\'')"
  }'

This combination enables hybrid search. See Search.

Notes

Index creation is a one-time cost. It pays back on every query.
DESC on vector indexes means higher scores are better matches.
On large tables, indexes are built asynchronously: the CREATE INDEX query returns immediately while the build continues in the background.

Next steps

Search: use your indexes with vector, BM25, hybrid, and multi-vector search
Semantic Search: end-to-end semantic search example
Image Search: visual similarity search with embeddings
Hybrid RAG: combine vector + BM25 indexes for RAG