Indexes¶
Indexes make search fast. You create them once on a column. Queries use them automatically.
Deep Lake supports three index types, all created via USING deeplake_index.
Prerequisite: USING deeplake
Tables must be created with USING deeplake for indexes to work. Without the USING deeplake engine clause on the table, index creation will fail.
Setup¶
import requests
API_URL = "https://api.deeplake.ai"
TOKEN = "YOUR_TOKEN"
WORKSPACE = "YOUR_WORKSPACE"
TABLE = "documents"
headers = {
"Authorization": f"Bearer {TOKEN}",
"Content-Type": "application/json",
}
def query(sql):
res = requests.post(
f"{API_URL}/workspaces/{WORKSPACE}/tables/query",
headers=headers,
json={"query": sql},
)
return res.json()
Vector index¶
For similarity search on embedding columns (FLOAT4[] or FLOAT4[][]).
query(f"""
CREATE INDEX IF NOT EXISTS idx_docs_vec
ON "{WORKSPACE}"."{TABLE}" USING deeplake_index (embedding DESC)
""")
Enables the <#> operator for vector similarity:
SELECT *, embedding <#> ARRAY[0.1, 0.2, ...]::float4[] AS score
FROM "my_workspace"."documents" ORDER BY score DESC LIMIT 10
BM25 index¶
For keyword search on text columns.
query(f"""
CREATE INDEX IF NOT EXISTS idx_docs_bm25
ON "{WORKSPACE}"."{TABLE}" USING deeplake_index (content)
WITH (index_type = 'bm25')
""")
Enables the <#> operator for text ranking:
SELECT *, content <#> 'authentication error' AS score
FROM "my_workspace"."documents" ORDER BY score ASC LIMIT 10
Exact text index¶
For fast exact string filtering.
query(f"""
CREATE INDEX IF NOT EXISTS idx_docs_category
ON "{WORKSPACE}"."{TABLE}" USING deeplake_index (category)
WITH (index_type = 'exact_text')
""")
When to use each¶
| Index type | Column type | Use case |
|---|---|---|
| Vector | FLOAT4[], FLOAT4[][] |
"Find similar items" — semantic search |
| BM25 | TEXT |
"Find exact keywords" — error codes, function names, IDs |
| Exact text | TEXT |
"Filter by category" — fast equality checks |
Multiple indexes on one table¶
You can have all three on the same table:
TICKETS_TABLE = "tickets"
# Vector index for semantic search
query(f"""
CREATE INDEX idx_vec ON "{WORKSPACE}"."{TICKETS_TABLE}"
USING deeplake_index (embedding DESC)
""")
# BM25 index for keyword search
query(f"""
CREATE INDEX idx_bm25 ON "{WORKSPACE}"."{TICKETS_TABLE}"
USING deeplake_index (description)
WITH (index_type = 'bm25')
""")
# Exact text index for filtering
query(f"""
CREATE INDEX idx_status ON "{WORKSPACE}"."{TICKETS_TABLE}"
USING deeplake_index (status)
WITH (index_type = 'exact_text')
""")
This combination enables hybrid search — see Search.
Notes¶
- Index creation is a one-time cost. It pays back on every query.
DESCon vector indexes means higher scores are better matches.- Indexes are built asynchronously for large tables. The query returns immediately.