Skip to content

Indexes

Indexes make search fast. You create them once on a column. Queries use them automatically.

Deep Lake supports three index types, all created via USING deeplake_index.

Prerequisite: USING deeplake

Tables must be created with USING deeplake for indexes to work. Without the USING deeplake engine clause on the table, index creation will fail.

Setup

import requests

API_URL = "https://api.deeplake.ai"
TOKEN = "YOUR_TOKEN"
WORKSPACE = "YOUR_WORKSPACE"
TABLE = "documents"

headers = {
    "Authorization": f"Bearer {TOKEN}",
    "Content-Type": "application/json",
}

def query(sql):
    res = requests.post(
        f"{API_URL}/workspaces/{WORKSPACE}/tables/query",
        headers=headers,
        json={"query": sql},
    )
    return res.json()

Vector index

For similarity search on embedding columns (FLOAT4[] or FLOAT4[][]).

query(f"""
    CREATE INDEX IF NOT EXISTS idx_docs_vec
    ON "{WORKSPACE}"."{TABLE}" USING deeplake_index (embedding DESC)
""")

Enables the <#> operator for vector similarity:

SELECT *, embedding <#> ARRAY[0.1, 0.2, ...]::float4[] AS score
FROM "my_workspace"."documents" ORDER BY score DESC LIMIT 10

BM25 index

For keyword search on text columns.

query(f"""
    CREATE INDEX IF NOT EXISTS idx_docs_bm25
    ON "{WORKSPACE}"."{TABLE}" USING deeplake_index (content)
    WITH (index_type = 'bm25')
""")

Enables the <#> operator for text ranking:

SELECT *, content <#> 'authentication error' AS score
FROM "my_workspace"."documents" ORDER BY score ASC LIMIT 10

Exact text index

For fast exact string filtering.

query(f"""
    CREATE INDEX IF NOT EXISTS idx_docs_category
    ON "{WORKSPACE}"."{TABLE}" USING deeplake_index (category)
    WITH (index_type = 'exact_text')
""")

When to use each

Index type Column type Use case
Vector FLOAT4[], FLOAT4[][] "Find similar items" — semantic search
BM25 TEXT "Find exact keywords" — error codes, function names, IDs
Exact text TEXT "Filter by category" — fast equality checks

Multiple indexes on one table

You can have all three on the same table:

TICKETS_TABLE = "tickets"

# Vector index for semantic search
query(f"""
    CREATE INDEX idx_vec ON "{WORKSPACE}"."{TICKETS_TABLE}"
    USING deeplake_index (embedding DESC)
""")

# BM25 index for keyword search
query(f"""
    CREATE INDEX idx_bm25 ON "{WORKSPACE}"."{TICKETS_TABLE}"
    USING deeplake_index (description)
    WITH (index_type = 'bm25')
""")

# Exact text index for filtering
query(f"""
    CREATE INDEX idx_status ON "{WORKSPACE}"."{TICKETS_TABLE}"
    USING deeplake_index (status)
    WITH (index_type = 'exact_text')
""")

This combination enables hybrid search — see Search.

Notes

  • Index creation is a one-time cost. It pays back on every query.
  • DESC on vector indexes means higher scores are better matches.
  • Indexes are built asynchronously for large tables. The query returns immediately.