Skip to content

SQL Operators and Types

Deep Lake extends standard Postgres with operators and types for similarity search.

The <#> operator

The primary search operator. Its behavior depends on the column type:

Left side Right side Behavior
FLOAT4[] column FLOAT4[] literal Vector similarity (cosine-like)
FLOAT4[][] column FLOAT4[][] literal Multi-vector MaxSim
TEXT column TEXT literal BM25 keyword score
(FLOAT4[], TEXT) tuple deeplake_hybrid_record(...) Hybrid BM25 + vector

Vector similarity

SELECT *, embedding <#> ARRAY[0.1, 0.2, 0.3]::float4[] AS score
FROM "my_workspace"."my_table"
ORDER BY score DESC
LIMIT 10

Higher score = better match. Sort DESC.

Multi-vector (MaxSim)

SELECT *, embedding <#> ARRAY[ARRAY[0.1, 0.2], ARRAY[0.3, 0.4]]::float4[][] AS score
FROM "my_workspace"."my_table"
ORDER BY score DESC
LIMIT 10

Each query token is matched to the best-matching token in the item.

BM25

SELECT *, content <#> 'search terms here' AS score
FROM "my_workspace"."my_table"
ORDER BY score ASC
LIMIT 10

Lower score = better match. Sort ASC.

deeplake_hybrid_record

Combines a vector and a text query into a single hybrid search.

deeplake_hybrid_record(
    ARRAY[0.1, 0.2, ...]::float4[],  -- vector query
    'search text',                     -- BM25 query
    0.5,                               -- vector weight
    0.5                                -- BM25 weight
)

Used with a tuple on the left side:

SELECT *,
    (embedding, content) <#> deeplake_hybrid_record(
        ARRAY[0.1, 0.2, ...]::float4[],
        'search text',
        0.5, 0.5
    ) AS score
FROM "my_workspace"."my_table"
ORDER BY score ASC
LIMIT 10

Weights must be between 0 and 1. They control the balance between semantic and keyword matching.

Tensor column types

Type Usage Example
FLOAT4[] Single embedding vector ARRAY[0.1, 0.2, 0.3]::float4[]
FLOAT4[][] Multi-vector embedding (bag of tokens) ARRAY[ARRAY[0.1, 0.2], ARRAY[0.3, 0.4]]::float4[][]

Formatting in Python

Single vector:

emb = [0.1, 0.2, 0.3]
literal = "ARRAY[" + ",".join(str(v) for v in emb) + "]::float4[]"

Multi-vector:

emb = [[0.1, 0.2], [0.3, 0.4]]
inner = ", ".join("ARRAY[" + ",".join(str(v) for v in row) + "]" for row in emb)
literal = f"ARRAY[{inner}]::float4[][]"

Index syntax

Create a vector index

CREATE INDEX idx_name ON "workspace"."table" USING deeplake_index (column DESC);

Create a BM25 index

CREATE INDEX idx_name ON "workspace"."table" USING deeplake_index (column) WITH (index_type = 'bm25');

Create an exact text index

CREATE INDEX idx_name ON "workspace"."table" USING deeplake_index (column) WITH (index_type = 'exact_text');

Note: the table must be created with USING deeplake for these indexes to work.

Table engine

Tables must use the Deep Lake storage engine and be schema-qualified with the workspace name:

CREATE TABLE "my_workspace"."my_table" (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding FLOAT4[]
) USING deeplake;

The USING deeplake clause is required for indexes (vector, BM25, exact text) to work. The workspace name serves as the Postgres schema.

JSONB operators

Standard Postgres JSONB operators work in filters:

Operator Meaning Example
->>'key' Get value as text metadata->>'source' = 'api'
? Key exists tags ? 'nature'
@> Contains metadata @> '{"version": 2}'