SQL Operators and Types¶

Deep Lake extends standard Postgres with operators and types for similarity search.

The `<#>` operator¶

The primary search operator. Its behavior depends on the column type:

Left side	Right side	Behavior
`FLOAT4[]` column	`FLOAT4[]` literal	Vector similarity (cosine-like)
`FLOAT4[][]` column	`FLOAT4[][]` literal	Multi-vector MaxSim
`TEXT` column	`TEXT` literal	BM25 keyword score
`(FLOAT4[], TEXT)` tuple	`deeplake_hybrid_record(...)`	Hybrid BM25 + vector

Vector similarity¶

SELECT *, embedding <#> ARRAY[0.1, 0.2, 0.3]::float4[] AS score
FROM "my_workspace"."my_table"
ORDER BY score DESC
LIMIT 10

Higher score = better match. Sort DESC.

Multi-vector (MaxSim)¶

SELECT *, embedding <#> ARRAY[ARRAY[0.1, 0.2], ARRAY[0.3, 0.4]]::float4[][] AS score
FROM "my_workspace"."my_table"
ORDER BY score DESC
LIMIT 10

Each query token is matched to the best-matching token in the item.

BM25¶

SELECT *, content <#> 'search terms here' AS score
FROM "my_workspace"."my_table"
ORDER BY score ASC
LIMIT 10

Lower score = better match. Sort ASC.

`deeplake_hybrid_record`¶

Combines a vector and a text query into a single hybrid search.

deeplake_hybrid_record(
    ARRAY[0.1, 0.2, ...]::float4[],  -- vector query
    'search text',                     -- BM25 query
    0.5,                               -- vector weight
    0.5                                -- BM25 weight
)

Used with a tuple on the left side:

SELECT *,
    (embedding, content) <#> deeplake_hybrid_record(
        ARRAY[0.1, 0.2, ...]::float4[],
        'search text',
        0.5, 0.5
    ) AS score
FROM "my_workspace"."my_table"
ORDER BY score ASC
LIMIT 10

Weights must be between 0 and 1. They control the balance between semantic and keyword matching.

Tensor column types¶

Type	Usage	Example
`FLOAT4[]`	Single embedding vector	`ARRAY[0.1, 0.2, 0.3]::float4[]`
`FLOAT4[][]`	Multi-vector embedding (bag of tokens)	`ARRAY[ARRAY[0.1, 0.2], ARRAY[0.3, 0.4]]::float4[][]`

Formatting in Python¶

Single vector:

emb = [0.1, 0.2, 0.3]
literal = "ARRAY[" + ",".join(str(v) for v in emb) + "]::float4[]"

Multi-vector:

emb = [[0.1, 0.2], [0.3, 0.4]]
inner = ", ".join("ARRAY[" + ",".join(str(v) for v in row) + "]" for row in emb)
literal = f"ARRAY[{inner}]::float4[][]"

Index syntax¶

Create a vector index¶

CREATE INDEX idx_name ON "workspace"."table" USING deeplake_index (column DESC);

Create a BM25 index¶

CREATE INDEX idx_name ON "workspace"."table" USING deeplake_index (column) WITH (index_type = 'bm25');

Create an exact text index¶

CREATE INDEX idx_name ON "workspace"."table" USING deeplake_index (column) WITH (index_type = 'exact_text');

Note: the table must be created with USING deeplake for these indexes to work.

Table engine¶

Tables must use the Deep Lake storage engine and be schema-qualified with the workspace name:

CREATE TABLE "my_workspace"."my_table" (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding FLOAT4[]
) USING deeplake;

The USING deeplake clause is required for indexes (vector, BM25, exact text) to work. The workspace name serves as the Postgres schema.

JSONB operators¶

Standard Postgres JSONB operators work in filters:

Operator	Meaning	Example
`->>'key'`	Get value as text	`metadata->>'source' = 'api'`
`?`	Key exists	`tags ? 'nature'`
`@>`	Contains	`metadata @> '{"version": 2}'`