SQL Operators and Types¶

Deeplake is built on PostgreSQL. All standard SQL works out of the box: SELECT, INSERT, JOIN, WHERE, GROUP BY, subqueries, CTEs, window functions, and everything else you'd expect from Postgres.

This page documents the Deeplake-specific extensions: custom operators, types, and index syntax for similarity search. For standard SQL, refer to the PostgreSQL documentation.

The `<#>` operator¶

The primary search operator. Behavior depends on the column type:

Left side	Right side	Behavior	Sort
`FLOAT4[]` column	`FLOAT4[]` literal	Vector cosine similarity	`DESC`
`FLOAT4[][]` column	`FLOAT4[][]` literal	Multi-vector MaxSim	`DESC`
`TEXT` column	`TEXT` literal	BM25 keyword score	`DESC`
`(FLOAT4[], TEXT)` tuple	`deeplake_hybrid_record(...)`	Hybrid BM25 + vector	`DESC`

Examples¶

Vector similarityMulti-vector (MaxSim)BM25 keywordHybrid searchVector + WHERE filter

Higher score = better match. Sort DESC.

SELECT *, embedding <#> ARRAY[0.1, 0.2, 0.3]::float4[] AS score
FROM "my_workspace"."my_table"
ORDER BY score DESC
LIMIT 10

Each query token matched to best-matching item token.

SELECT *, embedding <#> ARRAY[ARRAY[0.1, 0.2], ARRAY[0.3, 0.4]]::float4[][] AS score
FROM "my_workspace"."my_table"
ORDER BY score DESC
LIMIT 10

Higher score = better match. Sort DESC.

SELECT *, content <#> 'search terms here' AS score
FROM "my_workspace"."my_table"
ORDER BY score DESC
LIMIT 10

Combines vector + BM25 with configurable weights.

SELECT *,
    (embedding, content) <#> deeplake_hybrid_record(
        ARRAY[0.1, 0.2, ...]::float4[],
        'search text',
        0.5, 0.5
    ) AS score
FROM "my_workspace"."my_table"
ORDER BY score DESC
LIMIT 10

Vector similarity search combined with a WHERE clause.

SELECT id, title, embedding <#> ARRAY[0.1, 0.2, 0.3]::float4[] AS score
FROM "my_workspace"."my_table"
WHERE category = 'science'
ORDER BY score DESC
LIMIT 10

deeplake_hybrid_record()¶

Combines a vector and a text query into a single hybrid search.

deeplake_hybrid_record(vector_query, text_query, vector_weight, text_weight)

Parameters¶

Parameter	Type	Description
`vector_query`	`FLOAT4[]`	The embedding vector to search with
`text_query`	`TEXT`	The BM25 text query string
`vector_weight`	`FLOAT`	Weight for vector similarity (0-1)
`text_weight`	`FLOAT`	Weight for BM25 text similarity (0-1)

Weight presets¶

Weights	Best for
`0.7, 0.3`	Conceptual queries where meaning matters more
`0.5, 0.5`	Balanced default
`0.3, 0.7`	Precise queries where exact keywords matter

Examples¶

Semantic-heavy (0.7 / 0.3)Balanced (0.5 / 0.5)Keyword-heavy (0.3 / 0.7)

Best when meaning matters more than exact keywords.

SELECT id, content,
    (embedding, content) <#> deeplake_hybrid_record(
        ARRAY[0.1, 0.2, ...]::float4[],
        'how does photosynthesis work',
        0.7, 0.3
    ) AS score
FROM "my_workspace"."documents"
ORDER BY score DESC
LIMIT 10

Default choice when unsure about query characteristics.

SELECT id, content,
    (embedding, content) <#> deeplake_hybrid_record(
        ARRAY[0.1, 0.2, ...]::float4[],
        'machine learning transformers',
        0.5, 0.5
    ) AS score
FROM "my_workspace"."documents"
ORDER BY score DESC
LIMIT 10

Best when exact terminology matters (e.g. error codes, product names).

SELECT id, content,
    (embedding, content) <#> deeplake_hybrid_record(
        ARRAY[0.1, 0.2, ...]::float4[],
        'ERR_CONNECTION_REFUSED nginx 502',
        0.3, 0.7
    ) AS score
FROM "my_workspace"."documents"
ORDER BY score DESC
LIMIT 10

Tensor Column Types¶

Type	Usage	Literal example
`FLOAT4[]`	Single embedding vector	`ARRAY[0.1, 0.2, 0.3]::float4[]`
`FLOAT4[][]`	Multi-vector embedding (bag of tokens)	`ARRAY[ARRAY[0.1, 0.2], ARRAY[0.3, 0.4]]::float4[][]`

Embedding Literals in Python¶

Single vectorMulti-vector

emb = [0.1, 0.2, 0.3]
literal = "ARRAY[" + ",".join(str(v) for v in emb) + "]::float4[]"

emb = [[0.1, 0.2], [0.3, 0.4]]
inner = ", ".join("ARRAY[" + ",".join(str(v) for v in row) + "]" for row in emb)
literal = f"ARRAY[{inner}]::float4[][]"

Index Syntax¶

All indexes use USING deeplake_index. The table must be created with USING deeplake.

Vector indexBM25 indexExact text indexConditional index

CREATE INDEX idx_name
ON "workspace"."table" USING deeplake_index (column DESC);

CREATE INDEX idx_name
ON "workspace"."table" USING deeplake_index (column)
WITH (index_type = 'bm25');

CREATE INDEX idx_name
ON "workspace"."table" USING deeplake_index (column)
WITH (index_type = 'exact_text');

Use IF NOT EXISTS to avoid errors when re-running setup scripts.

CREATE INDEX IF NOT EXISTS idx_vec
ON "workspace"."table" USING deeplake_index (embedding DESC);

CREATE INDEX IF NOT EXISTS idx_bm25
ON "workspace"."table" USING deeplake_index (content)
WITH (index_type = 'bm25');

Table Engine¶

Tables must use the Deeplake storage engine and be schema-qualified:

CREATE TABLE "my_workspace"."my_table" (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding FLOAT4[]
) USING deeplake;

Important

The USING deeplake clause is required for vector, BM25, and exact text indexes to work. The workspace name serves as the Postgres schema.

SQL Operators and Types¶

The <#> operator¶

Examples¶

deeplake_hybrid_record()¶

Parameters¶

Weight presets¶

Examples¶

Tensor Column Types¶

Embedding Literals in Python¶

Index Syntax¶

Table Engine¶

The `<#>` operator¶