Skip to content

SQL Operators and Types

Deeplake is built on PostgreSQL. All standard SQL works out of the box: SELECT, INSERT, JOIN, WHERE, GROUP BY, subqueries, CTEs, window functions, and everything else you'd expect from Postgres.

This page documents the Deeplake-specific extensions: custom operators, types, and index syntax for similarity search. For standard SQL, refer to the PostgreSQL documentation.


The <#> operator

The primary search operator. Behavior depends on the column type:

Left side Right side Behavior Sort
FLOAT4[] column FLOAT4[] literal Vector cosine similarity DESC
FLOAT4[][] column FLOAT4[][] literal Multi-vector MaxSim DESC
TEXT column TEXT literal BM25 keyword score DESC
(FLOAT4[], TEXT) tuple deeplake_hybrid_record(...) Hybrid BM25 + vector DESC

Examples

Higher score = better match. Sort DESC.

SELECT *, embedding <#> ARRAY[0.1, 0.2, 0.3]::float4[] AS score
FROM "my_workspace"."my_table"
ORDER BY score DESC
LIMIT 10

Each query token matched to best-matching item token.

SELECT *, embedding <#> ARRAY[ARRAY[0.1, 0.2], ARRAY[0.3, 0.4]]::float4[][] AS score
FROM "my_workspace"."my_table"
ORDER BY score DESC
LIMIT 10

Higher score = better match. Sort DESC.

SELECT *, content <#> 'search terms here' AS score
FROM "my_workspace"."my_table"
ORDER BY score DESC
LIMIT 10

Combines vector + BM25 with configurable weights.

SELECT *,
    (embedding, content) <#> deeplake_hybrid_record(
        ARRAY[0.1, 0.2, ...]::float4[],
        'search text',
        0.5, 0.5
    ) AS score
FROM "my_workspace"."my_table"
ORDER BY score DESC
LIMIT 10

Vector similarity search combined with a WHERE clause.

SELECT id, title, embedding <#> ARRAY[0.1, 0.2, 0.3]::float4[] AS score
FROM "my_workspace"."my_table"
WHERE category = 'science'
ORDER BY score DESC
LIMIT 10

deeplake_hybrid_record()

Combines a vector and a text query into a single hybrid search.

deeplake_hybrid_record(vector_query, text_query, vector_weight, text_weight)

Parameters

Parameter Type Description
vector_query FLOAT4[] The embedding vector to search with
text_query TEXT The BM25 text query string
vector_weight FLOAT Weight for vector similarity (0-1)
text_weight FLOAT Weight for BM25 text similarity (0-1)

Weight presets

Weights Best for
0.7, 0.3 Conceptual queries where meaning matters more
0.5, 0.5 Balanced default
0.3, 0.7 Precise queries where exact keywords matter

Examples

Best when meaning matters more than exact keywords.

SELECT id, content,
    (embedding, content) <#> deeplake_hybrid_record(
        ARRAY[0.1, 0.2, ...]::float4[],
        'how does photosynthesis work',
        0.7, 0.3
    ) AS score
FROM "my_workspace"."documents"
ORDER BY score DESC
LIMIT 10

Default choice when unsure about query characteristics.

SELECT id, content,
    (embedding, content) <#> deeplake_hybrid_record(
        ARRAY[0.1, 0.2, ...]::float4[],
        'machine learning transformers',
        0.5, 0.5
    ) AS score
FROM "my_workspace"."documents"
ORDER BY score DESC
LIMIT 10

Best when exact terminology matters (e.g. error codes, product names).

SELECT id, content,
    (embedding, content) <#> deeplake_hybrid_record(
        ARRAY[0.1, 0.2, ...]::float4[],
        'ERR_CONNECTION_REFUSED nginx 502',
        0.3, 0.7
    ) AS score
FROM "my_workspace"."documents"
ORDER BY score DESC
LIMIT 10

Tensor Column Types

Type Usage Literal example
FLOAT4[] Single embedding vector ARRAY[0.1, 0.2, 0.3]::float4[]
FLOAT4[][] Multi-vector embedding (bag of tokens) ARRAY[ARRAY[0.1, 0.2], ARRAY[0.3, 0.4]]::float4[][]

Embedding Literals in Python

emb = [0.1, 0.2, 0.3]
literal = "ARRAY[" + ",".join(str(v) for v in emb) + "]::float4[]"
emb = [[0.1, 0.2], [0.3, 0.4]]
inner = ", ".join("ARRAY[" + ",".join(str(v) for v in row) + "]" for row in emb)
literal = f"ARRAY[{inner}]::float4[][]"

Index Syntax

All indexes use USING deeplake_index. The table must be created with USING deeplake.

CREATE INDEX idx_name
ON "workspace"."table" USING deeplake_index (column DESC);
CREATE INDEX idx_name
ON "workspace"."table" USING deeplake_index (column)
WITH (index_type = 'bm25');
CREATE INDEX idx_name
ON "workspace"."table" USING deeplake_index (column)
WITH (index_type = 'exact_text');

Use IF NOT EXISTS to avoid errors when re-running setup scripts.

CREATE INDEX IF NOT EXISTS idx_vec
ON "workspace"."table" USING deeplake_index (embedding DESC);

CREATE INDEX IF NOT EXISTS idx_bm25
ON "workspace"."table" USING deeplake_index (content)
WITH (index_type = 'bm25');

Table Engine

Tables must use the Deeplake storage engine and be schema-qualified:

CREATE TABLE "my_workspace"."my_table" (
    id SERIAL PRIMARY KEY,
    content TEXT,
    embedding FLOAT4[]
) USING deeplake;

Important

The USING deeplake clause is required for vector, BM25, and exact text indexes to work. The workspace name serves as the Postgres schema.