Core Concepts

Deep Lake has three primitives. Everything else is built on top of them.

Files

Binary blobs. Images, video, audio, PDFs, point clouds — anything.

You upload a file via the REST API. You get back a UUID. That UUID is your stable reference. You store it in a table column. You use it to download the file later or render it in the frontend.

Files live in lake-scale object storage. They are cheap and durable.

Upload PNG ──→ REST API ──→ UUID: "a1b2c3d4-..."

When to use: any time you have binary content that doesn't belong in a SQL column.
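As a concrete sketch of the upload flow, here is a minimal Python client using only the standard library. The base URL, the `/files` path, the bearer-token auth, and the `"id"` response field are all assumptions for illustration; check your deployment's REST API reference for the real values.

```python
import json
import mimetypes
import urllib.request
import uuid

API_BASE = "https://api.example.com/v1"  # hypothetical base URL


def parse_upload_response(payload: dict) -> str:
    """Extract the file UUID from an upload response and sanity-check it."""
    file_id = payload["id"]  # assumed response field name
    uuid.UUID(file_id)       # raises ValueError if it is not a valid UUID
    return file_id


def upload_file(path: str, token: str) -> str:
    """POST raw bytes to the files endpoint; return the stable UUID."""
    with open(path, "rb") as f:
        data = f.read()
    content_type = mimetypes.guess_type(path)[0] or "application/octet-stream"
    req = urllib.request.Request(
        f"{API_BASE}/files",
        data=data,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": content_type,
        },
    )
    with urllib.request.urlopen(req) as resp:
        return parse_upload_response(json.load(resp))
```

The returned UUID is what you store in a table column and use later for download or rendering.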

Tables

Standard Postgres tables with an extension: tensor columns.

A table can have normal columns (TEXT, INT, JSONB, TIMESTAMPTZ) alongside tensor columns (FLOAT4[] for embeddings, FLOAT4[][] for multi-vector embeddings).

You create tables, insert rows, and query them via a single REST SQL endpoint. It is standard SQL.

CREATE TABLE ... USING deeplake → INSERT INTO → SELECT ... WHERE → done

Tables must be schema-qualified with your workspace name (e.g., "my_workspace"."my_table") and created with USING deeplake to enable vector, BM25, and exact text indexes.

When to use: always. Tables are your primary data structure.
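The create/insert/select flow can be sketched as SQL statements sent to the REST SQL endpoint. The endpoint URL and the `{"query": ...}` request body are assumptions; the schema qualification and `USING deeplake` clause follow the rules above.

```python
import json
import urllib.request

SQL_ENDPOINT = "https://api.example.com/v1/sql"  # hypothetical URL


def qualified(workspace: str, table: str) -> str:
    """Schema-qualify a table name with the workspace, as Deep Lake requires."""
    return f'"{workspace}"."{table}"'


def run_sql(query: str, token: str) -> dict:
    """POST one SQL statement to the REST SQL endpoint (assumed request shape)."""
    req = urllib.request.Request(
        SQL_ENDPOINT,
        data=json.dumps({"query": query}).encode(),
        method="POST",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


table = qualified("my_workspace", "docs")
create = f"""
CREATE TABLE {table} (
    id        UUID,
    content   TEXT,
    meta      JSONB,
    file_id   UUID,           -- reference to an uploaded file
    embedding FLOAT4[]        -- tensor column: one vector per row
) USING deeplake
"""
```

Column names here are illustrative; the pattern is normal columns and tensor columns side by side in one table.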

Indexes

Acceleration structures that make search fast. You build them on tensor or text columns.

Three types:

  - Vector (deeplake_index): built on FLOAT4[] or FLOAT4[][] columns; enables similarity search with <#>
  - BM25: built on TEXT columns; enables keyword search with <#>
  - Exact text: built on TEXT columns; enables fast exact string filtering

You create an index once. Queries use it automatically.

CREATE INDEX idx_vec ON "my_workspace"."my_table" USING deeplake_index (embedding DESC);
CREATE INDEX idx_bm25 ON "my_workspace"."my_table" USING deeplake_index (content) WITH (index_type = 'bm25');

Note: the table name in the CREATE INDEX statement must also be schema-qualified, and the table itself must have been created with USING deeplake for indexes to work.

When to use: on any column you search frequently.
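A small helper can assemble the two CREATE INDEX forms shown above. The `index_type = 'bm25'` option is taken from the examples; syntax for the exact-text index is not shown here, so this sketch covers only the vector and BM25 cases.

```python
def create_index_sql(name: str, table: str, column: str,
                     index_type: str = "") -> str:
    """Build a CREATE INDEX statement for a deeplake_index.

    An empty index_type builds a vector index; pass 'bm25' for a
    keyword index. `table` must already be schema-qualified.
    """
    stmt = f"CREATE INDEX {name} ON {table} USING deeplake_index ({column})"
    if index_type:
        stmt += f" WITH (index_type = '{index_type}')"
    return stmt


vector_idx = create_index_sql(
    "idx_vec", '"my_workspace"."my_table"', "embedding")
bm25_idx = create_index_sql(
    "idx_bm25", '"my_workspace"."my_table"', "content", "bm25")
```

Once created, these indexes are picked up by queries automatically; there is no explicit "use index" step.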

How They Connect

The typical pattern:

  1. Upload files → get UUIDs
  2. Insert rows → store metadata, embeddings, and file UUIDs in a table
  3. Create indexes → on the columns you'll search
  4. Query → SQL with vector/text operators, filter with WHERE, join with file UUIDs

Files (bytes) ──UUID──→ Tables (structure + vectors) ──index──→ Search (retrieval)

The key insight: your training set is a query. You describe what you want in SQL. You get back exactly the rows you need. No ETL pipeline. No data export.
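The "training set is a query" idea can be sketched as a single SELECT: filter with WHERE, rank with the vector operator, and return the file UUIDs you need to download. Column names, the `meta->>'split'` filter, and the `$1` parameter placeholder are hypothetical; only the overall shape is the point.

```python
def training_set_query(table: str, k: int) -> str:
    """Build a query that selects a training set directly from the lake.

    `table` must be schema-qualified; `$1` stands for the query
    embedding parameter (placeholder syntax is an assumption).
    """
    return (
        f"SELECT file_id, content "
        f"FROM {table} "
        f"WHERE meta->>'split' = 'train' "
        f"ORDER BY embedding <#> $1 "
        f"LIMIT {k}"
    )
```

The result set is the training set: no export job, no intermediate copies, just rows with file UUIDs ready to fetch.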

Search Modes

Deep Lake supports four search modes, all through the same <#> operator:

Vector search — matches meaning, not keywords. Good for conceptual queries like "scenes similar to this clip."

BM25 search — matches exact words. Good for identifiers, error codes, function names.

Hybrid search — combines both. Best default for most use cases. Reduces both "semantic drift" and "keyword brittleness."

Multi-vector search — uses a bag of embeddings per item instead of one vector. Catches fine details in long documents, large images, or video clips.
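As a rough sketch of the first two modes, both built on the same <#> operator: a vector query ranks by an embedding column, a BM25 query ranks an indexed TEXT column against a phrase. The operand forms and placeholder syntax are assumptions here; hybrid and multi-vector syntax are covered in Search fundamentals.

```python
TABLE = '"my_workspace"."docs"'  # hypothetical schema-qualified table

# Vector search: rank rows by similarity to a query embedding ($1 is an
# assumed parameter placeholder for the query vector).
vector_q = (
    f"SELECT id FROM {TABLE} "
    "ORDER BY embedding <#> $1 LIMIT 10"
)

# BM25 search: same operator, applied to an indexed TEXT column.
bm25_q = (
    f"SELECT id FROM {TABLE} "
    "ORDER BY content <#> 'connection timeout error' LIMIT 10"
)
```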

Details and syntax for each mode: Search fundamentals