
Data Types

Deeplake supports scalar types, media types, annotation types, and embeddings. Types are declared in table schemas and are inferred automatically during ingestion when no explicit schema is given.

Type reference

| Schema Type | Python Type | JS/TS Type | Postgres Type | Example |
| --- | --- | --- | --- | --- |
| TEXT | str | string | text | "hello" |
| INT32 | int | number | integer | 42 |
| INT64 | int | number | bigint | 9999999999 |
| FLOAT32 | float | number | real | 3.14 |
| FLOAT64 | float | number | double precision | 3.14159 |
| BOOL | bool | boolean | boolean | True |
| BINARY | bytes | Buffer / Uint8Array | bytea | b"\x00\x01" |
| IMAGE | bytes | Buffer / Uint8Array | IMAGE (bytea) | Image binary |
| VIDEO | bytes | Buffer / Uint8Array | bytea | Video binary |
| EMBEDDING | list[float] | number[] | float4[] | [0.1, 0.2, 0.3] |
| SEGMENT_MASK | bytes | Buffer / Uint8Array | SEGMENT_MASK (bytea) | Segmentation mask |
| BINARY_MASK | bytes | Buffer / Uint8Array | BINARY_MASK (bytea) | Binary mask |
| BOUNDING_BOX | list[float] | number[] | BOUNDING_BOX (float4[]) | [x, y, w, h] |
| CLASS_LABEL | int | number | CLASS_LABEL (int4) | Label index |
| POLYGON | bytes | Buffer / Uint8Array | DEEPLAKE_POLYGON (bytea) | Polygon coordinates |
| POINT | list[float] | number[] | DEEPLAKE_POINT (float4[]) | [1.0, 2.0] |
| MESH | bytes | Buffer / Uint8Array | MESH (bytea) | 3D mesh (PLY, STL) |
| MEDICAL | bytes | Buffer / Uint8Array | MEDICAL (bytea) | Medical imaging (DICOM) |
| FILE | str (path) | string (path) | N/A | "/path/to/file.mp4" |

Schema inference

When you call client.ingest() without an explicit schema, types are inferred from the data:

| Python / JS value | Inferred type |
| --- | --- |
| bool / boolean | BOOL |
| int / integer number | INT64 |
| float / decimal number | FLOAT64 |
| str / string | TEXT |
| bytes / Buffer | BINARY |
| list[float] / number[] | EMBEDDING (dimension auto-detected) |
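
The Python side of these inference rules can be pictured with a small sketch. The function name `infer_schema_type` is hypothetical and this is not Deeplake's actual implementation; it only mirrors the table, and it shows one detail worth knowing: in Python, `bool` is a subclass of `int`, so the BOOL check has to come before the INT64 check.

```python
def infer_schema_type(value):
    """Hypothetical helper that mirrors the inference table for Python values."""
    # bool must be tested before int because isinstance(True, int) is True.
    if isinstance(value, bool):
        return "BOOL"
    if isinstance(value, int):
        return "INT64"
    if isinstance(value, float):
        return "FLOAT64"
    if isinstance(value, str):
        return "TEXT"
    if isinstance(value, bytes):
        return "BINARY"
    # A non-empty list made only of floats is treated as an embedding;
    # its dimension would be len(value).
    if isinstance(value, list) and value and all(isinstance(x, float) for x in value):
        return "EMBEDDING"
    raise TypeError(f"no inference rule for {type(value).__name__}")
```

Under this sketch, `infer_schema_type(b"\x00\x01")` gives `"BINARY"`, never `"IMAGE"`, which is why more specific types need an explicit schema.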

To override inference, for example to store a bytes column as IMAGE instead of the inferred BINARY, pass schema={"column": "TYPE"}:

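# raw_bytes_1 and raw_bytes_2 are bytes objects holding image data.
# Without the schema argument, this column would be inferred as BINARY.
client.ingest("my_table", {
    "image_data": [raw_bytes_1, raw_bytes_2],
}, schema={"image_data": "IMAGE"})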

Embeddings

Embedding columns store vectors for similarity search via the <#> operator.

| Format | Postgres type | Use case |
| --- | --- | --- |
| Single vector | float4[] | One embedding per row (text, image encoders) |
| Multi-vector | float4[][] | Bag of embeddings per row (ColBERT-style late interaction) |
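
To make "late interaction" concrete, here is a minimal sketch of the MaxSim scoring idea used by ColBERT-style models: each query vector is matched to its best-scoring document vector, and those best scores are summed. This illustrates the general technique only. This page does not say how Deeplake scores multi-vector columns, and the names `dot` and `maxsim_score` are hypothetical.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_vectors, doc_vectors):
    """Sum, over query vectors, of the best inner product with any document vector."""
    return sum(max(dot(q, d) for d in doc_vectors) for q in query_vectors)


# A query with two vectors scored against a document row holding three vectors.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.5, 0.5], [1.0, 0.0], [0.0, 0.2]]
score = maxsim_score(query, doc)
```

Here the first query vector matches `[1.0, 0.0]` best and the second matches `[0.5, 0.5]` best, so the row's score is the sum of those two inner products.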

Dimension is detected automatically from the first row. All rows in a column must have the same dimension.
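
Because a mismatched row violates the same-dimension rule, it can help to check vectors before ingesting them. The following is a sketch of such a pre-check in plain Python; `embedding_dimension` is a hypothetical helper, not part of the Deeplake client.

```python
def embedding_dimension(rows):
    """Return the dimension shared by all rows, taken from the first row.

    Raises ValueError if the input is empty or any row has a different length.
    """
    if not rows:
        raise ValueError("no rows to inspect")
    dim = len(rows[0])
    for i, row in enumerate(rows):
        if len(row) != dim:
            raise ValueError(f"row {i} has dimension {len(row)}, expected {dim}")
    return dim
```

Calling `embedding_dimension(vectors)` on the list you are about to ingest surfaces the offending row index early, before any data is written.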

Embedding columns require a vector index for search to work.

Domain types

IMAGE, SEGMENT_MASK, BINARY_MASK, BOUNDING_BOX, CLASS_LABEL, POLYGON, POINT, MESH, and MEDICAL are PostgreSQL domain types defined by pg_deeplake. In Postgres, POLYGON and POINT are named DEEPLAKE_POLYGON and DEEPLAKE_POINT. Domain types store data in their base type (bytea, float4[], or int4) but carry semantic meaning for visualization and type-aware processing.

For example, a column typed IMAGE is stored as bytea, but Deeplake knows to render it as an image in the visualizer and to generate thumbnails during ingestion.
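
The relationship between domain types and base types can be summarized as a lookup table. The mapping below is copied from the type reference on this page; the names `DOMAIN_BASE_TYPES` and `base_type` are hypothetical and exist only for illustration.

```python
# Schema type -> Postgres base type, as listed in the type reference.
DOMAIN_BASE_TYPES = {
    "IMAGE": "bytea",
    "SEGMENT_MASK": "bytea",
    "BINARY_MASK": "bytea",
    "BOUNDING_BOX": "float4[]",
    "CLASS_LABEL": "int4",
    "POLYGON": "bytea",
    "POINT": "float4[]",
    "MESH": "bytea",
    "MEDICAL": "bytea",
}


def base_type(schema_type):
    """Return the base storage type of a domain type, or None for other types."""
    return DOMAIN_BASE_TYPES.get(schema_type)
```

Two columns typed IMAGE and MESH therefore share the same physical storage (bytea) and differ only in how they are interpreted.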

FILE type

FILE is a processing directive, not a storage type. Columns marked FILE in the schema are treated as file paths during ingestion. Each file is processed according to its type (video is chunked, PDFs are split by page, and text is split into overlapping chunks), and the results are stored in generated columns.

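# The "path" values are read as file paths and processed during ingestion;
# the processed results are stored in generated columns.
client.ingest("docs", {
    "path": ["report.pdf", "notes.txt"]
}, schema={"path": "FILE"})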

See Ingestion for chunking details per file type.
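
As a rough picture of what "overlapping chunks" means for text files, the sketch below splits a string into fixed-size windows that each share some characters with the previous one. It is an assumption-laden illustration: the function name, the character-based splitting, and the default sizes are all hypothetical, and Deeplake's actual chunking parameters are documented under Ingestion.

```python
def split_with_overlap(text, chunk_size=200, overlap=50):
    """Split text into chunks of up to chunk_size characters.

    Consecutive chunks share `overlap` characters, so content that straddles
    a boundary appears whole in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

For instance, with `chunk_size=4` and `overlap=2`, each chunk starts two characters after the previous one, so every chunk repeats the last two characters of its predecessor.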

Next steps

  • Tables: CRUD operations
  • Indexes: create vector, BM25, and text indexes
  • Search: query with the <#> operator
  • Data Formats: built-in and custom format classes