Data Types

Deeplake supports scalar types, media types, annotation types, and embeddings. Types are used in table schemas and are automatically inferred during ingestion.

Type reference

Schema Type	Python Type	JS/TS Type	Postgres Type	Example
`TEXT`	`str`	`string`	`text`	`"hello"`
`INT32`	`int`	`number`	`integer`	`42`
`INT64`	`int`	`number`	`bigint`	`9999999999`
`FLOAT32`	`float`	`number`	`real`	`3.14`
`FLOAT64`	`float`	`number`	`double precision`	`3.14159`
`BOOL`	`bool`	`boolean`	`boolean`	`True`
`BINARY`	`bytes`	`Buffer` / `Uint8Array`	`bytea`	`b"\x00\x01"`
`IMAGE`	`bytes`	`Buffer` / `Uint8Array`	`IMAGE (bytea)`	Image binary
`VIDEO`	`bytes`	`Buffer` / `Uint8Array`	`bytea`	Video binary
`EMBEDDING`	`list[float]`	`number[]`	`float4[]`	`[0.1, 0.2, 0.3]`
`SEGMENT_MASK`	`bytes`	`Buffer` / `Uint8Array`	`SEGMENT_MASK (bytea)`	Segmentation mask
`BINARY_MASK`	`bytes`	`Buffer` / `Uint8Array`	`BINARY_MASK (bytea)`	Binary mask
`BOUNDING_BOX`	`list[float]`	`number[]`	`BOUNDING_BOX (float4[])`	`[x, y, w, h]`
`CLASS_LABEL`	`int`	`number`	`CLASS_LABEL (int4)`	Label index
`POLYGON`	`bytes`	`Buffer` / `Uint8Array`	`DEEPLAKE_POLYGON (bytea)`	Polygon coordinates
`POINT`	`list[float]`	`number[]`	`DEEPLAKE_POINT (float4[])`	`[1.0, 2.0]`
`MESH`	`bytes`	`Buffer` / `Uint8Array`	`MESH (bytea)`	3D mesh (PLY, STL)
`MEDICAL`	`bytes`	`Buffer` / `Uint8Array`	`MEDICAL (bytea)`	Medical imaging (DICOM)
`FILE`	`str` (path)	`string` (path)	N/A	`"/path/to/file.mp4"`

Schema inference

When you call `client.ingest()` without an explicit schema, types are inferred from the data:

Python / JS value	Inferred type
`bool` / `boolean`	`BOOL`
`int` / integer `number`	`INT64`
`float` / decimal `number`	`FLOAT64`
`str` / `string`	`TEXT`
`bytes` / `Buffer`	`BINARY`
`list[float]` / `number[]`	`EMBEDDING` (dimension auto-detected)
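
The inference rules in the table can be sketched in plain Python. This illustrates the mapping only and is not Deeplake's actual implementation; `infer_type` is a hypothetical helper name. One detail matters for anyone reproducing the rules: in Python, `bool` is a subclass of `int`, so the `BOOL` check has to come first.

```python
def infer_type(value):
    """Map a Python value to a schema type name, following the inference table."""
    # bool must be tested before int: isinstance(True, int) is True in Python.
    if isinstance(value, bool):
        return "BOOL"
    if isinstance(value, int):
        return "INT64"
    if isinstance(value, float):
        return "FLOAT64"
    if isinstance(value, str):
        return "TEXT"
    if isinstance(value, bytes):
        return "BINARY"
    # A non-empty list made only of floats is treated as an embedding.
    if isinstance(value, list) and value and all(isinstance(x, float) for x in value):
        return "EMBEDDING"
    raise TypeError(f"no inference rule for {type(value).__name__}")
```

Under these assumptions, `infer_type(True)` yields `"BOOL"` rather than `"INT64"`, and `infer_type([0.1, 0.2, 0.3])` yields `"EMBEDDING"`.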

To override inference, pass `schema={"column": "TYPE"}`:

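# raw_bytes_1 and raw_bytes_2 are bytes objects; without the schema they would be inferred as BINARY.
client.ingest("my_table", {
    "image_data": [raw_bytes_1, raw_bytes_2],
}, schema={"image_data": "IMAGE"})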

Embeddings

Embedding columns store vectors for similarity search via the `<#>` operator.

Format	Postgres type	Use case
Single vector	`FLOAT4[]`	One embedding per row (text, image encoders)
Multi-vector	`FLOAT4[][]`	Bag of embeddings per row (ColBERT-style late interaction)

Dimension is detected automatically from the first row. All rows in a column must have the same dimension.
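
Because a mismatched row would break the same-dimension rule, it can be worth validating vectors before ingestion. The following is a minimal sketch of such a check, assuming single-vector embeddings held as Python lists; `check_dimensions` is a hypothetical helper, not part of the Deeplake API.

```python
def check_dimensions(vectors):
    """Return the shared dimension, taken from the first row; raise on a mismatch."""
    if not vectors:
        raise ValueError("no vectors to check")
    dim = len(vectors[0])  # the first row fixes the expected dimension
    for i, vec in enumerate(vectors):
        if len(vec) != dim:
            raise ValueError(f"row {i} has dimension {len(vec)}, expected {dim}")
    return dim
```

Calling it on `[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]` returns `3`; a list containing a shorter or longer row raises `ValueError` naming the offending row.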

Embedding columns require a vector index for search to work.

Domain types

`IMAGE`, `SEGMENT_MASK`, `BINARY_MASK`, `BOUNDING_BOX`, `CLASS_LABEL`, `POLYGON`, `POINT`, `MESH`, and `MEDICAL` are PostgreSQL domain types defined by `pg_deeplake`. They store data in their base type (`bytea`, `float4[]`, `int4`) but carry semantic meaning for visualization and type-aware processing.

For example, a column typed `IMAGE` is stored as `bytea`, but Deeplake knows to render it as an image in the visualizer and to generate thumbnails during ingestion.
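
The relationship between each domain type and its base type can be written out as a lookup table. The mapping below is copied from the type reference on this page; the names `DOMAIN_BASE_TYPES` and `base_type` are hypothetical and exist only for this illustration.

```python
# Domain type -> base storage type, as listed in the type reference table.
DOMAIN_BASE_TYPES = {
    "IMAGE": "bytea",
    "SEGMENT_MASK": "bytea",
    "BINARY_MASK": "bytea",
    "BOUNDING_BOX": "float4[]",
    "CLASS_LABEL": "int4",
    "POLYGON": "bytea",
    "POINT": "float4[]",
    "MESH": "bytea",
    "MEDICAL": "bytea",
}


def base_type(schema_type):
    """Return the base storage type of a domain type, or None if it is not a domain type."""
    return DOMAIN_BASE_TYPES.get(schema_type)
```

Here `base_type("IMAGE")` gives `"bytea"`, while a non-domain type such as `"TEXT"` gives `None`.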

FILE type

`FILE` is a processing directive, not a storage type. Columns marked `FILE` in the schema are treated as file paths during ingestion. Each file is processed according to its kind (videos are chunked, PDFs are split by page, text is split into overlapping chunks), and the results are stored in generated columns.

client.ingest("docs", {
    "path": ["report.pdf", "notes.txt"]
}, schema={"path": "FILE"})

See Ingestion for chunking details per file type.
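
To make "overlapping chunks" concrete, here is a minimal character-based sketch. It is not Deeplake's chunker: the function name `chunk_text` and the `size` and `overlap` defaults are assumptions chosen for illustration, and the real ingestion pipeline may split on different units or boundaries.

```python
def chunk_text(text, size=200, overlap=50):
    """Split text into chunks of up to `size` characters that share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

With `size=4` and `overlap=2`, the string `"abcdefghij"` becomes `["abcd", "cdef", "efgh", "ghij"]`: each chunk repeats the last two characters of the one before it.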

Next steps

Tables: CRUD operations
Indexes: create vector, BM25, and text indexes
Search: query with the `<#>` operator
Data Formats: built-in and custom format classes