# Data Types
Deeplake supports scalar types, media types, annotation types, and embeddings. Types are declared in table schemas; when no schema is given, they are inferred automatically during ingestion.
## Type reference
| Schema Type | Python Type | JS/TS Type | Postgres Type | Example |
|---|---|---|---|---|
| TEXT | str | string | text | "hello" |
| INT32 | int | number | integer | 42 |
| INT64 | int | number | bigint | 9999999999 |
| FLOAT32 | float | number | real | 3.14 |
| FLOAT64 | float | number | double precision | 3.14159 |
| BOOL | bool | boolean | boolean | True |
| BINARY | bytes | Buffer / Uint8Array | bytea | b"\x00\x01" |
| IMAGE | bytes | Buffer / Uint8Array | IMAGE (bytea) | Image binary |
| VIDEO | bytes | Buffer / Uint8Array | bytea | Video binary |
| EMBEDDING | list[float] | number[] | float4[] | [0.1, 0.2, 0.3] |
| SEGMENT_MASK | bytes | Buffer / Uint8Array | SEGMENT_MASK (bytea) | Segmentation mask |
| BINARY_MASK | bytes | Buffer / Uint8Array | BINARY_MASK (bytea) | Binary mask |
| BOUNDING_BOX | list[float] | number[] | BOUNDING_BOX (float4[]) | [x, y, w, h] |
| CLASS_LABEL | int | number | CLASS_LABEL (int4) | Label index |
| POLYGON | bytes | Buffer / Uint8Array | DEEPLAKE_POLYGON (bytea) | Polygon coordinates |
| POINT | list[float] | number[] | DEEPLAKE_POINT (float4[]) | [1.0, 2.0] |
| MESH | bytes | Buffer / Uint8Array | MESH (bytea) | 3D mesh (PLY, STL) |
| MEDICAL | bytes | Buffer / Uint8Array | MEDICAL (bytea) | Medical imaging (DICOM) |
| FILE | str (path) | string (path) | N/A | "/path/to/file.mp4" |
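The INT32 and INT64 rows share a Python type but differ in range: 9999999999 is larger than the INT32 maximum of 2,147,483,647, which is why it appears as the INT64 example. As a minimal sketch, a client-side check covering the scalar and raw-bytes rows could look like this. The helper `fits_schema_type` is hypothetical and not part of the Deeplake API.

```python
# Hypothetical helper, not part of the Deeplake API: checks whether a Python value
# fits a declared schema type. Covers only the scalar and raw-bytes rows above.
INT32_MIN, INT32_MAX = -(2**31), 2**31 - 1
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1


def fits_schema_type(value, schema_type: str) -> bool:
    if schema_type == "BOOL":
        return isinstance(value, bool)
    if schema_type in ("INT32", "INT64"):
        # bool subclasses int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, int):
            return False
        if schema_type == "INT32":
            return INT32_MIN <= value <= INT32_MAX
        return INT64_MIN <= value <= INT64_MAX
    if schema_type in ("FLOAT32", "FLOAT64"):
        # Only the Python type is checked; float32 precision loss is not.
        return isinstance(value, float)
    if schema_type == "TEXT":
        return isinstance(value, str)
    if schema_type in ("BINARY", "IMAGE", "VIDEO"):
        return isinstance(value, (bytes, bytearray))
    raise ValueError(f"schema type not covered by this sketch: {schema_type}")
```

With these rules, `fits_schema_type(9999999999, "INT32")` is `False` while `fits_schema_type(9999999999, "INT64")` is `True`.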
## Schema inference
When you call `client.ingest()` without an explicit schema, types are inferred from the data:

| Python / JS value | Inferred type |
|---|---|
| bool / boolean | BOOL |
| int / integer number | INT64 |
| float / decimal number | FLOAT64 |
| str / string | TEXT |
| bytes / Buffer | BINARY |
| list[float] / number[] | EMBEDDING (dimension auto-detected) |
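The Python side of this table can be sketched as a plain function. `infer_type` below is a hypothetical illustration of the table, not Deeplake's implementation. It must test `bool` before `int`, because `bool` is a subclass of `int` in Python and `isinstance(True, int)` is `True`.

```python
# Hypothetical sketch of the inference table above; not Deeplake's own code.
def infer_type(value) -> str:
    # Order matters: bool must be checked before int.
    if isinstance(value, bool):
        return "BOOL"
    if isinstance(value, int):
        return "INT64"
    if isinstance(value, float):
        return "FLOAT64"
    if isinstance(value, str):
        return "TEXT"
    if isinstance(value, (bytes, bytearray)):
        return "BINARY"
    # Assumption: only non-empty lists of floats count as embeddings here.
    if isinstance(value, list) and value and all(isinstance(x, float) for x in value):
        return "EMBEDDING"
    raise TypeError(f"no inference rule for {type(value).__name__}")
```

Because raw bytes always match the BINARY rule, media such as images need an explicit schema override to be typed IMAGE.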
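To override inference, pass `schema={"column": "TYPE"}`. For example, raw image bytes would otherwise be inferred as BINARY:

```python
# Assumes `client` is an initialized Deeplake client and that raw_bytes_1 and
# raw_bytes_2 hold raw image bytes.
client.ingest("my_table", {
    "image_data": [raw_bytes_1, raw_bytes_2],
}, schema={"image_data": "IMAGE"})
```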
## Embeddings
Embedding columns store vectors for similarity search via the `<#>` operator.

| Format | Postgres type | Use case |
|---|---|---|
| Single vector | FLOAT4[] | One embedding per row (text, image encoders) |
| Multi-vector | FLOAT4[][] | Bag of embeddings per row (ColBERT-style late interaction) |
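ColBERT-style late interaction is usually scored with MaxSim: each query vector is matched to its most similar vector in the row, and those best-match scores are summed. The sketch below implements MaxSim with dot products over plain Python lists to show why a multi-vector row stores a bag of vectors. It illustrates the ColBERT technique and is not a statement of how Deeplake evaluates `<#>` internally.

```python
# MaxSim (ColBERT-style late interaction) over plain Python lists.
# Illustrative only; not Deeplake's implementation of <#>.
def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vecs: list[list[float]], doc_vecs: list[list[float]]) -> float:
    # For each query vector, keep its best match among the row's vectors, then sum.
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
score = maxsim(query, doc)
```

Here `score` is 0.9 + 0.8 = 1.7 (up to floating-point rounding): the first query vector best matches `[0.9, 0.1]` and the second best matches `[0.2, 0.8]`.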
Dimension is detected automatically from the first row. All rows in a column must have the same dimension.
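A minimal client-side sketch of this rule can catch mismatched vectors before they reach `client.ingest()`. The helper `embedding_dimension` is hypothetical; Deeplake performs its own detection during ingestion.

```python
# Hypothetical pre-ingestion check mirroring the rule above: the first row fixes
# the dimension, and every later row must match it.
def embedding_dimension(rows: list[list[float]]) -> int:
    if not rows:
        raise ValueError("cannot detect a dimension from an empty column")
    dim = len(rows[0])
    for i, row in enumerate(rows[1:], start=1):
        if len(row) != dim:
            raise ValueError(f"row {i} has dimension {len(row)}, expected {dim}")
    return dim
```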
Embedding columns require a vector index for search to work.
## Domain types
IMAGE, SEGMENT_MASK, BINARY_MASK, BOUNDING_BOX, CLASS_LABEL, POLYGON, POINT, MESH, and MEDICAL are PostgreSQL domain types defined by pg_deeplake (POLYGON and POINT appear in Postgres as DEEPLAKE_POLYGON and DEEPLAKE_POINT). They store data in their base type (bytea, float4[], or int4) but carry semantic meaning for visualization and type-aware processing.
For example, a column typed IMAGE is stored as bytea, but Deeplake knows to render it as an image in the visualizer and to generate thumbnails during ingestion.
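To make the storage/semantics split concrete, the sketch below keeps the domain-to-base-type mapping from the type reference table and uses the domain name, not the storage type, to decide how a value is read. BOUNDING_BOX and DEEPLAKE_POINT are both float4[], but one is interpreted as [x, y, w, h] and the other as coordinates. `DOMAIN_BASE_TYPE` and `describe` are hypothetical names for illustration only.

```python
# Domain name -> base Postgres storage type, copied from the type reference table.
# Hypothetical illustration; not a pg_deeplake API.
DOMAIN_BASE_TYPE = {
    "IMAGE": "bytea",
    "SEGMENT_MASK": "bytea",
    "BINARY_MASK": "bytea",
    "BOUNDING_BOX": "float4[]",
    "CLASS_LABEL": "int4",
    "DEEPLAKE_POLYGON": "bytea",
    "DEEPLAKE_POINT": "float4[]",
    "MESH": "bytea",
    "MEDICAL": "bytea",
}


def describe(domain: str, value) -> str:
    # Same storage, different meaning: the domain decides how a float4[] is read.
    if domain == "BOUNDING_BOX":
        x, y, w, h = value
        return f"box at ({x}, {y}), size {w}x{h}"
    if domain == "DEEPLAKE_POINT":
        return "point at (" + ", ".join(str(c) for c in value) + ")"
    return f"{domain} stored as {DOMAIN_BASE_TYPE[domain]}"
```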
## FILE type
FILE is not a storage type. It is a processing directive. Columns marked FILE in the schema are treated as file paths during ingestion. Each file is processed (video chunked, PDF split by page, text split into overlapping chunks) and the results are stored in generated columns.
See Ingestion for chunking details per file type.
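Splitting text into overlapping chunks can be sketched as a sliding window: each chunk starts `chunk_size - overlap` characters after the previous one, so consecutive chunks share `overlap` characters of context. `chunk_text`, its default sizes, and the use of characters rather than tokens are assumptions for illustration; the values Deeplake actually uses are described on the Ingestion page.

```python
# Sliding-window chunker with overlap. chunk_size, overlap, and character-based
# splitting are hypothetical choices, not Deeplake's documented settings.
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

For example, `chunk_text("abcdefghij", chunk_size=4, overlap=2)` returns `["abcd", "cdef", "efgh", "ghij"]`.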
## Next steps
- Tables: CRUD operations
- Indexes: create vector, BM25, and text indexes
- Search: query with the `<#>` operator
- Data Formats: built-in and custom format classes