Types
Deep Lake provides a comprehensive type system designed for efficient data storage and retrieval. The type system includes basic numeric types as well as specialized types optimized for common data formats like images, embeddings, and text.
Each type can be specified either using the full type class or a string shorthand:
import deeplake
# Assumes `ds` is an already-open Deep Lake dataset
# Using type class
ds.add_column("col1", deeplake.types.Float32())
# Using string shorthand
ds.add_column("col2", "float32")
Types determine:
- How data is stored and compressed
- What operations are available
- How the data can be queried and indexed
- Integration with external libraries and frameworks
Numeric Types
All basic numeric types:
import deeplake
# Integers
ds.add_column("int8", deeplake.types.Int8()) # -128 to 127
ds.add_column("int16", deeplake.types.Int16()) # -32,768 to 32,767
ds.add_column("int32", deeplake.types.Int32()) # -2^31 to 2^31-1
ds.add_column("int64", deeplake.types.Int64()) # -2^63 to 2^63-1
# Unsigned Integers
ds.add_column("uint8", deeplake.types.UInt8()) # 0 to 255
ds.add_column("uint16", deeplake.types.UInt16()) # 0 to 65,535
ds.add_column("uint32", deeplake.types.UInt32()) # 0 to 2^32-1
ds.add_column("uint64", deeplake.types.UInt64()) # 0 to 2^64-1
# Floating Point
ds.add_column("float32", deeplake.types.Float32())
ds.add_column("float64", deeplake.types.Float64())
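The ranges in the comments above follow from each type's bit width. A minimal sketch in plain Python that derives them; the helper `int_range` is hypothetical and is not part of Deep Lake:

```python
def int_range(bits: int, signed: bool) -> tuple[int, int]:
    """Return the (min, max) values representable in `bits` bits."""
    if signed:
        # Two's complement: one bit is spent on the sign.
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


# Print the range for every integer width listed above.
for bits in (8, 16, 32, 64):
    print(f"int{bits}:  {int_range(bits, signed=True)}")
    print(f"uint{bits}: {int_range(bits, signed=False)}")
```

Choosing the narrowest type that covers the expected values keeps storage small, but values outside the range cannot be represented.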
deeplake.types.Image
An image of a given format. The value returned will be a multidimensional array of values rather than the raw image bytes.
Available formats:
- png (default)
- apng
- jpg / jpeg
- tiff / tif
- jpeg2000 / jp2
- bmp
- nii
- nii.gz
- dcm
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dtype | DataType \| str | The data type of the array elements to return | 'uint8' |
sample_compression | str | The on-disk compression/format of the image | 'png' |
Examples:
# Basic image storage
ds.add_column("images", deeplake.types.Image())
# JPEG compression (each column needs a unique name)
ds.add_column("images_jpeg", deeplake.types.Image(
    sample_compression="jpeg"
))
# With specific dtype
ds.add_column("images_uint8", deeplake.types.Image(
    dtype="uint8"  # 8 bits per channel
))
deeplake.types.Embedding
Embedding(
    size: int | None = None,
    dtype: DataType | str = "float32",
    quantization: QuantizationType | None = None,
) -> Type
Creates a single-dimensional embedding of a given length.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
size | int \| None | The size of the embedding | None |
dtype | DataType \| str | The datatype of the embedding. Defaults to float32 | 'float32' |
quantization | QuantizationType \| None | How to compress the embeddings in the index. Default uses no compression, but can be set to a QuantizationType value such as deeplake.types.QuantizationType.Binary, as in the examples below | None |
Returns:
Name | Type | Description |
---|---|---|
Type | Type | A new embedding data type. |
See Also: deeplake.types.Array for a multidimensional array.
Examples:
Create embedding columns:
# Basic embeddings
ds.add_column("embeddings", deeplake.types.Embedding(768))
# With binary quantization for faster search (each column needs a unique name)
ds.add_column("embeddings_binary", deeplake.types.Embedding(
    size=768,
    quantization=deeplake.types.QuantizationType.Binary
))
# Explicit dtype (float32 is also the default)
ds.add_column("embeddings_f32", deeplake.types.Embedding(
    size=768,
    dtype="float32"
))
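Binary quantization is commonly described as keeping only the sign of each embedding component, so that vectors can be compared with a cheap Hamming distance. The sketch below illustrates that general idea in plain Python. It is an assumption about the technique in general, not a description of Deep Lake's index internals, and the helper names `binarize` and `hamming_distance` are hypothetical.

```python
def binarize(vector: list[float]) -> int:
    """Pack the sign of each component into one integer (1 bit per dimension)."""
    bits = 0
    for value in vector:
        bits = (bits << 1) | (1 if value > 0 else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two packed codes differ."""
    return bin(a ^ b).count("1")


query = [0.9, -0.2, 0.4, -0.7]
close = [0.8, -0.1, 0.5, -0.6]  # same signs as the query
far = [-0.9, 0.3, -0.4, 0.6]    # opposite signs in every dimension

q, c, f = binarize(query), binarize(close), binarize(far)
print(hamming_distance(q, c))  # small distance: likely neighbour
print(hamming_distance(q, f))  # large distance: unlikely neighbour
```

The trade-off in this kind of scheme is precision for speed and size: each dimension shrinks to a single bit, so nearby vectors with the same sign pattern become indistinguishable.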
deeplake.types.Text
Creates a text data type of arbitrary length.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
index_type | str \| TextIndexType \| None | How to index the data in the column for faster searching. The examples below use deeplake.types.BM25 and deeplake.types.Inverted. Default is None | None |
Returns:
Name | Type | Description |
---|---|---|
Type | Type | A new text data type. |
Examples:
Create text columns with different configurations:
# Basic text
ds.add_column("text", deeplake.types.Text())
# Text with BM25 index for relevance-ranked full-text search
ds.add_column("text2", deeplake.types.Text(
    index_type=deeplake.types.BM25
))
# Text with inverted index for keyword search
ds.add_column("text3", deeplake.types.Text(
    index_type=deeplake.types.Inverted
))
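To make the difference between the two index types concrete, here is a minimal plain-Python sketch of the underlying ideas: an inverted index answers "which rows contain this term?", while BM25 ranks rows by how well they match a query. This is not Deep Lake's implementation; the sample rows, the `bm25_score` helper, and the parameters `k1=1.5` and `b=0.75` (commonly used defaults) are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict

docs = [
    "deep lake stores images and embeddings",
    "bm25 ranks text by term frequency",
    "an inverted index maps each term to the rows that contain it",
]
tokenized = [doc.split() for doc in docs]

# Inverted index: term -> set of row ids containing that term.
inverted = defaultdict(set)
for row_id, tokens in enumerate(tokenized):
    for token in tokens:
        inverted[token].add(row_id)


def bm25_score(query: str, row_id: int, k1: float = 1.5, b: float = 0.75) -> float:
    """Score one row against a query with a common BM25 variant."""
    tokens = tokenized[row_id]
    counts = Counter(tokens)
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    score = 0.0
    for term in query.split():
        n = len(inverted.get(term, ()))  # number of rows containing the term
        if n == 0:
            continue
        idf = math.log((len(docs) - n + 0.5) / (n + 0.5) + 1)
        tf = counts[term]
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(tokens) / avg_len))
    return score


# Keyword lookup via the inverted index.
print(sorted(inverted["text"]))

# Relevance ranking via BM25: best-matching row first.
ranked = sorted(range(len(docs)), key=lambda i: bm25_score("inverted index", i), reverse=True)
print(ranked[0])
```

Both approaches match on the literal terms in the text. Neither captures meaning, which is what embedding columns are for.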
deeplake.types.Dict
Creates a type that supports storing arbitrary key/value pairs in each row.
Returns:
Name | Type | Description |
---|---|---|
Type | Type | A new dictionary data type. |
See Also: deeplake.types.Struct for a type that supports defining allowed keys.
Examples:
Create and use a dictionary column:
# Store arbitrary key/value pairs
ds.add_column("metadata", deeplake.types.Dict())
# Add data
ds.append([{
    "metadata": {
        "timestamp": "2024-01-01",
        "source": "camera_1",
        "settings": {"exposure": 1.5}
    }
}])
deeplake.types.Array
Creates a generic array of data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dtype | DataType \| str | The datatype of values in the array | required |
dimensions | int \| None | The number of dimensions/axes in the array. Unlike specifying shape, this places no constraint on the size of each dimension | None |
shape | list[int] \| None | Constrain the size of each dimension in the array | None |
Returns:
Name | Type | Description |
---|---|---|
DataType | DataType | A new array data type with the specified parameters. |
Examples:
Create arrays constrained either by an exact shape or only by a number of dimensions:
# Fixed-size array
ds.add_column("features", deeplake.types.Array(
    "float32",
    shape=[512]  # Enforces size
))
# Variable-size array
ds.add_column("sequences", deeplake.types.Array(
    "int32",
    dimensions=1  # Allows any size
))
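The distinction between `shape` and `dimensions` can be sketched as a pair of checks on nested lists. This is a plain-Python illustration of the two constraints described above, not Deep Lake's validation logic; `nested_shape` and `matches` are hypothetical helpers.

```python
def nested_shape(value) -> list[int]:
    """Return the shape of a regularly nested list, e.g. [[1, 2], [3, 4]] -> [2, 2]."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return shape


def matches(value, dimensions=None, shape=None) -> bool:
    """Check a value against a dimension count and/or an exact shape."""
    actual = nested_shape(value)
    if dimensions is not None and len(actual) != dimensions:
        return False
    if shape is not None and actual != list(shape):
        return False
    return True


print(matches([1, 2, 3], dimensions=1))         # one axis, any length
print(matches([1, 2, 3], shape=[512]))          # length must be exactly 512
print(matches([[1, 2], [3, 4]], dimensions=1))  # two axes where one is expected
```

In this sketch, `dimensions` only fixes the number of axes, while `shape` additionally fixes the length of every axis.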
deeplake.types.BinaryMask
In a binary mask, each pixel value is a boolean indicating whether an object of a class is present at that pixel.
NOTE: Since binary masks often contain large amounts of data, it is recommended to compress them using lz4.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sample_compression | str \| None | How to compress each row's value. Possible values: lz4, null (default: null) | None |
chunk_compression | str \| None | How to compress all the values stored in a single file. Possible values: lz4, null (default: null) | None |
Examples:
# Basic binary mask
ds.add_column("masks", deeplake.types.BinaryMask())
# With compression (each column needs a unique name)
ds.add_column("masks_lz4", deeplake.types.BinaryMask(
    sample_compression="lz4"
))
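The recommendation to compress masks is easy to motivate: a mask is mostly long runs of identical values. The sketch below builds a mask with a single small object and compresses it. Because lz4 is not in the Python standard library, `zlib` is used here purely as a stand-in, so the exact sizes will differ from what lz4 produces; the mask dimensions are made up for illustration.

```python
import zlib

width, height = 256, 256

# One byte per pixel: 1 inside a 32x32 square "object", 0 elsewhere.
mask = bytearray(width * height)
for y in range(100, 132):
    for x in range(100, 132):
        mask[y * width + x] = 1

raw = bytes(mask)
compressed = zlib.compress(raw)  # stand-in for lz4 (assumption, see above)

print(len(raw), len(compressed))
assert zlib.decompress(compressed) == raw  # compression is lossless
```

The compressed size is a small fraction of the raw size because almost every pixel repeats its neighbour.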
deeplake.types.SegmentMask
SegmentMask(
    dtype: DataType | str = "uint8",
    sample_compression: str | None = None,
    chunk_compression: str | None = None,
) -> Type
Segmentation masks are 2D representations of class labels, where a numerical class value is encoded for each pixel in an array of the same shape as the image.
NOTE: Since segmentation masks often contain large amounts of data, it is recommended to compress them using lz4.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sample_compression | str \| None | How to compress each row's value. Possible values: lz4, null (default: null) | None |
chunk_compression | str \| None | How to compress all the values stored in a single file. Possible values: lz4, null (default: null) | None |
Examples:
# Basic segmentation mask
ds.add_column("segmentation", deeplake.types.SegmentMask())
# With compression (each column needs a unique name)
ds.add_column("segmentation_lz4", deeplake.types.SegmentMask(
    dtype="uint8",
    sample_compression="lz4"
))
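To illustrate how a segmentation mask relates to a binary mask, the sketch below stores one class id per pixel and derives a boolean mask for a single class. The 4x4 data, the class labels, and the helper `binary_mask_for` are made up for illustration and are not part of Deep Lake.

```python
# A 4x4 "image" with class ids: 0 = background, 1 = cat, 2 = dog (hypothetical labels).
segment_mask = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 0, 0],
    [2, 2, 0, 0],
]


def binary_mask_for(mask: list[list[int]], class_id: int) -> list[list[bool]]:
    """Derive a boolean mask marking the pixels that belong to one class."""
    return [[pixel == class_id for pixel in row] for row in mask]


cat_mask = binary_mask_for(segment_mask, 1)
print(cat_mask[0])                                       # first row of the cat mask
print(sum(pixel for row in cat_mask for pixel in row))   # number of cat pixels
```

One segmentation mask therefore carries the same information as one binary mask per class, which is why a small integer dtype such as uint8 is usually enough when there are at most 256 classes.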
deeplake.types.BoundingBox
BoundingBox(
    dtype: DataType | str = "float32",
    format: str | None = None,
    bbox_type: str | None = None,
) -> Type
Stores an array of values specifying the bounding boxes of an image.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dtype | DataType \| str | The datatype of values (default float32) | 'float32' |
format | str \| None | The bounding box format, for example "ltwh" as used in the examples below | None |
bbox_type | str \| None | The pixel type | None |
Examples:
# Basic bounding boxes
ds.add_column("boxes", deeplake.types.BoundingBox())
# With specific format (each column needs a unique name)
ds.add_column("boxes_ltwh", deeplake.types.BoundingBox(
    format="ltwh"  # left, top, width, height
))
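Because the same box can be written in several coordinate conventions, it helps to see how the "ltwh" layout relates to others. The following plain-Python sketch converts a left/top/width/height box into corner form and center form. The helper names are hypothetical, and which other formats Deep Lake accepts is not stated here.

```python
def ltwh_to_ltrb(box: list[float]) -> list[float]:
    """Convert [left, top, width, height] to [left, top, right, bottom]."""
    left, top, width, height = box
    return [left, top, left + width, top + height]


def ltwh_to_center(box: list[float]) -> list[float]:
    """Convert [left, top, width, height] to [center_x, center_y, width, height]."""
    left, top, width, height = box
    return [left + width / 2, top + height / 2, width, height]


box = [10.0, 20.0, 30.0, 40.0]  # left=10, top=20, width=30, height=40
print(ltwh_to_ltrb(box))
print(ltwh_to_center(box))
```

Recording the format on the column avoids the ambiguity: the four numbers alone do not say whether the last two are a size or a second corner.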
deeplake.types.Struct
Defines a custom datatype with specified keys.
See deeplake.types.Dict for a type that supports different key/value pairs per value.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fields
|
dict[str, DataType | str]
|
A dict where the key is the name of the field, and the value is the datatype definition for it |
required |
Examples:
# Define fixed structure with specific types
ds.add_column("info", deeplake.types.Struct({
    "id": deeplake.types.Int64(),
    "name": "text",
    "score": deeplake.types.Float32()
}))
# Add data
ds.append([{
    "info": {
        "id": 1,
        "name": "sample",
        "score": 0.95
    }
}])
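The practical difference between a Struct and a Dict is that a Struct declares its keys and their types up front. The sketch below illustrates that general idea with a plain-Python check. The `fields` spec uses Python types as stand-ins for Deep Lake types, `conforms` is a hypothetical helper, and how Deep Lake itself validates or coerces values is not shown here.

```python
# Hypothetical field spec mirroring the "info" column above.
fields = {"id": int, "name": str, "score": float}


def conforms(value: dict, spec: dict) -> bool:
    """Return True when value has exactly the declared keys with matching types."""
    if value.keys() != spec.keys():
        return False
    return all(isinstance(value[key], expected) for key, expected in spec.items())


print(conforms({"id": 1, "name": "sample", "score": 0.95}, fields))   # matches the spec
print(conforms({"id": 1, "name": "sample"}, fields))                  # missing key
print(conforms({"id": 1, "name": "sample", "extra": True}, fields))   # undeclared key
```

A free-form dictionary column would accept all three values, which is the flexibility a fixed structure gives up in exchange for a known layout.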
deeplake.types.Sequence
Creates a sequence type that represents an ordered list of other data types.
A sequence maintains the order of its values, making it suitable for time-series data like videos (sequences of images).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
nested_type | DataType \| str \| Type | The data type of the values in the sequence. Can be any data type, not just primitive types. | required |
Returns:
Name | Type | Description |
---|---|---|
Type | Type | A new sequence data type. |
Examples:
Create a sequence of images:
# Sequence of images (e.g., video frames)
ds.add_column("frames", deeplake.types.Sequence(
    deeplake.types.Image(sample_compression="jpeg")
))
# Sequence of embeddings
ds.add_column("token_embeddings", deeplake.types.Sequence(
    deeplake.types.Embedding(768)
))
# Add data
# frame1..frame3 and emb1..emb3 are placeholders for image arrays and 768-element vectors
ds.append([{
    "frames": [frame1, frame2, frame3],  # List of images
    "token_embeddings": [emb1, emb2, emb3]  # List of embeddings
}])