Deeplake Types
Deep Lake supports a wide variety of data types for your datasets.
When creating a new column, the data type can be specified in several ways, each of which resolves to one of the data types listed below:
- Call the below functions directly, e.g. deeplake.types.Text()
- If the below function does not take arguments, simply pass the function, e.g. deeplake.types.Text
- A string containing the type name, e.g. "text"
- A standard Python type, e.g. str
- A NumPy type, e.g. np.str_
ds.add_column("col1", deeplake.types.Text())
ds.add_column("col2", deeplake.types.Text)
ds.add_column("col3", "text")
ds.add_column("col4", str)
ds.add_column("col5", np.str_)
All Data Types
Note
For simplicity, all samples assume the following setup code:
deeplake.types.Array
A generic array of data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dtype | DataType \| str | The datatype of values in the array | required |
dimensions | int | The number of dimensions/axes in the array. Unlike specifying shape, this does not constrain the size of each dimension | required |
shape | list[int] | Constrain the size of each dimension in the array | required |
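To make the difference between dimensions and shape concrete, the following plain-Python sketch checks a nested list against either constraint. The helpers shape_of and matches are hypothetical names invented for this illustration; they are not Deep Lake functions, and the sketch assumes regularly nested lists.

```python
def shape_of(value):
    """Return the shape of a regularly nested list, e.g. [[1, 2], [3, 4]] -> [2, 2]."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return shape


def matches(value, dimensions=None, shape=None):
    """Check a value against a fixed shape, or only against a number of axes."""
    actual = shape_of(value)
    if shape is not None:
        # A shape constraint fixes the size of every axis.
        return actual == list(shape)
    if dimensions is not None:
        # A dimensions constraint fixes only how many axes there are.
        return len(actual) == dimensions
    return True


sample = [[1, 2, 3], [4, 5, 6]]
print(matches(sample, dimensions=2))   # any 2-D size is accepted
print(matches(sample, shape=[2, 3]))   # only 2 x 3 is accepted
print(matches(sample, shape=[3, 2]))   # wrong sizes are rejected
```

With dimensions alone, a 2 x 3 and a 5 x 7 value would both pass; with shape, only the exact sizes do.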
Examples:
>>> # Create a three-dimensional array, where each dimension can have any number of elements
>>> ds.add_column("col1", types.Array("int32", dimensions=3))
>>>
>>> # Create a three-dimensional array, where each dimension has a known size
>>> ds.add_column("col2", types.Array(types.Float32(), shape=[50, 30, 768]))
deeplake.types.BinaryMask
BinaryMask(
sample_compression: str | None = None,
chunk_compression: str | None = None,
) -> Type
In a binary mask, each pixel value is a boolean indicating whether an object of a class is present.
NOTE: Since binary masks often contain large amounts of data, it is recommended to compress them using lz4.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sample_compression | str \| None | How to compress each row's value. Possible values: lz4, null (default: null) | None |
chunk_compression | str \| None | How to compress all the values stored in a single file. Possible values: lz4, null (default: null) | None |
Examples:
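As a rough illustration of why compression is recommended for masks, the sketch below builds a boolean mask in plain Python and compresses its bytes. It uses zlib from the standard library as a stand-in, since lz4 is not part of the standard library; the mask size and the helper make_mask are made up for this example and are not part of Deep Lake.

```python
import zlib


def make_mask(height, width, box):
    """Return a height x width boolean mask that is True inside box.

    box is (top, left, bottom, right) with exclusive bottom and right edges.
    This helper is hypothetical and only illustrates the data layout.
    """
    top, left, bottom, right = box
    return [
        [top <= r < bottom and left <= c < right for c in range(width)]
        for r in range(height)
    ]


mask = make_mask(64, 64, (16, 16, 48, 48))

# One byte per pixel: 1 where the object is present, 0 elsewhere.
raw = bytes(int(v) for row in mask for v in row)

# zlib stands in for lz4 here; long runs of identical bytes compress well.
compressed = zlib.compress(raw)

print(len(raw), len(compressed))
```

Because most of the mask consists of long runs of identical values, the compressed form is much smaller than the raw bytes.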
deeplake.types.Bool
Bool() -> DataType
deeplake.types.BoundingBox
BoundingBox(
dtype: DataType | str = "float32",
format: str | None = None,
bbox_type: str | None = None,
) -> Type
Stores an array of values specifying the bounding boxes of an image.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dtype | DataType \| str | The datatype of values (default float32) | 'float32' |
format | str \| None | The bounding box format. Possible values: | None |
bbox_type | str \| None | The pixel type. Possible values: | None |
Examples:
deeplake.types.Dict
Dict() -> Type
Supports storing arbitrary key/value pairs in each row.
See deeplake.types.Struct for a type that supports defining allowed keys.
Examples:
deeplake.types.Embedding
Embedding(
size: int,
dtype: DataType | str = "float32",
quantization: QuantizationType | None = None,
) -> Type
A single-dimensional embedding of a given length. See deeplake.types.Array for a multidimensional array.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
size | int | The size of the embedding | required |
dtype | DataType \| str | The datatype of the embedding. Defaults to float32 | 'float32' |
quantization | QuantizationType \| None | How to compress the embeddings in the index. Default uses no compression, but can be set to deeplake.types.QuantizationType.Binary | None |
Examples:
deeplake.types.Float32
Float32() -> DataType
deeplake.types.Float64
Float64() -> DataType
deeplake.types.Image
An image of a given format. The value returned will be a multidimensional array of values rather than the raw image bytes.
Available formats:
- png (default)
- apng
- jpg / jpeg
- tiff / tif
- jpeg2000 / jp2
- bmp
- nii
- nii.gz
- dcm
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dtype | DataType \| str | The data type of the array elements to return | 'uint8' |
sample_compression | str | The on-disk compression/format of the image | 'png' |
Examples:
deeplake.types.Int16
Int16() -> DataType
deeplake.types.Int32
Int32() -> DataType
deeplake.types.Int64
Int64() -> DataType
deeplake.types.Int8
Int8() -> DataType
deeplake.types.SegmentMask
SegmentMask(
dtype: DataType | str = "uint8",
sample_compression: str | None = None,
chunk_compression: str | None = None,
) -> Type
Segmentation masks are 2D representations of class labels, where a numerical class value is encoded in an array of the same shape as the image.
NOTE: Since segmentation masks often contain large amounts of data, it is recommended to compress them using lz4.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sample_compression | str \| None | How to compress each row's value. Possible values: lz4, null (default: null) | None |
chunk_compression | str \| None | How to compress all the values stored in a single file. Possible values: lz4, null (default: null) | None |
Examples:
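The relationship between a segmentation mask and a binary mask can be sketched in plain Python: the segmentation mask holds one class id per pixel, and a binary mask for a single class falls out of a simple comparison. The class ids and the helper binary_mask_for are hypothetical, chosen only for this illustration.

```python
# Hypothetical class ids, chosen only for this illustration.
BACKGROUND, CAT, DOG = 0, 1, 2

# A 3 x 4 segmentation mask: one class id per pixel.
segment_mask = [
    [BACKGROUND, CAT, CAT, BACKGROUND],
    [BACKGROUND, CAT, DOG, DOG],
    [BACKGROUND, BACKGROUND, DOG, DOG],
]


def binary_mask_for(mask, class_id):
    """Return a boolean mask that is True where mask equals class_id."""
    return [[value == class_id for value in row] for row in mask]


cat_mask = binary_mask_for(segment_mask, CAT)
dog_mask = binary_mask_for(segment_mask, DOG)

print(cat_mask[0])  # [False, True, True, False]
```

A single segmentation mask therefore carries the same information as one binary mask per class, in one array of small integers.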
deeplake.types.Sequence
A sequence is a list of other data types, where there is an order to the values in the list.
For example, a video can be stored as a sequence of images to better capture the time-based ordering of the images, rather than simply storing them as an Array.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
nested_type | DataType \| str \| Type | The data type of the values in the sequence. Can be any data type, not just primitive types. | required |
Examples:
deeplake.types.Struct
Defines a custom datatype with specified keys.
See deeplake.types.Dict for a type that supports different key/value pairs per value.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fields | dict[str, DataType \| str] | A dict where the key is the name of the field, and the value is the datatype definition for it | required |
Examples:
deeplake.types.Text
Text(index_type: str | TextIndexType | None = None) -> Type
Text data of arbitrary length.
Options for index_type are deeplake.types.BM25 and deeplake.types.Inverted, both described under Text Index Types.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
index_type | str \| TextIndexType \| None | How to index the data in the column for faster searching. Default is None | None |
Examples:
deeplake.types.UInt16
UInt16() -> DataType
deeplake.types.UInt32
UInt32() -> DataType
deeplake.types.UInt64
UInt64() -> DataType
deeplake.types.UInt8
UInt8() -> DataType
Text Index Types
deeplake.types.BM25
module-attribute
BM25: TextIndexType
A BM25 based index of text data.
This index can be used with BM25_SIMILARITY(column, 'search text') in a TQL ORDER BY clause.
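For readers unfamiliar with BM25, the sketch below shows how a BM25-style relevance score can be computed in plain Python. It is not Deep Lake's implementation: the whitespace tokenizer, the parameter values k1=1.5 and b=0.75, the non-negative IDF variant, and the function name bm25_scores are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against query with an Okapi BM25 formula."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: how many documents contain each term.
    doc_freq = Counter()
    for toks in tokenized:
        doc_freq.update(set(toks))

    scores = []
    for toks in tokenized:
        term_freq = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            df = doc_freq[term]
            # Rare terms get a higher weight than common ones.
            idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
            # Term frequency saturates and is normalized by document length.
            norm = tf + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "deep lake stores tensors",
    "the lake is deep and cold",
    "columns hold text data",
]
scores = bm25_scores("text data", docs)

# Highest score first, which is what ordering by similarity expresses.
ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranked[0])  # 2
```

Only the third document contains the query terms, so it is the only one with a non-zero score and ranks first.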
deeplake.types.Inverted
module-attribute
Inverted: TextIndexType
A text index that supports keyword lookup.
This index can be used with CONTAINS(column, 'wanted_value').
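The general idea behind an inverted index can be sketched in a few lines of plain Python: each token maps to the rows that contain it, so a keyword lookup does not need to scan every row. This is a conceptual illustration only; the whitespace tokenization and the helpers build_inverted_index and lookup are assumptions, not Deep Lake's implementation.

```python
from collections import defaultdict


def build_inverted_index(rows):
    """Map each lowercase token to the sorted list of row ids containing it."""
    index = defaultdict(set)
    for row_id, text in enumerate(rows):
        for token in text.lower().split():
            index[token].add(row_id)
    return {token: sorted(ids) for token, ids in index.items()}


def lookup(index, keyword):
    """Return the row ids whose text contains keyword, or an empty list."""
    return index.get(keyword.lower(), [])


rows = ["red apple", "green apple", "red car"]
index = build_inverted_index(rows)

print(lookup(index, "red"))    # [0, 2]
print(lookup(index, "bike"))   # []
```

The lookup cost depends on the number of matching rows rather than on the total number of rows, which is why such an index speeds up keyword filters.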
Embedding Quantization
deeplake.types.QuantizationType.Binary
class-attribute
Binary: QuantizationType
Stores a binary quantized representation of the original embedding in the index rather than a full copy of the embedding.
This slightly decreases accuracy of searches, while significantly improving query time.
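The trade-off can be illustrated with one common binary quantization scheme: keep only the sign of each component and compare vectors by Hamming distance. The text above does not say which scheme Deep Lake uses, so treat this as an assumption; binarize and hamming are hypothetical helpers.

```python
def binarize(vector):
    """Pack one bit per component into an int: 1 if the value is positive, else 0."""
    bits = 0
    for value in vector:
        bits = (bits << 1) | (1 if value > 0 else 0)
    return bits


def hamming(a, b):
    """Count the bit positions where two quantized vectors differ."""
    return bin(a ^ b).count("1")


query = [0.9, -0.2, 0.4, -0.7]
near = [0.8, -0.1, 0.5, -0.6]   # same signs as query
far = [-0.9, 0.3, -0.4, 0.6]    # opposite signs

q, n, f = binarize(query), binarize(near), binarize(far)

print(hamming(q, n), hamming(q, f))  # 0 4
```

Each component shrinks from a 32-bit float to a single bit and comparisons become cheap bit operations, but magnitudes are discarded: query and near become indistinguishable, which is the small loss of accuracy mentioned above.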
Base Classes
deeplake.types.DataType
The base class all specific types extend from.