Deeplake Types
Deep Lake supports a wide variety of data types for your datasets.
When creating a new column, the data type can be specified in several ways, each of which resolves to one of the data types listed below:
- Call one of the functions below directly, e.g. deeplake.types.Text()
- If the function takes no arguments, pass the function itself, e.g. deeplake.types.Text
- Pass a string containing the type name, e.g. "text"
- Pass a standard Python type, e.g. str
- Pass a NumPy type, e.g. np.str_
import deeplake
import numpy as np
# `ds` is an existing dataset; each column needs a unique name
ds.add_column("col1", deeplake.types.Text())
ds.add_column("col2", deeplake.types.Text)
ds.add_column("col3", "text")
ds.add_column("col4", str)
ds.add_column("col5", np.str_)
All Data Types
Note
For simplicity, all samples assume the following setup code:
deeplake.types.Array
Creates a generic array of data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dtype | DataType \| str | The datatype of values in the array | required |
dimensions | int | The number of dimensions/axes in the array. Unlike specifying shape, this does not constrain the size of each dimension. | required |
shape | list[int] | Constrain the size of each dimension in the array | required |
Returns:
Name | Type | Description |
---|---|---|
DataType | DataType | A new array data type with the specified parameters. |
Examples:
Create a three-dimensional array, where each dimension can have any number of elements:
Create a three-dimensional array, where each dimension has a known size:
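The two cases above differ only in what is constrained. As a rough illustration that does not use the Deep Lake API, the plain-Python sketch below checks nested lists against either a number of dimensions or an exact shape. The helper names `nested_shape` and `matches` are hypothetical and exist only for this example.

```python
def nested_shape(value):
    """Return the shape of a rectangular nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        if not value:
            break
        value = value[0]
    return tuple(shape)


def matches(value, dimensions=None, shape=None):
    """Check a nested list against an optional axis count and an optional exact shape."""
    actual = nested_shape(value)
    if dimensions is not None and len(actual) != dimensions:
        return False
    if shape is not None and actual != tuple(shape):
        return False
    return True


small = [[[0, 1], [2, 3]]]                               # shape (1, 2, 2)
large = [[[0] * 4 for _ in range(3)] for _ in range(2)]  # shape (2, 3, 4)

# Constraining only the number of axes accepts both values.
print(matches(small, dimensions=3), matches(large, dimensions=3))
# Constraining the shape accepts only the value with exactly that shape.
print(matches(small, shape=[2, 3, 4]), matches(large, shape=[2, 3, 4]))
```

Constraining only the dimension count suits data whose size varies from row to row, while a fixed shape suits data that is always the same size.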
deeplake.types.Binary
module-attribute
Binary: QuantizationType
Binary quantization type for embeddings.
This slightly decreases accuracy of searches while significantly improving query time by storing a binary quantized representation instead of the full embedding.
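This section does not describe how Deep Lake computes the binary representation. As a hedged illustration of the general idea, the sketch below uses a common sign-based scheme (one bit per dimension) and compares vectors by Hamming distance. The functions `binarize` and `hamming` and the sample vectors are assumptions made for this example, not part of the Deep Lake API.

```python
def binarize(embedding):
    """Quantize a float vector to one bit per dimension: 1 if the value is positive, else 0."""
    return [1 if x > 0 else 0 for x in embedding]


def hamming(a, b):
    """Count the positions where two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


query = [0.9, -0.2, 0.4, -0.7]
docs = {
    "close": [0.8, -0.1, 0.5, -0.6],
    "far": [-0.9, 0.3, -0.4, 0.6],
}

q_bits = binarize(query)
# Rank documents by how many bits differ from the query (fewer is more similar).
ranked = sorted(docs, key=lambda name: hamming(q_bits, binarize(docs[name])))
print(ranked)
```

Comparing bits is much cheaper than comparing full float vectors, which is where the query-time gain comes from. The magnitude of each value is discarded, which is why some accuracy is lost.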
deeplake.types.BinaryMask
BinaryMask(
sample_compression: str | None = None,
chunk_compression: str | None = None,
) -> Type
In a binary mask, each pixel value is a boolean indicating whether an object of a class is present at that pixel.
NOTE: Since binary masks often contain large amounts of data, it is recommended to compress them using lz4.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sample_compression | str \| None | How to compress each row's value. Possible values: lz4, null (default: null) | None |
chunk_compression | str \| None | How to compress all the values stored in a single file. Possible values: lz4, null (default: null) | None |
Examples:
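The compression recommendation can be illustrated without the Deep Lake API. The sketch below builds a small mask with one byte per pixel and compresses it losslessly. Note the assumption: lz4 is not in the Python standard library, so `zlib` stands in for it here. The point is only that masks, which are mostly long runs of identical values, shrink considerably.

```python
import zlib

height, width = 64, 64

# One byte per pixel: 1 inside a 16x16 square, 0 everywhere else.
mask = bytes(
    1 if 8 <= row < 24 and 8 <= col < 24 else 0
    for row in range(height)
    for col in range(width)
)

# zlib is a stand-in for lz4 in this sketch; both are lossless.
compressed = zlib.compress(mask)
restored = zlib.decompress(compressed)

print(len(mask), len(compressed))
print(restored == mask)
```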
deeplake.types.Bool
Bool() -> DataType
Creates a boolean value type.
Returns:
Name | Type | Description |
---|---|---|
DataType | DataType | A new boolean data type. |
Examples:
Create columns with boolean type:
deeplake.types.BoundingBox
BoundingBox(
dtype: DataType | str = "float32",
format: str | None = None,
bbox_type: str | None = None,
) -> Type
Stores an array of values specifying the bounding boxes of an image.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dtype | DataType \| str | The datatype of values (default float32) | 'float32' |
format | str \| None | The bounding box format. Possible values: | None |
bbox_type | str \| None | The pixel type. Possible values: | None |
Examples:
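This section does not list the accepted format or bbox_type values, so the sketch below does not use them. It only illustrates, in plain Python, two conventions that bounding box formats commonly distinguish: (left, top, width, height) versus (left, top, right, bottom) layouts, and pixel versus fractional coordinates. The function names and the sample box are hypothetical.

```python
def ltwh_to_ltrb(box):
    """Convert (left, top, width, height) to (left, top, right, bottom)."""
    left, top, width, height = box
    return (left, top, left + width, top + height)


def to_fractional(box_ltrb, image_width, image_height):
    """Scale pixel coordinates to fractions of the image size."""
    left, top, right, bottom = box_ltrb
    return (
        left / image_width,
        top / image_height,
        right / image_width,
        bottom / image_height,
    )


box_ltwh = (20, 10, 60, 30)          # pixel coordinates in a 200x100 image
box_ltrb = ltwh_to_ltrb(box_ltwh)
box_frac = to_fractional(box_ltrb, image_width=200, image_height=100)

print(box_ltrb)
print(box_frac)
```

Whatever layout is chosen, every row in a column should use the same one, since the stored numbers alone do not reveal which convention produced them.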
deeplake.types.Dict
Dict() -> Type
Creates a type that supports storing arbitrary key/value pairs in each row.
Returns:
Name | Type | Description |
---|---|---|
Type | Type | A new dictionary data type. |
See Also
deeplake.types.Struct for a type that supports defining allowed keys.
Examples:
Create and use a dictionary column:
deeplake.types.Embedding
Embedding(
size: int | None = None,
dtype: DataType | str = "float32",
quantization: QuantizationType | None = None,
) -> Type
Creates a single-dimensional embedding of a given length.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
size | int \| None | The size of the embedding | None |
dtype | DataType \| str | The datatype of the embedding. Defaults to float32 | 'float32' |
quantization | QuantizationType \| None | How to compress the embeddings in the index. Default uses no compression, but can be set to a deeplake.types.QuantizationType value | None |
Returns:
Name | Type | Description |
---|---|---|
Type | Type | A new embedding data type. |
See Also
deeplake.types.Array for a multidimensional array.
Examples:
Create embedding columns:
deeplake.types.Float32
Float32() -> DataType
Creates a 32-bit float value type.
Returns:
Name | Type | Description |
---|---|---|
DataType | DataType | A new 32-bit float data type. |
Examples:
Create a column with 32-bit float type:
deeplake.types.Float64
Float64() -> DataType
Creates a 64-bit float value type.
Returns:
Name | Type | Description |
---|---|---|
DataType | DataType | A new 64-bit float data type. |
Examples:
Create a column with 64-bit float type:
deeplake.types.Image
An image of a given format. The value returned will be a multidimensional array of values rather than the raw image bytes.
Available formats:
- png (default)
- apng
- jpg / jpeg
- tiff / tif
- jpeg2000 / jp2
- bmp
- nii
- nii.gz
- dcm
Parameters:
Name | Type | Description | Default |
---|---|---|---|
dtype | DataType \| str | The data type of the array elements to return | 'uint8' |
sample_compression | str | The on-disk compression/format of the image | 'png' |
Examples:
deeplake.types.Int16
Int16() -> DataType
Creates a 16-bit integer value type.
Returns:
Name | Type | Description |
---|---|---|
DataType | DataType | A new 16-bit integer data type. |
Examples:
Create a column with 16-bit integer type:
deeplake.types.Int32
Int32() -> DataType
Creates a 32-bit integer value type.
Returns:
Name | Type | Description |
---|---|---|
DataType | DataType | A new 32-bit integer data type. |
Examples:
Create a column with 32-bit integer type:
deeplake.types.Int64
Int64() -> DataType
Creates a 64-bit integer value type.
Returns:
Name | Type | Description |
---|---|---|
DataType | DataType | A new 64-bit integer data type. |
Examples:
Create a column with 64-bit integer type:
deeplake.types.Int8
Int8() -> DataType
Creates an 8-bit integer value type.
Returns:
Name | Type | Description |
---|---|---|
DataType | DataType | A new 8-bit integer data type. |
Examples:
Create a column with 8-bit integer type:
deeplake.types.Link
A link to an external resource. The value returned will be a reference to the external resource rather than the raw data.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
type | Type | The type of the linked data | required |
Examples:
deeplake.types.SegmentMask
SegmentMask(
dtype: DataType | str = "uint8",
sample_compression: str | None = None,
chunk_compression: str | None = None,
) -> Type
Segmentation masks are 2D representations of class labels, where a numerical class value for each pixel is encoded in an array of the same shape as the image.
NOTE: Since segmentation masks often contain large amounts of data, it is recommended to compress them using lz4.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sample_compression | str \| None | How to compress each row's value. Possible values: lz4, null (default: null) | None |
chunk_compression | str \| None | How to compress all the values stored in a single file. Possible values: lz4, null (default: null) | None |
Examples:
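To make the encoding concrete, the plain-Python sketch below represents a tiny segmentation mask as a 2D list of class ids and derives a boolean mask for one class from it. It does not use the Deep Lake API. The class ids and the helper `binary_mask_for` are hypothetical.

```python
# A 3x4 segmentation mask: each entry is the class id of that pixel.
# Hypothetical class ids: 0 = background, 1 = cat, 2 = dog.
segment_mask = [
    [0, 1, 1, 0],
    [0, 1, 2, 2],
    [0, 0, 2, 2],
]


def binary_mask_for(mask, class_id):
    """Return a boolean mask that is True wherever `mask` holds `class_id`."""
    return [[pixel == class_id for pixel in row] for row in mask]


dog_mask = binary_mask_for(segment_mask, 2)
for row in dog_mask:
    print(row)
```

A single segmentation mask therefore carries the same information as one boolean mask per class, provided each pixel belongs to exactly one class.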
deeplake.types.Sequence
Creates a sequence type that represents an ordered list of other data types.
A sequence maintains the order of its values, making it suitable for time-series data like videos (sequences of images).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
nested_type | DataType \| str \| Type | The data type of the values in the sequence. Can be any data type, not just primitive types. | required |
Returns:
Name | Type | Description |
---|---|---|
Type | Type | A new sequence data type. |
Examples:
Create a sequence of images:
deeplake.types.Struct
Defines a custom datatype with specified keys.
See deeplake.types.Dict for a type that supports different key/value pairs per value.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fields | dict[str, DataType \| str] | A dict where the key is the name of the field, and the value is the datatype definition for it | required |
Examples:
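The difference between a type with declared keys and one that accepts arbitrary key/value pairs can be sketched in plain Python. This is not Deep Lake's validation logic, which this section does not describe. The `fields` mapping, the `conforms` helper, and the rule that only declared keys are allowed are all assumptions for illustration.

```python
# Hypothetical field definition: field name -> expected Python type.
fields = {"name": str, "age": int}


def conforms(row, fields):
    """True if every key in `row` is declared and its value has the declared type."""
    return all(
        key in fields and isinstance(value, fields[key])
        for key, value in row.items()
    )


declared = {"name": "Ada", "age": 36}
undeclared = {"name": "Ada", "nickname": "A"}

print(conforms(declared, fields))    # only declared keys, correct types
print(conforms(undeclared, fields))  # "nickname" is not a declared field
```

A dictionary-style column would accept both rows, while a struct-style column with these fields would accept only the first.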
deeplake.types.Text
Text(index_type: str | TextIndexType | None = None) -> Type
Creates a text data type of arbitrary length.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
index_type | str \| TextIndexType \| None | How to index the data in the column for faster searching. Options are the deeplake.types.TextIndexType members (Inverted, BM25). Default is None | None |
Returns:
Name | Type | Description |
---|---|---|
Type | Type | A new text data type. |
Examples:
Create text columns with different configurations:
deeplake.types.UInt16
UInt16() -> DataType
deeplake.types.UInt32
UInt32() -> DataType
deeplake.types.UInt64
UInt64() -> DataType
deeplake.types.UInt8
UInt8() -> DataType
Text Index Types
deeplake.types.BM25
module-attribute
BM25: TextIndexType
A BM25-based index of text data.
This index can be used with BM25_SIMILARITY(column, 'search text') in a TQL ORDER BY clause.
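For readers unfamiliar with BM25, the sketch below computes textbook Okapi BM25 scores in plain Python and sorts documents by score, which is the kind of ranking an ORDER BY on a similarity score produces. It is not Deep Lake's implementation. The whitespace tokenizer, the parameter values k1=1.5 and b=0.75, and the sample documents are assumptions for this example.

```python
import math
from collections import Counter


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Number of documents that contain each term at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            tf = term_freq[term]
            # Longer-than-average documents are penalized through `b`.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


documents = [
    "deep lake stores embeddings",
    "bm25 ranks text by term frequency",
    "text search with bm25 and inverted indexes",
]
scores = bm25_scores("bm25 text", documents)
# Highest score first, as a descending sort on the score would give.
ranked = sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)
print(ranked)
```

The second and third documents both contain each query term once. The shorter of the two ranks higher because of BM25's length normalization.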
See Also
BM25 Algorithm: https://en.wikipedia.org/wiki/Okapi_BM25
deeplake.types.Inverted
module-attribute
Inverted: TextIndexType
A text index that supports keyword lookup.
This index can be used with CONTAINS(column, 'wanted_value').
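As background on why such an index speeds up keyword lookup, the plain-Python sketch below builds a minimal inverted index that maps each token to the rows containing it. It is a generic illustration, not Deep Lake's index format. The whitespace tokenizer and the helper `build_inverted_index` are assumptions.

```python
from collections import defaultdict


def build_inverted_index(rows):
    """Map each lowercase token to the set of row numbers that contain it."""
    index = defaultdict(set)
    for row_id, text in enumerate(rows):
        for token in text.lower().split():
            index[token].add(row_id)
    return index


rows = ["red apple", "green apple", "red car"]
index = build_inverted_index(rows)

# A keyword lookup reads one index entry instead of scanning every row.
red_rows = sorted(index.get("red", set()))
missing_rows = sorted(index.get("banana", set()))
print(red_rows, missing_rows)
```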
Embedding Quantization
Base Classes
deeplake.types.DataType
The base class all specific types extend from.
This class provides the foundation for all data types in deeplake.
deeplake.types.Type
Base class for all complex data types in deeplake.
This class extends DataType to provide additional functionality for complex types like images, embeddings, and sequences.
data_type
property
data_type: DataType
Returns:
Name | Type | Description |
---|---|---|
DataType | DataType | The underlying data type of this type. |
default_format
property
Returns:
Name | Type | Description |
---|---|---|
DataFormat | DataFormat | The default format used for this type. |
is_image
property
Returns:
Name | Type | Description |
---|---|---|
bool | bool | True if this type is an image, False otherwise. |
is_link
property
Returns:
Name | Type | Description |
---|---|---|
bool | bool | True if this type is a link, False otherwise. |
is_segment_mask
property
Returns:
Name | Type | Description |
---|---|---|
bool | bool | True if this type is a segment mask, False otherwise. |
is_sequence
property
Returns:
Name | Type | Description |
---|---|---|
bool | bool | True if this type is a sequence, False otherwise. |
kind
property
Returns:
Name | Type | Description |
---|---|---|
TypeKind | TypeKind | The kind of this type. |
deeplake.types.TextIndexType
Enumeration of available text indexing types.
Members
Inverted: A text index that supports keyword lookup. Can be used with CONTAINS(column, 'wanted_value').
BM25: A BM25-based index of text data. Can be used with BM25_SIMILARITY(column, 'search text') in a TQL ORDER BY clause.
deeplake.types.QuantizationType
Enumeration of available quantization types for embeddings.
Members
Binary: Stores a binary quantized representation of the original embedding in the index rather than a full copy of the embedding. This slightly decreases accuracy of searches, while significantly improving query time.