Deeplake Types

Deep Lake supports a wide variety of data types for your datasets.

When creating a new column, the data type can be defined in multiple ways, each of which converts to one of the data types below:

  • Call one of the functions below directly, e.g. deeplake.types.Text()
  • If the function does not take arguments, simply pass the function itself, e.g. deeplake.types.Text
  • Pass a string containing the type name, e.g. "text"
  • Pass a standard Python type, e.g. str
  • Pass a NumPy type, e.g. np.str_

ds.add_column("col1", deeplake.types.Text())
ds.add_column("col2", deeplake.types.Text)
ds.add_column("col3", "text")
ds.add_column("col4", str)
ds.add_column("col5", np.str_)

All Data Types

Note

For simplicity, all samples assume the following setup code:

import deeplake
import numpy as np
from deeplake import types

ds = deeplake.create("mem://test")

deeplake.types.Array

Array(dtype: DataType | str, dimensions: int) -> DataType
Array(dtype: DataType | str, shape: list[int]) -> DataType
Array(
    dtype: DataType | str, dimensions: int, shape: list[int]
) -> DataType

Creates a generic array of data.

Parameters:

Name Type Description Default
dtype DataType | str

The datatype of values in the array

required
dimensions int

The number of dimensions/axes in the array. Unlike specifying shape, there is no constraint on the size of each dimension.

required
shape list[int]

Constrains the size of each dimension in the array

required

Returns:

Name Type Description
DataType DataType

A new array data type with the specified parameters.

Examples:

Create a three-dimensional array, where each dimension can have any number of elements:

ds.add_column("col1", types.Array("int32", dimensions=3))

Create a three-dimensional array, where each dimension has a known size:

ds.add_column("col2", types.Array(types.Float32(), shape=[50, 30, 768]))
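
The difference between dimensions and shape can be sketched in plain Python. The helper below is hypothetical and is not part of Deep Lake; it only illustrates the rule described above, under the assumption that dimensions constrains the number of axes while shape constrains the size of every axis.

```python
def matches_array_type(value_shape, dimensions=None, shape=None):
    """Return True if a value's shape satisfies the given constraints.

    Hypothetical helper for illustration only (not a Deep Lake function).
    `dimensions` fixes only the number of axes; `shape` fixes each axis size.
    """
    if dimensions is not None and len(value_shape) != dimensions:
        return False
    if shape is not None and list(value_shape) != list(shape):
        return False
    return True


# Any three-axis value satisfies dimensions=3, whatever the axis sizes are.
print(matches_array_type((4, 7, 2), dimensions=3))               # True
print(matches_array_type((4, 7), dimensions=3))                  # False

# With shape=[50, 30, 768], every axis size must match exactly.
print(matches_array_type((50, 30, 768), shape=[50, 30, 768]))    # True
print(matches_array_type((50, 30, 512), shape=[50, 30, 768]))    # False
```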

deeplake.types.Binary module-attribute

Binary quantization type for embeddings.

This slightly decreases accuracy of searches while significantly improving query time by storing a binary quantized representation instead of the full embedding.
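
To give a feel for the trade-off, here is a minimal sketch of one common form of binary quantization: keep only the sign of each component and compare vectors by Hamming distance. This is an illustration of the general technique, not Deep Lake's internal implementation, and the function names are made up for the example.

```python
def binarize(embedding):
    """Quantize a float vector to bits: 1 where the component is positive, else 0."""
    return [1 if x > 0 else 0 for x in embedding]


def hamming(a, b):
    """Count the positions where two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


query = [0.9, -0.2, 0.4, -0.7]
docs = {
    "near": [0.8, -0.1, 0.5, -0.6],
    "far": [-0.9, 0.3, -0.4, 0.7],
}

# Rank documents by Hamming distance between quantized vectors.
q_bits = binarize(query)
ranked = sorted(docs, key=lambda name: hamming(q_bits, binarize(docs[name])))
print(ranked)  # ['near', 'far']
```

Each component shrinks from a 32-bit float to a single bit and the comparison becomes a bit count, which is where the speed-up comes from; the magnitudes that are thrown away are the source of the small accuracy loss.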

deeplake.types.BinaryMask

BinaryMask(
    sample_compression: str | None = None,
    chunk_compression: str | None = None,
) -> Type

In a binary mask, each pixel value is a boolean indicating whether an object of a class is present at that pixel.

NOTE: Since binary masks often contain large amounts of data, it is recommended to compress them using lz4.

Parameters:

Name Type Description Default
sample_compression str | None

How to compress each row's value. Possible values: lz4, null (default: null)

None
chunk_compression str | None

How to compress all the values stored in a single file. Possible values: lz4, null (default: null)

None

Examples:

ds.add_column("col1", types.BinaryMask(sample_compression="lz4"))
ds.append([{"col1": np.zeros((512, 512, 5), dtype="bool")}])

deeplake.types.Bool

Bool() -> DataType

Creates a boolean value type.

Returns:

Name Type Description
DataType DataType

A new boolean data type.

Examples:

Create columns with boolean type:

ds.add_column("col1", types.Bool)
ds.add_column("col2", "bool")

deeplake.types.BoundingBox

BoundingBox(
    dtype: DataType | str = "float32",
    format: str | None = None,
    bbox_type: str | None = None,
) -> Type

Stores an array of values specifying the bounding boxes of an image.

Parameters:

Name Type Description Default
dtype DataType | str

The datatype of values (default float32)

'float32'
format str | None

The bounding box format. Possible values: ccwh, ltwh, ltrb, unknown

None
bbox_type str | None

The pixel type. Possible values: pixel, fractional

None

Examples:

ds.add_column("col1", types.BoundingBox())
ds.add_column("col2", types.BoundingBox(format="ltwh"))
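
The format names are not spelled out above. The sketch below assumes the usual reading: ccwh is (center x, center y, width, height), ltwh is (left, top, width, height), and ltrb is (left, top, right, bottom), with fractional coordinates expressed as a share of the image size. The conversion helpers are hypothetical and are not Deep Lake functions.

```python
def ltwh_to_ltrb(box):
    """(left, top, width, height) -> (left, top, right, bottom)."""
    left, top, width, height = box
    return [left, top, left + width, top + height]


def ccwh_to_ltrb(box):
    """(center x, center y, width, height) -> (left, top, right, bottom)."""
    cx, cy, width, height = box
    return [cx - width / 2, cy - height / 2, cx + width / 2, cy + height / 2]


def pixel_to_fractional(ltrb, image_width, image_height):
    """Scale pixel coordinates into the 0..1 range relative to the image size."""
    left, top, right, bottom = ltrb
    return [left / image_width, top / image_height,
            right / image_width, bottom / image_height]


# The same box expressed in two formats converts to the same ltrb corners.
print(ltwh_to_ltrb([10, 20, 30, 40]))                    # [10, 20, 40, 60]
print(ccwh_to_ltrb([25, 40, 30, 40]))                    # [10.0, 20.0, 40.0, 60.0]
print(pixel_to_fractional([10, 20, 40, 60], 100, 200))   # [0.1, 0.1, 0.4, 0.3]
```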

deeplake.types.ClassLabel

ClassLabel(dtype: DataType | str) -> Type

deeplake.types.Dict

Dict() -> Type

Creates a type that supports storing arbitrary key/value pairs in each row.

Returns:

Name Type Description
Type Type

A new dictionary data type.

See Also

deeplake.types.Struct for a type that supports defining allowed keys.

Examples:

Create and use a dictionary column:

ds.add_column("col1", types.Dict)
ds.append([{"col1": {"a": 1, "b": 2}}])
ds.append([{"col1": {"b": 3, "c": 4}}])

deeplake.types.Embedding

Embedding(
    size: int | None = None,
    dtype: DataType | str = "float32",
    quantization: QuantizationType | None = None,
) -> Type

Creates a single-dimensional embedding of a given length.

Parameters:

Name Type Description Default
size int | None

The size of the embedding

None
dtype DataType | str

The datatype of the embedding. Defaults to float32

'float32'
quantization QuantizationType | None

How to compress the embeddings in the index. By default no compression is used; can be set to deeplake.types.QuantizationType.Binary

None

Returns:

Name Type Description
Type Type

A new embedding data type.

See Also

deeplake.types.Array for a multidimensional array.

Examples:

Create embedding columns:

ds.add_column("col1", types.Embedding(768))
ds.add_column("col2", types.Embedding(768, quantization=types.QuantizationType.Binary))

deeplake.types.Float32

Float32() -> DataType

Creates a 32-bit float value type.

Returns:

Name Type Description
DataType DataType

A new 32-bit float data type.

Examples:

Create a column with 32-bit float type:

ds.add_column("col1", types.Float32)

deeplake.types.Float64

Float64() -> DataType

Creates a 64-bit float value type.

Returns:

Name Type Description
DataType DataType

A new 64-bit float data type.

Examples:

Create a column with 64-bit float type:

ds.add_column("col1", types.Float64)

deeplake.types.Image

Image(
    dtype: DataType | str = "uint8",
    sample_compression: str = "png",
) -> Type

An image of a given format. The value returned will be a multidimensional array of values rather than the raw image bytes.

Available formats:

  • png (default)
  • apng
  • jpg / jpeg
  • tiff / tif
  • jpeg2000 / jp2
  • bmp
  • nii
  • nii.gz
  • dcm

Parameters:

Name Type Description Default
dtype DataType | str

The data type of the array elements to return

'uint8'
sample_compression str

The on-disk compression/format of the image

'png'

Examples:

ds.add_column("col1", types.Image)
ds.add_column("col2", types.Image(sample_compression="jpg"))

deeplake.types.Int16

Int16() -> DataType

Creates a 16-bit integer value type.

Returns:

Name Type Description
DataType DataType

A new 16-bit integer data type.

Examples:

Create a column with 16-bit integer type:

ds.add_column("col1", types.Int16)

deeplake.types.Int32

Int32() -> DataType

Creates a 32-bit integer value type.

Returns:

Name Type Description
DataType DataType

A new 32-bit integer data type.

Examples:

Create a column with 32-bit integer type:

ds.add_column("col1", types.Int32)

deeplake.types.Int64

Int64() -> DataType

Creates a 64-bit integer value type.

Returns:

Name Type Description
DataType DataType

A new 64-bit integer data type.

Examples:

Create a column with 64-bit integer type:

ds.add_column("col1", types.Int64)

deeplake.types.Int8

Int8() -> DataType

Creates an 8-bit integer value type.

Returns:

Name Type Description
DataType DataType

A new 8-bit integer data type.

Examples:

Create a column with 8-bit integer type:

ds.add_column("col1", types.Int8)

deeplake.types.Link

Link(type: Type) -> Type

A link to an external resource. The value returned will be a reference to the external resource rather than the raw data.

Parameters:

Name Type Description Default
type Type

The type of the linked data

required

Examples:

ds.add_column("col1", types.Link(types.Image()))

deeplake.types.Polygon

Polygon() -> Type

deeplake.types.SegmentMask

SegmentMask(
    dtype: DataType | str = "uint8",
    sample_compression: str | None = None,
    chunk_compression: str | None = None,
) -> Type

Segmentation masks are 2D representations of class labels where a numerical class value is encoded in an array of same shape as the image.

NOTE: Since segmentation masks often contain large amounts of data, it is recommended to compress them using lz4.

Parameters:

Name Type Description Default
sample_compression str | None

How to compress each row's value. Possible values: lz4, null (default: null)

None
chunk_compression str | None

How to compress all the values stored in a single file. Possible values: lz4, null (default: null)

None

Examples:

ds.add_column("col1", types.SegmentMask(sample_compression="lz4"))
ds.append([{"col1": np.zeros((512, 512, 3), dtype="uint8")}])
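
To make the relationship between a segmentation mask and a binary mask concrete, the following sketch derives one boolean mask per class from a small class-id mask. It uses plain nested lists instead of arrays to stay dependency-free, and the helper name is hypothetical, not a Deep Lake function.

```python
# A 2 x 3 segmentation mask: each pixel holds a class id (0 = background).
segment_mask = [
    [0, 0, 1],
    [2, 1, 1],
]


def to_binary_masks(mask, class_ids):
    """Build one boolean mask per class id from a class-id mask."""
    return {
        class_id: [[pixel == class_id for pixel in row] for row in mask]
        for class_id in class_ids
    }


binary_masks = to_binary_masks(segment_mask, class_ids=[1, 2])
print(binary_masks[1])  # [[False, False, True], [False, True, True]]
print(binary_masks[2])  # [[False, False, False], [True, False, False]]
```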

deeplake.types.Sequence

Sequence(nested_type: DataType | str | Type) -> Type

Creates a sequence type that represents an ordered list of other data types.

A sequence maintains the order of its values, making it suitable for time-series data like videos (sequences of images).

Parameters:

Name Type Description Default
nested_type DataType | str | Type

The data type of the values in the sequence. Can be any data type, not just primitive types.

required

Returns:

Name Type Description
Type Type

A new sequence data type.

Examples:

Create a sequence of images:

ds.add_column("col1", types.Sequence(types.Image(sample_compression="jpg")))

deeplake.types.Struct

Struct(fields: dict[str, DataType | str]) -> DataType

Defines a custom datatype with specified keys.

See deeplake.types.Dict for a type that supports different key/value pairs in each row.

Parameters:

Name Type Description Default
fields dict[str, DataType | str]

A dict where the key is the name of the field, and the value is the datatype definition for it

required

Examples:

ds.add_column("col1", types.Struct({
   "field1": types.Int16(),
   "field2": "text",
}))

ds.append([{"col1": {"field1": 3, "field2": "a"}}])
print(ds[0]["col1"]["field1"]) # Output: 3

deeplake.types.Text

Text(index_type: str | TextIndexType | None = None) -> Type

Creates a text data type of arbitrary length.

Parameters:

Name Type Description Default
index_type str | TextIndexType | None

How to index the data in the column for faster searching. Options are:

  • deeplake.types.Inverted
  • deeplake.types.BM25

The default, None, means "do not index".

None

Returns:

Name Type Description
Type Type

A new text data type.

Examples:

Create text columns with different configurations:

ds.add_column("col1", types.Text)
ds.add_column("col2", "text")
ds.add_column("col3", str)
ds.add_column("col4", types.Text(index_type=types.Inverted))
ds.add_column("col5", types.Text(index_type=types.BM25))

deeplake.types.UInt16

UInt16() -> DataType

An unsigned 16-bit integer value

Examples:

ds.add_column("col1", types.UInt16)

deeplake.types.UInt32

UInt32() -> DataType

An unsigned 32-bit integer value

Examples:

ds.add_column("col1", types.UInt32)

deeplake.types.UInt64

UInt64() -> DataType

An unsigned 64-bit integer value

Examples:

ds.add_column("col1", types.UInt64)

deeplake.types.UInt8

UInt8() -> DataType

An unsigned 8-bit integer value

Examples:

ds.add_column("col1", types.UInt8)

Text Index Types

deeplake.types.BM25 module-attribute

A BM25-based index of text data.

This index can be used with BM25_SIMILARITY(column, 'search text') in a TQL ORDER BY clause.

See Also

BM25 algorithm: https://en.wikipedia.org/wiki/Okapi_BM25
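
For intuition about what ordering by a BM25 score means, here is a small pure-Python sketch of the Okapi BM25 formula. It assumes whitespace tokenization and the commonly used defaults k1=1.5 and b=0.75; how Deep Lake tokenizes text and which parameters it uses are not stated here, so treat this as an illustration of the algorithm rather than of the index itself.

```python
import math


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Score each document against the query with the Okapi BM25 formula."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(doc) for doc in tokenized) / n_docs
    scores = []
    for doc in tokenized:
        score = 0.0
        for term in query.lower().split():
            # Number of documents that contain the term at least once.
            doc_freq = sum(term in other for other in tokenized)
            idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1)
            tf = doc.count(term)
            # Length normalization: long documents are penalized via b.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


documents = [
    "deep lake stores embeddings",
    "a lake is a body of water",
    "text search with an inverted index",
]
scores = bm25_scores("lake embeddings", documents)
best = max(range(len(documents)), key=scores.__getitem__)
print(documents[best])  # deep lake stores embeddings
```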

deeplake.types.Inverted module-attribute

Inverted: TextIndexType

A text index that supports keyword lookup.

This index can be used with CONTAINS(column, 'wanted_value').
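
Conceptually, an inverted index maps each word to the rows that contain it, so a keyword lookup does not need to scan every row. The sketch below shows the idea with a dictionary of sets and whitespace tokenization; it is a simplified illustration and makes no claim about how Deep Lake stores or tokenizes its index.

```python
from collections import defaultdict


def build_inverted_index(rows):
    """Map each lowercased word to the set of row ids that contain it."""
    index = defaultdict(set)
    for row_id, text in enumerate(rows):
        for word in text.lower().split():
            index[word].add(row_id)
    return index


rows = ["red apple", "green apple", "red brick wall"]
index = build_inverted_index(rows)

# A keyword lookup is a single dictionary access.
print(sorted(index["red"]))                    # [0, 2]
print(sorted(index["apple"]))                  # [0, 1]
print(sorted(index.get("missing", set())))     # []
```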

Embedding Quantization

deeplake.types.QuantizationType.Binary class-attribute

Base Classes

deeplake.types.DataType

The base class all specific types extend from.

This class provides the foundation for all data types in Deep Lake.

deeplake.types.Type

Base class for all complex data types in Deep Lake.

This class extends DataType to provide additional functionality for complex types like images, embeddings, and sequences.

data_type property

data_type: DataType

Returns:

Name Type Description
DataType DataType

The underlying data type of this type.

default_format property

default_format: DataFormat

Returns:

Name Type Description
DataFormat DataFormat

The default format used for this type.

id property

id: str

Returns:

Name Type Description
str str

The id (name) of the data type.

is_image property

is_image: bool

Returns:

Name Type Description
bool bool

True if this type is an image, False otherwise.

is_link property

is_link: bool

Returns:

Name Type Description
bool bool

True if this type is a link, False otherwise.

is_segment_mask property

is_segment_mask: bool

Returns:

Name Type Description
bool bool

True if this type is a segment mask, False otherwise.

is_sequence property

is_sequence: bool

Returns:

Name Type Description
bool bool

True if this type is a sequence, False otherwise.

kind property

kind: TypeKind

Returns:

Name Type Description
TypeKind TypeKind

The kind of this type.

shape property

shape: list[int] | None

Returns:

Type Description
list[int] | None

The shape of the data type if applicable, otherwise None.

deeplake.types.TextIndexType

Enumeration of available text indexing types.

Members

  • Inverted: A text index that supports keyword lookup. Can be used with CONTAINS(column, 'wanted_value').
  • BM25: A BM25-based index of text data. Can be used with BM25_SIMILARITY(column, 'search text') in a TQL ORDER BY clause.

name property

name: str

Returns:

Name Type Description
str str

The name of the text index type.

value property

value: int

Returns:

Name Type Description
int int

The integer value of the text index type.

deeplake.types.QuantizationType

Enumeration of available quantization types for embeddings.

Members

Binary: Stores a binary quantized representation of the original embedding in the index rather than a full copy of the embedding. This slightly decreases accuracy of searches, while significantly improving query time.

name property

name: str

Returns:

Name Type Description
str str

The name of the quantization type.

value property

value: int

Returns:

Name Type Description
int int

The integer value of the quantization type.