Deeplake Types

Deep Lake supports a wide variety of data types for your datasets.

When creating a new column, the data type can be specified in several ways, each of which converts to one of the data types below:

  • Call one of the functions below directly, e.g. deeplake.types.Text()
  • If the function requires no arguments, pass the function itself, e.g. deeplake.types.Text
  • Pass a string containing the type name, e.g. "text"
  • Pass a standard Python type, e.g. str
  • Pass a NumPy type, e.g. np.str_

ds.add_column("col1", deeplake.types.Text())
ds.add_column("col2", deeplake.types.Text)
ds.add_column("col3", "text")
ds.add_column("col4", str)
ds.add_column("col5", np.str_)

All Data Types

Note

For simplicity, all samples assume the following setup code:

import deeplake
import numpy as np
from deeplake import types

ds = deeplake.create("mem://test")

deeplake.types.Array

Array(dtype: DataType | str, dimensions: int) -> DataType
Array(dtype: DataType | str, shape: list[int]) -> DataType
Array(
    dtype: DataType | str, dimensions: int, shape: list[int]
) -> DataType

A generic array of data.

Parameters:

Name Type Description Default
dtype DataType | str

The datatype of values in the array

required
dimensions int

The number of dimensions/axes in the array. Unlike specifying shape, there is no constraint on the size of each dimension.

required
shape list[int]

Constrain the size of each dimension in the array

required

Examples:

>>> # Create a three-dimensional array, where each dimension can have any number of elements
>>> ds.add_column("col1", types.Array("int32", dimensions=3))
>>>
>>> # Create a three-dimensional array, where each dimension has a known size
>>> ds.add_column("col2", types.Array(types.Float32(), shape=[50, 30, 768]))
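
The difference between constraining only the number of dimensions and constraining the full shape can be sketched in plain Python. The helpers nested_shape and matches below are hypothetical and written only for illustration; they are not part of the deeplake API and do not describe how Deep Lake validates values internally.

```python
def nested_shape(value):
    """Return the shape of a regularly nested list, e.g. [[1, 2], [3, 4]] -> [2, 2]."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        if not value:
            break
        value = value[0]
    return shape


def matches(value, dimensions=None, shape=None):
    """Hypothetical check of a value against a dimensions and/or shape constraint."""
    actual = nested_shape(value)
    # `dimensions` only fixes how many axes there are.
    if dimensions is not None and len(actual) != dimensions:
        return False
    # `shape` additionally fixes the size of every axis.
    if shape is not None and actual != list(shape):
        return False
    return True


sample = [[1, 2, 3], [4, 5, 6]]  # 2 rows, 3 columns

print(matches(sample, dimensions=2))  # any 2-D value is accepted
print(matches(sample, shape=[2, 3]))  # exact sizes must match
print(matches(sample, shape=[3, 2]))  # same axis count, wrong sizes
```

A dimensions-only constraint accepts any 2-D value here, while a shape constraint also pins the length of each axis.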

deeplake.types.Binary module-attribute

deeplake.types.BinaryMask

BinaryMask(
    sample_compression: str | None = None,
    chunk_compression: str | None = None,
) -> Type

In a binary mask, each pixel value is a boolean indicating whether an object of a given class is present.

NOTE: Since binary masks often contain large amounts of data, it is recommended to compress them using lz4.

Parameters:

Name Type Description Default
sample_compression str | None

How to compress each row's value. Possible values: lz4, null (default: null)

None
chunk_compression str | None

How to compress all the values stored in a single file. Possible values: lz4, null (default: null)

None

Examples:

>>> ds.add_column("col1", types.BinaryMask(sample_compression="lz4"))
>>> ds.append([{"col1": np.zeros((512, 512, 5), dtype="bool")}])

deeplake.types.Bool

Bool() -> DataType

A boolean value

Examples:

>>> ds.add_column("col1", types.Bool)
>>> ds.add_column("col2", "bool")

deeplake.types.BoundingBox

BoundingBox(
    dtype: DataType | str = "float32",
    format: str | None = None,
    bbox_type: str | None = None,
) -> Type

Stores an array of values specifying the bounding boxes of an image.

Parameters:

Name Type Description Default
dtype DataType | str

The datatype of values (default float32)

'float32'
format str | None

The bounding box format. Possible values: ccwh, tlwh, tlbr, unknown

None
bbox_type str | None

The pixel type. Possible values: pixel, fractional

None

Examples:

>>> ds.add_column("col1", types.BoundingBox())
>>> ds.add_column("col2", types.BoundingBox(format="tlwh"))
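
The format and bbox_type values above are not defined further on this page. The sketch below assumes the conventional readings: ccwh as center x, center y, width, height; tlwh as top-left x, top-left y, width, height; tlbr as top-left x, top-left y, bottom-right x, bottom-right y; and fractional as coordinates divided by the image size. The helper functions are hypothetical and are not part of the deeplake API.

```python
def tlwh_to_tlbr(box):
    """[left, top, width, height] -> [left, top, right, bottom]."""
    x, y, w, h = box
    return [x, y, x + w, y + h]


def ccwh_to_tlwh(box):
    """[center_x, center_y, width, height] -> [left, top, width, height]."""
    cx, cy, w, h = box
    return [cx - w / 2, cy - h / 2, w, h]


def pixel_to_fractional(box_tlwh, image_width, image_height):
    """Scale pixel coordinates into the 0..1 range relative to the image size."""
    x, y, w, h = box_tlwh
    return [x / image_width, y / image_height, w / image_width, h / image_height]


box_tlwh = [10, 20, 30, 40]

print(tlwh_to_tlbr(box_tlwh))                   # [10, 20, 40, 60]
print(ccwh_to_tlwh([25, 40, 30, 40]))           # [10.0, 20.0, 30, 40]
print(pixel_to_fractional(box_tlwh, 100, 200))  # [0.1, 0.1, 0.3, 0.2]
```

All three calls describe the same rectangle, which is why recording the format alongside the values matters.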

deeplake.types.Dict

Dict() -> Type

Supports storing arbitrary key/value pairs in each row.

See deeplake.types.Struct for a type that supports defining allowed keys.

Examples:

>>> ds.add_column("col1", types.Dict)
>>>
>>> ds.append([{"col1": {"a": 1, "b": 2}}])
>>> ds.append([{"col1": {"b": 3, "c": 4}}])

deeplake.types.Embedding

Embedding(
    size: int,
    dtype: DataType | str = "float32",
    quantization: QuantizationType | None = None,
) -> Type

A single-dimensional embedding of a given length. See deeplake.types.Array for a multidimensional array.

Parameters:

Name Type Description Default
size int

The size of the embedding

required
dtype DataType | str

The datatype of the embedding. Defaults to float32

'float32'
quantization QuantizationType | None

How to compress the embeddings in the index. Default uses no compression, but can be set to deeplake.types.QuantizationType.Binary

None

Examples:

>>> ds.add_column("col1", types.Embedding(768))
>>> ds.add_column("col2", types.Embedding(768, quantization=types.QuantizationType.Binary))

deeplake.types.Float32

Float32() -> DataType

A 32-bit float value

Examples:

>>> ds.add_column("col1", types.Float32)

deeplake.types.Float64

Float64() -> DataType

A 64-bit float value

Examples:

>>> ds.add_column("col1", types.Float64)

deeplake.types.Image

Image(
    dtype: DataType | str = "uint8",
    sample_compression: str = "png",
) -> Type

An image of a given format. The value returned will be a multidimensional array of values rather than the raw image bytes.

Available formats:

  • png (default)
  • apng
  • jpg / jpeg
  • tiff / tif
  • jpeg2000 / jp2
  • bmp
  • nii
  • nii.gz
  • dcm

Parameters:

Name Type Description Default
dtype DataType | str

The data type of the array elements to return

'uint8'
sample_compression str

The on-disk compression/format of the image

'png'

Examples:

>>> ds.add_column("col1", types.Image)
>>> ds.add_column("col2", types.Image(sample_compression="jpg"))

deeplake.types.Int16

Int16() -> DataType

A 16-bit integer value

Examples:

>>> ds.add_column("col1", types.Int16)

deeplake.types.Int32

Int32() -> DataType

A 32-bit integer value

Examples:

>>> ds.add_column("col1", types.Int32)

deeplake.types.Int64

Int64() -> DataType

A 64-bit integer value

Examples:

>>> ds.add_column("col1", types.Int64)

deeplake.types.Int8

Int8() -> DataType

An 8-bit integer value

Examples:

>>> ds.add_column("col1", types.Int8)

deeplake.types.SegmentMask

SegmentMask(
    dtype: DataType | str = "uint8",
    sample_compression: str | None = None,
    chunk_compression: str | None = None,
) -> Type

Segmentation masks are 2D representations of class labels, where a numerical class value is encoded in an array of the same shape as the image.

NOTE: Since segmentation masks often contain large amounts of data, it is recommended to compress them using lz4.

Parameters:

Name Type Description Default
dtype DataType | str

The datatype of values in the mask

'uint8'
sample_compression str | None

How to compress each row's value. Possible values: lz4, null (default: null)

None
chunk_compression str | None

How to compress all the values stored in a single file. Possible values: lz4, null (default: null)

None

Examples:

>>> ds.add_column("col1", types.SegmentMask(sample_compression="lz4"))
>>> ds.append([{"col1": np.zeros((512, 512), dtype="uint8")}])
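
To relate the two mask types, the sketch below converts a segmentation mask (one class id per pixel) into a stack of per-class boolean masks. It assumes class ids run from 0 to num_classes - 1 and that the last axis of a binary mask indexes the class, as the (512, 512, 5) BinaryMask example suggests. The helper to_binary_masks is hypothetical and uses nested lists rather than NumPy to stay dependency-free.

```python
def to_binary_masks(segment_mask, num_classes):
    """Convert an H x W grid of class ids into H x W x num_classes booleans."""
    return [
        [[pixel == class_id for class_id in range(num_classes)] for pixel in row]
        for row in segment_mask
    ]


# A tiny 2 x 3 segmentation mask with three classes (0, 1, 2).
segment_mask = [
    [0, 0, 1],
    [2, 1, 1],
]

binary = to_binary_masks(segment_mask, num_classes=3)

print(binary[0][2])  # pixel with class 1 -> [False, True, False]
print(binary[1][0])  # pixel with class 2 -> [False, False, True]
```

The binary form is num_classes times larger than the segmentation mask, which is one reason compression is recommended for both.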

deeplake.types.Sequence

Sequence(nested_type: DataType | str | Type) -> Type

A sequence is a list of other data types, where there is an order to the values in the list.

For example, a video can be stored as a sequence of images to better capture the time-based ordering of the images, rather than simply storing them as an Array.

Parameters:

Name Type Description Default
nested_type DataType | str | Type

The data type of the values in the sequence. Can be any data type, not just primitive types.

required

Examples:

>>> ds.add_column("col1", types.Sequence(types.Image(sample_compression="jpeg")))

deeplake.types.Struct

Struct(fields: dict[str, DataType | str]) -> DataType

Defines a custom datatype with specified keys.

See deeplake.types.Dict for a type that supports different key/value pairs per value.

Parameters:

Name Type Description Default
fields dict[str, DataType | str]

A dict where the key is the name of the field, and the value is the datatype definition for it

required

Examples:

>>> ds.add_column("col1", types.Struct({
...    "field1": types.Int16(),
...    "field2": types.Text(),
... }))
>>>
>>> ds.append([{"col1": {"field1": 3, "field2": "a"}}])
>>> print(ds[0]["col1"]["field1"])
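
The contrast between a Struct (declared keys) and a Dict (arbitrary keys per row) can be illustrated with a plain-Python conformance check. The conforms helper is hypothetical. This page does not say how Deep Lake treats missing or extra keys, so the strict behaviour below is an assumption made only for illustration.

```python
def conforms(row_value, fields):
    """True when row_value has exactly the declared keys with matching Python types."""
    if set(row_value) != set(fields):
        return False
    return all(isinstance(row_value[name], kind) for name, kind in fields.items())


# Python stand-ins for {"field1": types.Int16(), "field2": types.Text()}.
fields = {"field1": int, "field2": str}

print(conforms({"field1": 3, "field2": "a"}, fields))              # True
print(conforms({"field1": 3}, fields))                             # False: missing key
print(conforms({"field1": 3, "field2": "a", "extra": 1}, fields))  # False: undeclared key
```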

deeplake.types.Text

Text(index_type: str | TextIndexType | None = None) -> Type

Text data of arbitrary length.

Options for index_type are deeplake.types.Inverted and deeplake.types.BM25.

Parameters:

Name Type Description Default
index_type str | TextIndexType | None

How to index the data in the column for faster searching. Default is None meaning "do not index"

None

Examples:

>>> ds.add_column("col1", types.Text)
>>> ds.add_column("col2", "text")
>>> ds.add_column("col3", str)
>>> ds.add_column("col4", types.Text(index_type=types.Inverted))
>>> ds.add_column("col5", types.Text(index_type=types.BM25))

deeplake.types.UInt16

UInt16() -> DataType

An unsigned 16-bit integer value

Examples:

>>> ds.add_column("col1", types.UInt16)

deeplake.types.UInt32

UInt32() -> DataType

An unsigned 32-bit integer value

Examples:

>>> ds.add_column("col1", types.UInt32)

deeplake.types.UInt64

UInt64() -> DataType

An unsigned 64-bit integer value

Examples:

>>> ds.add_column("col1", types.UInt64)

deeplake.types.UInt8

UInt8() -> DataType

An unsigned 8-bit integer value

Examples:

>>> ds.add_column("col1", types.UInt8)
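
When choosing among the signed and unsigned integer types above, it helps to know the range each bit width can hold. The sketch below assumes the standard two's-complement representation for signed values. The int_range helper is hypothetical and not part of the deeplake API.

```python
def int_range(bits, signed=True):
    """Return (min, max) for an integer of the given bit width."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


for bits in (8, 16, 32, 64):
    print(f"Int{bits}:  {int_range(bits)}")
    print(f"UInt{bits}: {int_range(bits, signed=False)}")
```

For instance, pixel intensities in the 0 to 255 range fit in UInt8, while a value of -1 requires a signed type.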

Text Index Types

deeplake.types.BM25 module-attribute

A BM25 based index of text data.

This index can be used with BM25_SIMILARITY(column, 'search text') in a TQL ORDER BY clause.
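
For intuition about what a BM25 similarity score measures, here is a minimal sketch of the textbook Okapi BM25 formula in plain Python. The tokenization (lowercase, whitespace split) and the parameters k1=1.5 and b=0.75 are assumptions. This page does not specify the exact variant or parameters Deep Lake uses, and bm25_scores is a hypothetical helper, not a deeplake function.

```python
import math
from collections import Counter


def bm25_scores(documents, query, k1=1.5, b=0.75):
    """Score each document against the query with textbook Okapi BM25."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs
    # Number of documents containing each term at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = counts[term]
            if tf == 0:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1)
            # Longer-than-average documents are penalized via the b term.
            norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "deep lake stores tensors",
    "lake water is deep and cold",
    "vector search",
]
scores = bm25_scores(docs, "deep lake")
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # document indices, best match first
```

Both of the first two documents contain each query term once, but the shorter one scores higher because of length normalization. The third document contains neither term and scores zero.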

deeplake.types.Inverted module-attribute

Inverted: TextIndexType

A text index that supports keyword lookup.

This index can be used with CONTAINS(column, 'wanted_value').
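
The idea behind an inverted index is a mapping from each keyword to the rows that contain it, so a lookup does not need to scan every row. The sketch below is a generic illustration in plain Python. The lowercase whitespace tokenization is an assumption, and build_inverted_index and contains are hypothetical helpers that do not describe Deep Lake's implementation.

```python
from collections import defaultdict


def build_inverted_index(rows):
    """Map each lowercase token to the set of row ids that contain it."""
    index = defaultdict(set)
    for row_id, text in enumerate(rows):
        for token in text.lower().split():
            index[token].add(row_id)
    return index


def contains(index, wanted_value):
    """Return the sorted row ids whose text contains the wanted keyword."""
    return sorted(index.get(wanted_value.lower(), set()))


rows = ["red apple", "green apple", "red car"]
index = build_inverted_index(rows)

print(contains(index, "apple"))  # [0, 1]
print(contains(index, "red"))    # [0, 2]
print(contains(index, "blue"))   # []
```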

Embedding Quantization

deeplake.types.QuantizationType.Binary class-attribute

Stores a binary quantized representation of the original embedding in the index rather than a full copy of the embedding.

This slightly decreases accuracy of searches, while significantly improving query time.
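
The accuracy/speed trade-off can be illustrated with a common form of binary quantization: keep only the sign of each component and compare vectors by Hamming distance. This is a generic sketch. The page does not describe the exact scheme Deep Lake uses, and binarize and hamming are hypothetical helpers.

```python
def binarize(vector):
    """Pack the sign of each component into an int bitmask (bit set when positive)."""
    bits = 0
    for position, component in enumerate(vector):
        if component > 0:
            bits |= 1 << position
    return bits


def hamming(a, b):
    """Count the bit positions where two bitmasks differ."""
    return bin(a ^ b).count("1")


query = [0.9, -0.2, 0.4, -0.7]
close = [0.8, -0.1, 0.5, -0.6]   # points in roughly the same direction
far = [-0.9, 0.3, -0.4, 0.7]     # points in the opposite direction

print(hamming(binarize(query), binarize(close)))  # 0
print(hamming(binarize(query), binarize(far)))    # 4
```

Each float shrinks to a single bit and distances reduce to bit operations, which is fast. The magnitudes are discarded, so query and close become indistinguishable, which is the source of the accuracy loss.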

Base Classes

deeplake.types.DataType

The base class all specific types extend from.

deeplake.types.Type

data_type property

data_type: DataType

default_format property

default_format: DataFormat

id property

id: str

The id (name) of the data type

is_sequence property

is_sequence: bool

kind property

kind: TypeKind

shape property

shape: list[int] | None

The shape of the data type, if applicable. Otherwise None.

deeplake.types.TextIndexType

Members:

Inverted

BM25

name property

name: str

value property

value: int

deeplake.types.QuantizationType

name property

name: str

value property

value: int