Types

Deep Lake provides a comprehensive type system designed for efficient data storage and retrieval. The type system includes basic numeric types as well as specialized types optimized for common data formats like images, embeddings, and text.

Each type can be specified either using the full type class or a string shorthand:

# Using type class
ds.add_column("col1", deeplake.types.Float32())

# Using string shorthand
ds.add_column("col2", "float32")

Types determine:

  • How data is stored and compressed
  • What operations are available
  • How the data can be queried and indexed
  • Integration with external libraries and frameworks

Numeric Types

All basic numeric types:

import deeplake

# Integers
ds.add_column("int8", deeplake.types.Int8())      # -128 to 127
ds.add_column("int16", deeplake.types.Int16())    # -32,768 to 32,767
ds.add_column("int32", deeplake.types.Int32())    # -2^31 to 2^31-1
ds.add_column("int64", deeplake.types.Int64())    # -2^63 to 2^63-1

# Unsigned Integers
ds.add_column("uint8", deeplake.types.UInt8())    # 0 to 255
ds.add_column("uint16", deeplake.types.UInt16())  # 0 to 65,535
ds.add_column("uint32", deeplake.types.UInt32())  # 0 to 2^32-1
ds.add_column("uint64", deeplake.types.UInt64())  # 0 to 2^64-1

# Floating Point
ds.add_column("float32", deeplake.types.Float32())
ds.add_column("float64", deeplake.types.Float64())
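The ranges in the comments above follow directly from the bit width of each integer type. As a small illustration, the sketch below recomputes them in plain Python and picks the narrowest type name for a given value range. The helper names `int_range` and `smallest_int_type` are hypothetical and are not part of the Deep Lake API.

```python
def int_range(bits: int, signed: bool) -> tuple[int, int]:
    """Return the (min, max) values representable by an integer of the given width."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


def smallest_int_type(lo: int, hi: int) -> str:
    """Pick the narrowest type name whose range covers [lo, hi]."""
    signed = lo < 0
    prefix = "int" if signed else "uint"
    for bits in (8, 16, 32, 64):
        tmin, tmax = int_range(bits, signed)
        if tmin <= lo and hi <= tmax:
            return f"{prefix}{bits}"
    raise ValueError(f"no 64-bit integer type covers [{lo}, {hi}]")


print(smallest_int_type(0, 255))      # uint8
print(smallest_int_type(-1, 40_000))  # int32
```

Choosing the narrowest type that covers the data keeps storage small, at the cost of overflow if later values fall outside the range.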

Numeric Indexing

Numeric columns support indexing for efficient comparison operations:

# Create numeric column with inverted index for range queries
ds.add_column("timestamp", deeplake.types.UInt64())

# Create the index manually
ds["timestamp"].create_index(
    deeplake.types.NumericIndex(deeplake.types.NumericIndexEnumType.Inverted)
)

# Now you can use efficient comparison operations in queries:
# - Greater than: WHERE timestamp > 1609459200
# - Less than: WHERE timestamp < 1640995200  
# - Between: WHERE timestamp BETWEEN 1609459200 AND 1640995200
# - Value list: WHERE timestamp IN (1609459200, 1640995200)
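To show why an index helps with comparison queries, here is a minimal sketch of a sorted lookup structure in plain Python. It is only an illustration of the general idea, not Deep Lake's internal implementation, and the `SortedIndex` class is hypothetical.

```python
import bisect


class SortedIndex:
    """Toy index: (value, row_id) pairs kept sorted by value."""

    def __init__(self, values):
        self._pairs = sorted((v, i) for i, v in enumerate(values))
        self._keys = [v for v, _ in self._pairs]

    def between(self, lo, hi):
        """Row ids with lo <= value <= hi, found by two binary searches."""
        start = bisect.bisect_left(self._keys, lo)
        stop = bisect.bisect_right(self._keys, hi)
        return sorted(i for _, i in self._pairs[start:stop])

    def greater_than(self, lo):
        """Row ids with value > lo."""
        start = bisect.bisect_right(self._keys, lo)
        return sorted(i for _, i in self._pairs[start:])


timestamps = [1640995200, 1609459200, 1620000000, 1500000000]
index = SortedIndex(timestamps)
print(index.between(1609459200, 1640995200))  # [0, 1, 2]
print(index.greater_than(1609459200))         # [0, 2]
```

Without an index, each of these queries would have to scan every row; with sorted keys, the matching range is located in logarithmic time.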

deeplake.types.Audio

Audio(
    dtype: DataType | str = "uint8",
    sample_compression: str = "mp3",
) -> Type

Creates an audio data type.

Parameters:

  • dtype (DataType | str): The datatype of the audio samples. Default: "uint8".
  • sample_compression (str): The compression format for the audio samples, either wav or mp3. Default: "mp3".

Returns:

  • Type: A new audio data type.

Examples:

Create an audio column with default settings:

ds.add_column("col1", types.Audio())

Create an audio column with specific sample compression:

ds.add_column("col2", types.Audio(sample_compression="wav"))

# Basic audio storage
ds.add_column("audio", deeplake.types.Audio())

# WAV format
ds.add_column("audio", deeplake.types.Audio(
    sample_compression="wav"
))

# MP3 compression (default)
ds.add_column("audio", deeplake.types.Audio(
    sample_compression="mp3"
))

# With specific dtype
ds.add_column("audio", deeplake.types.Audio(
    dtype="uint8",
    sample_compression="wav"
))

# Audio with Link for external references
ds.add_column("audio_links", deeplake.types.Link(
    deeplake.types.Audio(sample_compression="mp3")
))

deeplake.types.Image

Image(
    dtype: DataType | str = "uint8",
    sample_compression: str = "png",
) -> Type

An image of a given format. The value returned will be a multidimensional array of values rather than the raw image bytes.

Available formats:

  • png (default)
  • apng
  • jpg / jpeg
  • tiff / tif
  • jpeg2000 / jp2
  • bmp
  • nii
  • nii.gz
  • dcm

Parameters:

  • dtype (DataType | str): The data type of the array elements to return. Default: "uint8".
  • sample_compression (str): The on-disk compression/format of the image. Default: "png".

Examples:

ds.add_column("col1", types.Image)
ds.add_column("col2", types.Image(sample_compression="jpg"))

# Basic image storage
ds.add_column("images", deeplake.types.Image())

# JPEG compression
ds.add_column("images", deeplake.types.Image(
    sample_compression="jpeg"
))

# With specific dtype
ds.add_column("images", deeplake.types.Image(
    dtype="uint8"  # 8-bit RGB
))

deeplake.types.Embedding

Embedding(
    size: int | None = None,
    dtype: DataType | str = "float32",
    index_type: (
        EmbeddingIndexType | QuantizationType | None
    ) = None,
) -> Type

Creates a single-dimensional embedding of a given length.

Parameters:

  • size (int | None): The size of the embedding. Default: None.
  • dtype (DataType | str): The datatype of the embedding. Default: "float32".
  • index_type (EmbeddingIndexType | QuantizationType | None): How to compress the embeddings in the index. The default uses no compression, but it can be set to deeplake.types.QuantizationType.Binary. Default: None.

Returns:

  • Type: A new embedding data type.

See Also

deeplake.types.Array for a multidimensional array.

Examples:

Create embedding columns:

ds.add_column("col1", types.Embedding(768))
ds.add_column("col2", types.Embedding(768, index_type=types.EmbeddingIndex(types.ClusteredQuantized)))

# Basic embeddings
ds.add_column("embeddings", deeplake.types.Embedding(768))

# With a clustered, quantized index for faster search
ds.add_column("embeddings", deeplake.types.Embedding(
    size=768,
    index_type=deeplake.types.EmbeddingIndex(deeplake.types.ClusteredQuantized)
))

# Custom dtype
ds.add_column("embeddings", deeplake.types.Embedding(
    size=768,
    dtype="float32"
))
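The parameter description above mentions binary quantization as a way to compress embeddings in the index. As a rough illustration of the general technique (an assumption about the concept, not Deep Lake's actual implementation), each component can be reduced to its sign bit and vectors compared by Hamming distance. The helpers `binarize` and `hamming` are hypothetical names.

```python
def binarize(vector):
    """Pack the sign of each component into an int: bit i is 1 when vector[i] > 0."""
    bits = 0
    for i, x in enumerate(vector):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")


query = [0.9, -0.2, 0.4, -0.7]
docs = {
    "near": [0.8, -0.1, 0.5, -0.6],
    "far": [-0.9, 0.3, -0.4, 0.6],
}
q = binarize(query)
ranked = sorted(docs, key=lambda name: hamming(q, binarize(docs[name])))
print(ranked)  # ['near', 'far']
```

A 768-dimensional float32 embedding takes 3,072 bytes, while its sign-bit code takes 96 bytes, which is why this kind of compression can speed up search at some cost in accuracy.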

deeplake.types.Text

Text(
    index_type: (
        str | TextIndexEnumType | TextIndexType | None
    ) = None,
) -> Type

Creates a text data type of arbitrary length.

Parameters:

  • index_type (str | TextIndexEnumType | TextIndexType | None): How to index the data in the column for faster searching. Options are:

      • deeplake.types.Inverted
      • deeplake.types.BM25
      • deeplake.types.Exact

    Default: None, meaning "do not index".

Returns:

  • Type: A new text data type.

Examples:

Create text columns with different configurations:

ds.add_column("col1", types.Text)
ds.add_column("col2", "text")
ds.add_column("col3", str)
ds.add_column("col4", types.Text(index_type=types.Inverted))
ds.add_column("col5", types.Text(index_type=types.BM25))

# Basic text
ds.add_column("text", deeplake.types.Text())

# Text with BM25 index for relevance-ranked search
ds.add_column("text2", deeplake.types.Text(
    index_type=deeplake.types.BM25
))

# Text with inverted index for keyword search
ds.add_column("text3", deeplake.types.Text(
    index_type=deeplake.types.Inverted
))

# Text with exact index for whole text matching
ds.add_column("text4", deeplake.types.Text(
    index_type=deeplake.types.Exact
))

deeplake.types.Dict

Dict() -> Type

Creates a type that supports storing arbitrary key/value pairs in each row.

Returns:

  • Type: A new dictionary data type.

See Also

deeplake.types.Struct for a type that supports defining allowed keys.

Examples:

Create and use a dictionary column:

ds.add_column("col1", types.Dict)
ds.append([{"col1": {"a": 1, "b": 2}}])
ds.append([{"col1": {"b": 3, "c": 4}}])

# Store arbitrary key/value pairs
ds.add_column("metadata", deeplake.types.Dict())

# Add data
ds.append([{
    "metadata": {
        "timestamp": "2024-01-01",
        "source": "camera_1",
        "settings": {"exposure": 1.5}
    }
}])

deeplake.types.Array

Array(dtype: DataType | str, dimensions: int) -> DataType
Array(dtype: DataType | str, shape: list[int]) -> DataType
Array(dtype: DataType | str) -> DataType
Array(
    dtype: DataType | str,
    dimensions: int | None,
    shape: list[int] | None,
) -> DataType

Creates a generic array of data.

Parameters:

  • dtype (DataType | str): The datatype of values in the array. Required.
  • dimensions (int | None): The number of dimensions/axes in the array. Unlike specifying shape, there is no constraint on the size of each dimension. Optional, as the overloads above show.
  • shape (list[int] | None): Constrains the size of each dimension in the array. Optional, as the overloads above show.

Returns:

  • DataType: A new array data type with the specified parameters.

Examples:

Create a three-dimensional array, where each dimension can have any number of elements:

ds.add_column("col1", types.Array("int32", dimensions=3))

Create a three-dimensional array, where each dimension has a known size:

ds.add_column("col2", types.Array(types.Float32(), shape=[50, 30, 768]))

# Fixed-size array
ds.add_column("features", deeplake.types.Array(
    "float32",
    shape=[512]  # Enforces size
))

# Variable-size array
ds.add_column("sequences", deeplake.types.Array(
    "int32",
    dimensions=1  # Allows any size
))
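The difference between `dimensions` and `shape` can be pictured as two kinds of checks on incoming values. The sketch below mimics that distinction on nested Python lists. It is a simplified illustration that assumes rectangular input, and `nested_shape` and `matches` are hypothetical helpers, not Deep Lake functions.

```python
def nested_shape(value):
    """Return the shape of a rectangular nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return tuple(shape)


def matches(value, dimensions=None, shape=None):
    """Check a value against a `dimensions` count or an exact `shape`."""
    actual = nested_shape(value)
    if shape is not None:
        return actual == tuple(shape)
    if dimensions is not None:
        return len(actual) == dimensions
    return True  # no constraint given


print(matches([1, 2, 3], dimensions=1))         # True
print(matches([1, 2, 3, 4, 5], dimensions=1))   # True: any length is fine
print(matches([1, 2, 3], shape=[512]))          # False: wrong size
print(matches([[1, 2], [3, 4]], dimensions=1))  # False: two axes
```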

Numeric Indexes

Deep Lake supports indexing numeric columns for faster lookup operations:

from deeplake.types import NumericIndex, NumericIndexEnumType

# Add numeric column and create an inverted index
ds.add_column("scores", "float32")
ds["scores"].create_index(NumericIndex(NumericIndexEnumType.Inverted))

# Use with TQL for efficient filtering
results = ds.query("SELECT * WHERE scores > 0.95")

deeplake.types.Bytes

Bytes() -> DataType

Creates a byte array value type. This is useful for storing raw binary data.

Returns:

  • DataType: A new byte array data type.

Examples:

Create columns with byte array type:

ds.add_column("col1", types.Bytes)
ds.add_column("col2", "bytes")

Append raw binary data to a byte array column:

ds.append([{"col1": b"hello", "col2": b"world"}])

deeplake.types.BinaryMask

BinaryMask(
    sample_compression: str | None = None,
    chunk_compression: str | None = None,
) -> Type

In a binary mask, each pixel value is a boolean indicating whether an object of a class is present.

NOTE: Since binary masks often contain large amounts of data, it is recommended to compress them using lz4.

Parameters:

  • sample_compression (str | None): How to compress each row's value. Possible values: lz4, null. Default: None.
  • chunk_compression (str | None): How to compress all the values stored in a single file. Possible values: lz4, null. Default: None.

Examples:

ds.add_column("col1", types.BinaryMask(sample_compression="lz4"))
ds.append([{"col1": np.zeros((512, 512, 5), dtype="bool")}])

# Basic binary mask
ds.add_column("masks", deeplake.types.BinaryMask())

# With compression
ds.add_column("masks", deeplake.types.BinaryMask(
    sample_compression="lz4"
))
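The note above recommends compression because masks are large but highly repetitive. The following sketch illustrates the effect on a mostly empty mask. It uses `zlib` from the Python standard library as a stand-in, since lz4 is not part of the standard library, so the exact sizes will differ from what lz4 produces.

```python
import zlib

height, width = 512, 512
# A boolean mask stored as one byte per pixel: all background except one square.
mask = bytearray(height * width)
for row in range(100, 200):
    for col in range(150, 250):
        mask[row * width + col] = 1

raw = bytes(mask)
compressed = zlib.compress(raw)
print(len(raw), len(compressed))

# The round trip is lossless.
assert zlib.decompress(compressed) == raw
```

The compressed size is a small fraction of the raw 262,144 bytes because long runs of identical values compress well.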

deeplake.types.SegmentMask

SegmentMask(
    dtype: DataType | str = "uint8",
    sample_compression: str | None = None,
    chunk_compression: str | None = None,
) -> Type

Segmentation masks are 2D representations of class labels where a numerical class value is encoded in an array of the same shape as the image.

NOTE: Since segmentation masks often contain large amounts of data, it is recommended to compress them using lz4.

Parameters:

  • sample_compression (str | None): How to compress each row's value. Possible values: lz4, null. Default: None.
  • chunk_compression (str | None): How to compress all the values stored in a single file. Possible values: lz4, null. Default: None.

Examples:

ds.add_column("col1", types.SegmentMask(sample_compression="lz4"))
ds.append([{"col1": np.zeros((512, 512, 3))}])

# Basic segmentation mask
ds.add_column("segmentation", deeplake.types.SegmentMask())

# With compression
ds.add_column("segmentation", deeplake.types.SegmentMask(
    dtype="uint8",
    sample_compression="lz4"
))

deeplake.types.BoundingBox

BoundingBox(
    dtype: DataType | str = "float32",
    format: str | None = None,
    bbox_type: str | None = None,
) -> Type

Stores an array of values specifying the bounding boxes of an image.

Parameters:

  • dtype (DataType | str): The datatype of values. Default: "float32".
  • format (str | None): The bounding box format. Possible values: ccwh, ltwh, ltrb, unknown. Default: None.
  • bbox_type (str | None): The pixel type. Possible values: pixel, fractional. Default: None.

Examples:

ds.add_column("col1", types.BoundingBox())
ds.add_column("col2", types.BoundingBox(format="ltwh"))

# Basic bounding boxes
ds.add_column("boxes", deeplake.types.BoundingBox())

# With specific format
ds.add_column("boxes", deeplake.types.BoundingBox(
    format="ltwh"  # left, top, width, height
))
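The format names describe how the four numbers of a box are laid out. The sketch below converts between them in plain Python. It assumes that ccwh means center x, center y, width, height and that ltrb means left, top, right, bottom; only ltwh is spelled out in the example above. The helpers `to_ltrb` and `to_fractional` are hypothetical and not part of Deep Lake.

```python
def to_ltrb(box, fmt):
    """Convert a 4-value box in `fmt` to (left, top, right, bottom)."""
    a, b, c, d = box
    if fmt == "ltrb":
        return (a, b, c, d)
    if fmt == "ltwh":
        return (a, b, a + c, b + d)
    if fmt == "ccwh":  # assumed: center x, center y, width, height
        return (a - c / 2, b - d / 2, a + c / 2, b + d / 2)
    raise ValueError(f"unsupported format: {fmt}")


def to_fractional(ltrb, image_width, image_height):
    """Scale pixel coordinates into the 0..1 range."""
    left, top, right, bottom = ltrb
    return (left / image_width, top / image_height,
            right / image_width, bottom / image_height)


print(to_ltrb((10, 20, 30, 40), "ltwh"))          # (10, 20, 40, 60)
print(to_ltrb((25, 40, 30, 40), "ccwh"))          # (10.0, 20.0, 40.0, 60.0)
print(to_fractional((10, 20, 40, 60), 100, 200))  # (0.1, 0.1, 0.4, 0.3)
```

Recording the format and coordinate type on the column avoids this kind of guesswork when boxes from different sources are combined.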

deeplake.types.Point

Point(dimensions: int = 2) -> Type

Point datatype for storing points with ability to visualize them.

Parameters:

  • dimensions (int): The dimension of the point, for example 2 for 2D points or 3 for 3D points. Default: 2.

Examples:

ds.add_column("col1", types.Point())
ds.append([{"col1": [[1.0, 2.0], [0.0, 1.0]]}])

deeplake.types.Polygon

Polygon() -> Type

Polygon datatype for storing polygons with ability to visualize them.

Examples:

ds.add_column("col1", deeplake.types.Polygon())
poly1 = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
poly2 = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
ds.append({"col1": [[poly1, poly2], [poly1, poly2]]})
print(ds[0]["col1"])
# Output: [[[1. 2.]
#          [3. 4.]
#          [5. 6.]]
#         [[1. 2.]
#          [3. 4.]
#          [5. 6.]]]
print(ds[1]["col1"])
# Output: [[[1. 2.]
#          [3. 4.]
#          [5. 6.]]
#         [[1. 2.]
#          [3. 4.]
#          [5. 6.]]]

deeplake.types.Video

Video(compression: str = 'mp4') -> Type

Video datatype for storing videos.

Parameters:

  • compression (str): The compression format. Only the H264 codec is supported at the moment. Default: "mp4".

Examples:

ds.add_column("video", types.Video(compression="mp4"))

with open("path/to/video.mp4", "rb") as f:
    bytes_data = f.read()
    ds.append([{"video": bytes_data}])

deeplake.types.Medical

Medical(compression: str) -> Type

Medical datatype for storing medical images.

Parameters:

  • compression (str): How to compress each row's value. Possible values: dcm, nii, nii.gz. Required.

Examples:

ds.add_column("col1", types.Medical(compression="dcm"))

with open("path/to/dicom/file.dcm", "rb") as f:
    bytes_data = f.read()
    ds.append([{"col1": bytes_data}])

deeplake.types.Struct

Struct(fields: dict[str, DataType | str | Type]) -> Type

Defines a custom datatype with specified keys.

See deeplake.types.Dict for a type that supports different key/value pairs per value.

Parameters:

  • fields (dict[str, DataType | str | Type]): A dict where the key is the name of the field and the value is the datatype definition for it. Required.

Examples:

ds.add_column("col1", types.Struct({
   "field1": types.Int16(),
   "field2": "text",
}))

ds.append([{"col1": {"field1": 3, "field2": "a"}}])
print(ds[0]["col1"]["field1"]) # Output: 3

# Define fixed structure with specific types
ds.add_column("info", deeplake.types.Struct({
    "id": deeplake.types.Int64(),
    "name": "text",
    "score": deeplake.types.Float32()
}))

# Add data
ds.append([{
    "info": {
        "id": 1,
        "name": "sample",
        "score": 0.95
    }
}])

deeplake.types.Sequence

Sequence(nested_type: DataType | str | Type) -> Type

Creates a sequence type that represents an ordered list of other data types.

A sequence maintains the order of its values, making it suitable for time-series data like videos (sequences of images).

Parameters:

  • nested_type (DataType | str | Type): The data type of the values in the sequence. Can be any data type, not just primitive types. Required.

Returns:

  • Type: A new sequence data type.

Examples:

Create a sequence of images:

ds.add_column("col1", types.Sequence(types.Image(sample_compression="jpg")))

# Sequence of images (e.g., video frames)
ds.add_column("frames", deeplake.types.Sequence(
    deeplake.types.Image(sample_compression="jpeg")
))

# Sequence of embeddings
ds.add_column("token_embeddings", deeplake.types.Sequence(
    deeplake.types.Embedding(768)
))

# Add data
ds.append([{
    "frames": [frame1, frame2, frame3],  # List of images
    "token_embeddings": [emb1, emb2, emb3]  # List of embeddings
}])

deeplake.types.Link

Link(type: Type) -> Type

A link to an external resource. The value returned will be a reference to the external resource rather than the raw data.

Parameters:

  • type (Type): The type of the linked data. Required.

Examples:

ds.add_column("col1", types.Link(types.Image()))

Index Types

Deep Lake supports several index types for optimizing queries on different data types.

Text Index Types

deeplake.types.TextIndex

TextIndex(type: TextIndexEnumType) -> TextIndexType

Creates a text index.

Parameters:

  • type (TextIndexEnumType): Required.

Returns:

  • TextIndexType: Text index type.

Examples:

Create text columns with different text index types:

ds.add_column("col1", types.Text)
ds.add_column("col2", types.Text)
ds["col1"].create_index(types.TextIndex(types.Inverted))
ds["col2"].create_index(types.TextIndex(types.BM25))

deeplake.types.Inverted module-attribute

Inverted: Inverted

A text index that supports keyword lookup.

This index can be used with CONTAINS(column, 'wanted_value').
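For readers unfamiliar with the term, an inverted index maps each token to the rows that contain it, so a keyword lookup does not need to scan every row. The sketch below shows the idea in plain Python with whitespace tokenization. It is an illustration only, not Deep Lake's implementation, and `build_inverted_index` is a hypothetical helper.

```python
from collections import defaultdict


def build_inverted_index(rows):
    """Map each lowercase token to the set of row ids that contain it."""
    index = defaultdict(set)
    for row_id, text in enumerate(rows):
        for token in text.lower().split():
            index[token].add(row_id)
    return index


rows = [
    "red car on the road",
    "blue bicycle near a tree",
    "red bicycle on the road",
]
index = build_inverted_index(rows)
print(sorted(index["red"]))                     # [0, 2]
print(sorted(index["red"] & index["bicycle"]))  # [2]
```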

deeplake.types.BM25 module-attribute

BM25: BM25

A BM25-based index of text data.

This index can be used with BM25_SIMILARITY(column, 'search text') in a TQL ORDER BY clause.

See Also

BM25 Algorithm: https://en.wikipedia.org/wiki/Okapi_BM25
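To give a feel for what a BM25 score measures, here is a minimal sketch of Okapi BM25 scoring in plain Python. It assumes whitespace tokenization, the commonly used parameters k1=1.5 and b=0.75, and the smoothed IDF variant that stays positive. Deep Lake's own tokenization and parameters may differ, and `bm25_scores` is a hypothetical helper.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "deep lake stores embeddings",
    "the lake is deep and cold",
    "images and text in one dataset",
]
scores = bm25_scores("deep lake", docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(best)  # 0
```

Both of the first two documents contain the query terms once, but the shorter one scores higher because BM25 normalizes term frequency by document length.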

deeplake.types.Exact module-attribute

Exact: Exact

A text index that supports whole text lookup.

This index can be used with EQUALS(column, 'wanted_value').

Numeric Index Types

deeplake.types.NumericIndex

NumericIndex(
    type: NumericIndexEnumType,
) -> NumericIndexType

# Create numeric index for efficient range queries
ds.add_column("age", deeplake.types.Int32())
ds["age"].create_index(
    deeplake.types.NumericIndex(deeplake.types.NumericIndexEnumType.Inverted)
)

# Use in queries with comparison operators
results = ds.query("SELECT * WHERE age > 25")
results = ds.query("SELECT * WHERE age BETWEEN 18 AND 65")
results = ds.query("SELECT * WHERE age IN (25, 30, 35)")