Types¶

Deep Lake provides a comprehensive type system designed for efficient data storage and retrieval. The type system includes basic numeric types as well as specialized types optimized for common data formats like images, embeddings, and text.

Each type can be specified either using the full type class or a string shorthand:

# Using type class
ds.add_column("col1", deeplake.types.Float32())

# Using string shorthand
ds.add_column("col2", "float32")

Types determine:¶

How data is stored and compressed
What operations are available
How the data can be queried and indexed
Integration with external libraries and frameworks

Numeric Types¶

All basic numeric types:

import deeplake

# Temporary dataset so that `ds` is defined for the calls below
ds = deeplake.create("tmp://")

# Integers
ds.add_column("int8", deeplake.types.Int8())      # -128 to 127
ds.add_column("int16", deeplake.types.Int16())    # -32,768 to 32,767
ds.add_column("int32", deeplake.types.Int32())    # -2^31 to 2^31-1
ds.add_column("int64", deeplake.types.Int64())    # -2^63 to 2^63-1

# Unsigned Integers
ds.add_column("uint8", deeplake.types.UInt8())    # 0 to 255
ds.add_column("uint16", deeplake.types.UInt16())  # 0 to 65,535
ds.add_column("uint32", deeplake.types.UInt32())  # 0 to 2^32-1
ds.add_column("uint64", deeplake.types.UInt64())  # 0 to 2^64-1

# Floating Point
ds.add_column("float16", deeplake.types.Float16())  # Half precision
ds.add_column("float32", deeplake.types.Float32())  # Single precision
ds.add_column("float64", deeplake.types.Float64())  # Double precision

# Boolean
ds.add_column("is_valid", deeplake.types.Bool())     # True/False values
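
The ranges in the comments above follow directly from each type's bit width. As a minimal sketch in plain Python (it makes no Deep Lake calls, and `int_range` and `smallest_uint` are hypothetical helper names), the following computes those bounds and picks the narrowest unsigned shorthand that can hold a given maximum value:

```python
def int_range(bits, signed):
    """Return the (min, max) values representable in an integer of `bits` bits."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


def smallest_uint(max_value):
    """Return the narrowest unsigned shorthand ("uint8" ... "uint64") that fits max_value."""
    if max_value < 0:
        raise ValueError("unsigned types cannot hold negative values")
    for bits in (8, 16, 32, 64):
        if max_value <= int_range(bits, signed=False)[1]:
            return f"uint{bits}"
    raise ValueError("value does not fit in 64 bits")


print(int_range(8, signed=True))   # (-128, 127)
print(smallest_uint(70_000))       # uint32
```

The returned string can then be passed wherever a string shorthand is accepted, for example as the second argument of `ds.add_column`.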

Basic Type Functions¶

deeplake.types.Int8 ¶

Int8(
    index_type: (
        str | IndexType | NumericIndex | None
    ) = None,
) -> DataType | Type

Creates an 8-bit integer value type.

Parameters:

Name	Type	Description	Default
`index_type`	`str \| IndexType \| NumericIndex \| None`	How to index the data in the column for faster searching. Options are: `deeplake.types.Inverted`. Default is `None`, meaning "do not index".	`None`

Returns:

Type	Description
`DataType \| Type`	A new 8-bit integer data type.

Examples:

Create a column with 8-bit integer type:

ds.add_column("col", types.Int8)
ds.add_column("idx_col", deeplake.types.Int8(deeplake.types.NumericIndex(deeplake.types.Inverted)))
ds.add_column("idx_col_1", deeplake.types.Int8(deeplake.types.Inverted))
ds.add_column("idx_col_2", deeplake.types.Int8("Inverted"))

deeplake.types.Int16 ¶

Int16(
    index_type: (
        str | IndexType | NumericIndex | None
    ) = None,
) -> DataType | Type

Creates a 16-bit integer value type.

Parameters:

Name	Type	Description	Default
`index_type`	`str \| IndexType \| NumericIndex \| None`	How to index the data in the column for faster searching. Options are: `deeplake.types.Inverted`. Default is `None`, meaning "do not index".	`None`

Returns:

Type	Description
`DataType \| Type`	A new 16-bit integer data type.

Examples:

Create a column with 16-bit integer type:

ds.add_column("col", types.Int16)
ds.add_column("idx_col", deeplake.types.Int16(deeplake.types.NumericIndex(deeplake.types.Inverted)))
ds.add_column("idx_col_1", deeplake.types.Int16(deeplake.types.Inverted))
ds.add_column("idx_col_2", deeplake.types.Int16("Inverted"))

deeplake.types.Int32 ¶

Int32(
    index_type: (
        str | IndexType | NumericIndex | None
    ) = None,
) -> DataType | Type

Creates a 32-bit integer value type.

Parameters:

Name	Type	Description	Default
`index_type`	`str \| IndexType \| NumericIndex \| None`	How to index the data in the column for faster searching. Options are: `deeplake.types.Inverted`. Default is `None`, meaning "do not index".	`None`

Returns:

Type	Description
`DataType \| Type`	A new 32-bit integer data type.

Examples:

Create a column with 32-bit integer type:

ds.add_column("col", types.Int32)
ds.add_column("idx_col", deeplake.types.Int32(deeplake.types.NumericIndex(deeplake.types.Inverted)))
ds.add_column("idx_col_1", deeplake.types.Int32(deeplake.types.Inverted))
ds.add_column("idx_col_2", deeplake.types.Int32("Inverted"))

deeplake.types.Int64 ¶

Int64(
    index_type: (
        str | IndexType | NumericIndex | None
    ) = None,
) -> DataType | Type

Creates a 64-bit integer value type.

Parameters:

Name	Type	Description	Default
`index_type`	`str \| IndexType \| NumericIndex \| None`	How to index the data in the column for faster searching. Options are: `deeplake.types.Inverted`. Default is `None`, meaning "do not index".	`None`

Returns:

Type	Description
`DataType \| Type`	A new 64-bit integer data type.

Examples:

Create a column with 64-bit integer type:

ds.add_column("col", types.Int64)
ds.add_column("idx_col", deeplake.types.Int64(deeplake.types.NumericIndex(deeplake.types.Inverted)))
ds.add_column("idx_col_1", deeplake.types.Int64(deeplake.types.Inverted))
ds.add_column("idx_col_2", deeplake.types.Int64("Inverted"))

deeplake.types.UInt8 ¶

UInt8(
    index_type: (
        str | IndexType | NumericIndex | None
    ) = None,
) -> DataType | Type

Creates an unsigned 8-bit integer value type.

Parameters:

Name	Type	Description	Default
`index_type`	`str \| IndexType \| NumericIndex \| None`	How to index the data in the column for faster searching. Options are: `deeplake.types.Inverted`. Default is `None`, meaning "do not index".	`None`

Returns:

Type	Description
`DataType \| Type`	A new unsigned 8-bit integer data type.

Examples:

ds.add_column("col", types.UInt8)
ds.add_column("idx_col", deeplake.types.UInt8(deeplake.types.NumericIndex(deeplake.types.Inverted)))
ds.add_column("idx_col_1", deeplake.types.UInt8(deeplake.types.Inverted))
ds.add_column("idx_col_2", deeplake.types.UInt8("Inverted"))

deeplake.types.UInt16 ¶

UInt16(
    index_type: (
        str | IndexType | NumericIndex | None
    ) = None,
) -> DataType | Type

Creates an unsigned 16-bit integer value type.

Parameters:

Name	Type	Description	Default
`index_type`	`str \| IndexType \| NumericIndex \| None`	How to index the data in the column for faster searching. Options are: `deeplake.types.Inverted`. Default is `None`, meaning "do not index".	`None`

Returns:

Type	Description
`DataType \| Type`	A new unsigned 16-bit integer data type.

Examples:

ds.add_column("col", types.UInt16)
ds.add_column("idx_col", deeplake.types.UInt16(deeplake.types.NumericIndex(deeplake.types.Inverted)))
ds.add_column("idx_col_1", deeplake.types.UInt16(deeplake.types.Inverted))
ds.add_column("idx_col_2", deeplake.types.UInt16("Inverted"))

deeplake.types.UInt32 ¶

UInt32(
    index_type: (
        str | IndexType | NumericIndex | None
    ) = None,
) -> DataType | Type

Creates an unsigned 32-bit integer value type.

Parameters:

Name	Type	Description	Default
`index_type`	`str \| IndexType \| NumericIndex \| None`	How to index the data in the column for faster searching. Options are: `deeplake.types.Inverted`. Default is `None`, meaning "do not index".	`None`

Returns:

Type	Description
`DataType \| Type`	A new unsigned 32-bit integer data type.

Examples:

ds.add_column("col", types.UInt32)
ds.add_column("idx_col", deeplake.types.UInt32(deeplake.types.NumericIndex(deeplake.types.Inverted)))
ds.add_column("idx_col_1", deeplake.types.UInt32(deeplake.types.Inverted))
ds.add_column("idx_col_2", deeplake.types.UInt32("Inverted"))

deeplake.types.UInt64 ¶

UInt64(
    index_type: (
        str | IndexType | NumericIndex | None
    ) = None,
) -> DataType | Type

Creates an unsigned 64-bit integer value type.

Parameters:

Name	Type	Description	Default
`index_type`	`str \| IndexType \| NumericIndex \| None`	How to index the data in the column for faster searching. Options are: `deeplake.types.Inverted`. Default is `None`, meaning "do not index".	`None`

Returns:

Type	Description
`DataType \| Type`	A new unsigned 64-bit integer data type.

Examples:

ds.add_column("col1", types.UInt64)
ds.add_column("idx_col", deeplake.types.UInt64(deeplake.types.NumericIndex(deeplake.types.Inverted)))
ds.add_column("idx_col_1", deeplake.types.UInt64(deeplake.types.Inverted))
ds.add_column("idx_col_2", deeplake.types.UInt64("Inverted"))

deeplake.types.Float16 ¶

Float16(
    index_type: (
        str | IndexType | NumericIndex | None
    ) = None,
) -> DataType | Type

Creates a 16-bit (half) float value type.

Parameters:

Name	Type	Description	Default
`index_type`	`str \| IndexType \| NumericIndex \| None`	How to index the data in the column for faster searching. Options are: `deeplake.types.Inverted`. Default is `None`, meaning "do not index".	`None`

Returns:

Type	Description
`DataType \| Type`	A new 16-bit float data type.

Examples:

Create a column with 16-bit float type:

ds.add_column("col", types.Float16)
ds.add_column("idx_col", deeplake.types.Float16(deeplake.types.NumericIndex(deeplake.types.Inverted)))
ds.add_column("idx_col_1", deeplake.types.Float16(deeplake.types.Inverted))
ds.add_column("idx_col_2", deeplake.types.Float16("Inverted"))

deeplake.types.Float32 ¶

Float32(
    index_type: (
        str | IndexType | NumericIndex | None
    ) = None,
) -> DataType | Type

Creates a 32-bit float value type.

Parameters:

Name	Type	Description	Default
`index_type`	`str \| IndexType \| NumericIndex \| None`	How to index the data in the column for faster searching. Options are: `deeplake.types.Inverted`. Default is `None`, meaning "do not index".	`None`

Returns:

Type	Description
`DataType \| Type`	A new 32-bit float data type.

Examples:

Create a column with 32-bit float type:

ds.add_column("col", types.Float32)
ds.add_column("idx_col", deeplake.types.Float32(deeplake.types.NumericIndex(deeplake.types.Inverted)))
ds.add_column("idx_col_1", deeplake.types.Float32(deeplake.types.Inverted))
ds.add_column("idx_col_2", deeplake.types.Float32("Inverted"))

deeplake.types.Float64 ¶

Float64(
    index_type: (
        str | IndexType | NumericIndex | None
    ) = None,
) -> DataType | Type

Creates a 64-bit float value type.

Parameters:

Name	Type	Description	Default
`index_type`	`str \| IndexType \| NumericIndex \| None`	How to index the data in the column for faster searching. Options are: `deeplake.types.Inverted`. Default is `None`, meaning "do not index".	`None`

Returns:

Type	Description
`DataType \| Type`	A new 64-bit float data type.

Examples:

Create a column with 64-bit float type:

ds.add_column("col", types.Float64)
ds.add_column("idx_col", deeplake.types.Float64(deeplake.types.NumericIndex(deeplake.types.Inverted)))
ds.add_column("idx_col_1", deeplake.types.Float64(deeplake.types.Inverted))
ds.add_column("idx_col_2", deeplake.types.Float64("Inverted"))

deeplake.types.Bool ¶

Bool() -> DataType

Creates a boolean value type.

Returns:

Name	Type	Description
`DataType`	`DataType`	A new boolean data type.

Examples:

Create columns with boolean type:

ds.add_column("col1", types.Bool)
ds.add_column("col2", "bool")

deeplake.types.ClassLabel ¶

ClassLabel(dtype: DataType | str) -> Type

Stores categorical labels as numerical values with a mapping to class names.

ClassLabel is designed for classification tasks where you want to store labels as efficient numerical indices while maintaining human-readable class names. The class names are stored in the column's metadata under the key "class_names", and the actual data contains numerical indices pointing to these class names.

Parameters:

Name	Type	Description	Default
`dtype`	`DataType \| str`	The datatype for storing the numerical class indices. Common choices are "uint8", "uint16", "uint32" or their DataType equivalents. Choose based on the number of classes you have.	required

How it works

Define a column with ClassLabel type
Set the "class_names" in the column's metadata as a list of strings
Store numerical indices (0, 1, 2, ...) that map to the class names
When reading, you can use the metadata to convert indices back to class names
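
The index-to-name mapping described in these steps can be sketched in plain Python. This is only an illustration of the idea, not Deep Lake code: `encode` and `decode` are hypothetical helpers, and the `class_names` list stands in for the value stored under the "class_names" metadata key.

```python
# Stand-in for the list stored in the column's "class_names" metadata.
class_names = ["person", "car", "dog", "cat"]

# Name -> index lookup built once from the ordered class list.
name_to_index = {name: i for i, name in enumerate(class_names)}


def encode(labels):
    """Convert human-readable labels to the numerical indices that get stored."""
    return [name_to_index[label] for label in labels]


def decode(indices):
    """Convert stored indices back to class names."""
    return [class_names[i] for i in indices]


stored = encode(["person", "car"])
print(stored)          # [0, 1]
print(decode(stored))  # ['person', 'car']
```

Because the stored values are positions in the list, reordering `class_names` after data has been written would change what every stored index means.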

Examples:

Basic usage with class labels:

# Create a column for object categories
ds.add_column("categories", types.ClassLabel(types.Array("uint32", 1)))

# Define the class names in metadata
ds["categories"].metadata["class_names"] = ["person", "car", "dog", "cat"]

# Store numerical indices corresponding to class names
# 0 = "person", 1 = "car", 2 = "dog", 3 = "cat"
ds.append({
    "categories": [np.array([0, 1], dtype="uint32")]  # person and car
})
ds.append({
    "categories": [np.array([2, 3], dtype="uint32")]  # dog and cat
})

# Access the numerical values
print(ds[0]["categories"])  # Output: [0 1]

# Get the class names from metadata
class_names = ds["categories"].metadata["class_names"]
indices = ds[0]["categories"]
labels = [class_names[i] for i in indices]
print(labels)  # Output: ['person', 'car']

Advanced usage from COCO ingestion pattern:

# This example shows the pattern used in COCO dataset ingestion
# where you have multiple annotation groups

# Create dataset
ds = deeplake.create("tmp://")

# Add category columns with ClassLabel type
ds.add_column("categories", types.ClassLabel(types.Array("uint32", 1)))
ds.add_column("super_categories", types.ClassLabel(types.Array("uint32", 1)))

# Set class names from COCO categories
ds["categories"].metadata["class_names"] = [
    "person", "bicycle", "car", "motorcycle", "airplane"
]
ds["super_categories"].metadata["class_names"] = [
    "person", "vehicle", "animal"
]

# Ingest data with numerical indices
# Categories: [0, 2, 1] maps to ["person", "car", "bicycle"]
# Super categories: [0, 1, 1] maps to ["person", "vehicle", "vehicle"]
ds.append({
    "categories": [np.array([0, 2, 1], dtype="uint32")],
    "super_categories": [np.array([0, 1, 1], dtype="uint32")]
})

Using different data types for different numbers of classes:

# For datasets with at most 256 classes (indices 0 to 255), use uint8
ds.add_column("small_set", types.ClassLabel(types.Array("uint8", 1)))
ds["small_set"].metadata["class_names"] = ["class_a", "class_b"]

# For datasets with more classes, use uint16 or uint32
ds.add_column("large_set", types.ClassLabel(types.Array("uint32", 1)))
ds["large_set"].metadata["class_names"] = [f"class_{i}" for i in range(1000)]

Numeric Indexing¶

Numeric columns support indexing for efficient comparison operations:

# Create a numeric column; the inverted index is added in the next step
ds.add_column("timestamp", deeplake.types.UInt64())

# Create the index manually
ds["timestamp"].create_index(
    deeplake.types.NumericIndex(deeplake.types.Inverted)
)

# Now you can use efficient comparison operations in queries:
# - Greater than: WHERE timestamp > 1609459200
# - Less than: WHERE timestamp < 1640995200  
# - Between: WHERE timestamp BETWEEN 1609459200 AND 1640995200
# - Value list: WHERE timestamp IN (1609459200, 1640995200)
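
To give an intuition for why an index helps with these comparisons, here is a toy, in-memory sketch of an inverted index over numeric values. It is an assumption-laden illustration, not a description of Deep Lake's internal data structure: `NumericInvertedIndex` and its methods are hypothetical names.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class NumericInvertedIndex:
    """Toy index: maps each distinct value to the row ids that contain it."""

    def __init__(self, values):
        postings = defaultdict(list)
        for row_id, value in enumerate(values):
            postings[value].append(row_id)
        self._postings = dict(postings)
        self._keys = sorted(self._postings)  # sorted distinct values for range scans

    def isin(self, wanted):
        """Rows whose value is in `wanted` (like WHERE col IN (...))."""
        rows = []
        for value in set(wanted):
            rows.extend(self._postings.get(value, []))
        return sorted(rows)

    def between(self, low, high):
        """Rows with low <= value <= high (like WHERE col BETWEEN low AND high)."""
        start = bisect_left(self._keys, low)
        stop = bisect_right(self._keys, high)
        rows = []
        for value in self._keys[start:stop]:
            rows.extend(self._postings[value])
        return sorted(rows)


timestamps = [1609459200, 1620000000, 1640995200, 1620000000, 1700000000]
index = NumericInvertedIndex(timestamps)
print(index.between(1609459200, 1640995200))  # [0, 1, 2, 3]
print(index.isin([1609459200, 1700000000]))   # [0, 4]
```

Both lookups touch only the matching values instead of scanning every row, which is the general benefit an index provides.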

deeplake.types.Audio ¶

Audio(
    dtype: DataType | str = "uint8",
    sample_compression: str = "mp3",
) -> Type

Creates an audio data type.

Parameters:

Name	Type	Description	Default
`dtype`	`DataType \| str`	The datatype of the audio samples. Defaults to "uint8".	`'uint8'`
`sample_compression`	`str`	The compression format for the audio samples: `wav` or `mp3`. Defaults to "mp3".	`'mp3'`

Returns:

Name	Type	Description
`Type`	`Type`	A new audio data type.

Examples:

Create an audio column with default settings:

ds.add_column("col1", types.Audio())

Create an audio column with specific sample compression:

ds.add_column("col2", types.Audio(sample_compression="wav"))

# Basic audio storage
ds.add_column("audio", deeplake.types.Audio())

# WAV format
ds.add_column("audio", deeplake.types.Audio(
    sample_compression="wav"
))

# MP3 compression (default)
ds.add_column("audio", deeplake.types.Audio(
    sample_compression="mp3"
))

# With specific dtype
ds.add_column("audio", deeplake.types.Audio(
    dtype="uint8",
    sample_compression="wav"
))

# Audio with Link for external references
ds.add_column("audio_links", deeplake.types.Link(
    deeplake.types.Audio(sample_compression="mp3")
))

deeplake.types.Image ¶

Image(
    dtype: DataType | str = "uint8",
    sample_compression: str = "png",
) -> Type

An image of a given format. The value returned will be a multidimensional array of values rather than the raw image bytes.

Available sample_compressions:

png (default)
jpg / jpeg

Parameters:

Name	Type	Description	Default
`dtype`	`DataType \| str`	The data type of the array elements to return	`'uint8'`
`sample_compression`	`str`	The on-disk compression/format of the image	`'png'`

Examples:

ds.add_column("col1", types.Image)
ds.add_column("col2", types.Image(sample_compression="jpg"))

# Basic image storage
ds.add_column("images", deeplake.types.Image())

# JPEG compression
ds.add_column("images", deeplake.types.Image(
    sample_compression="jpeg"
))

# With specific dtype
ds.add_column("images", deeplake.types.Image(
    dtype="uint8"  # 8-bit RGB
))

deeplake.types.Embedding ¶

Embedding(
    size: int | None = None,
    dtype: DataType | str = "float32",
    index_type: (
        EmbeddingIndexType | QuantizationType | None
    ) = None,
) -> Type

Creates a single-dimensional embedding of a given length.

Parameters:

Name	Type	Description	Default
`size`	`int \| None`	The size of the embedding	`None`
`dtype`	`DataType \| str`	The datatype of the embedding. Defaults to float32.	`'float32'`
`index_type`	`EmbeddingIndexType \| QuantizationType \| None`	How to compress the embeddings in the index. Default uses no compression, but can be set to `deeplake.types.QuantizationType.Binary`.	`None`

Returns:

Name	Type	Description
`Type`	`Type`	A new embedding data type.
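
For intuition about what binary quantization of embeddings means, the sketch below uses one common scheme: keep only the sign of each component and compare vectors by Hamming distance. This is an assumption for illustration; this page does not state which scheme Deep Lake uses internally, and `binarize` and `hamming` are hypothetical helpers.

```python
def binarize(vector):
    """Keep only the sign of each component: 1 if positive, else 0."""
    return [1 if x > 0 else 0 for x in vector]


def hamming(a, b):
    """Number of positions where two bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))


query = [0.9, -0.2, 0.4, -0.7]
docs = {
    "a": [0.8, -0.1, 0.5, -0.6],   # same signs as the query
    "b": [-0.9, 0.3, 0.4, -0.7],   # two signs flipped
}

q_bits = binarize(query)
ranked = sorted(docs, key=lambda name: hamming(q_bits, binarize(docs[name])))
print(ranked)  # ['a', 'b']
```

Each component shrinks from a 32-bit float to a single bit, which is why a quantized index is smaller and faster to search at some cost in accuracy.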

deeplake.types.Text ¶

Text(
    index_type: str | IndexType | TextIndex | None = None,
    chunk_compression: str | None = "lz4",
) -> Type

Creates a text data type of arbitrary length.

Parameters:

Name	Type	Description	Default
`index_type`	`str \| IndexType \| TextIndex \| None`	How to index the data in the column for faster searching. Options are: `deeplake.types.Inverted`, `deeplake.types.BM25`, `deeplake.types.Exact`. Default is `None`, meaning "do not index".	`None`
`chunk_compression`	`str \| None`	Defines the compression algorithm for on-disk storage of text data. Supported values are 'lz4', 'zstd', and 'null' (no compression). Default is `lz4`.	`'lz4'`

Returns:

Name	Type	Description
`Type`	`Type`	A new text data type.

Examples:

Create text columns with different configurations:

ds.add_column("col1", types.Text)
ds.add_column("col2", "text")
ds.add_column("col3", str)
ds.add_column("col4", types.Text(index_type=types.Inverted))
ds.add_column("col5", types.Text(index_type=types.BM25))

# Basic text
ds.add_column("text", deeplake.types.Text())

# Text with BM25 index for relevance-ranked text search
ds.add_column("text2", deeplake.types.Text(
    index_type=deeplake.types.BM25
))

# Text with inverted index for keyword search
ds.add_column("text3", deeplake.types.Text(
    index_type=deeplake.types.Inverted
))

# Text with exact index for whole text matching
ds.add_column("text4", deeplake.types.Text(
    index_type=deeplake.types.Exact
))

deeplake.types.Dict ¶

Dict(
    index_type: str | IndexType | JsonIndex | None = None,
) -> Type

Creates a type that supports storing arbitrary key/value pairs in each row.

Parameters:

Name	Type	Description	Default
`index_type`	`str \| IndexType \| JsonIndex \| None`	How to index the data in the column for faster searching. Options are: `deeplake.types.Inverted`. Default is `None`, meaning "do not index".	`None`

Returns:

Name	Type	Description
`Type`	`Type`	A new dictionary data type.

deeplake.types.Array ¶

Array(dtype: DataType | str, dimensions: int) -> DataType

Array(dtype: DataType | str, shape: list[int]) -> DataType

Array(dtype: DataType | str) -> DataType

Array(
    dtype: DataType | str,
    dimensions: int | None,
    shape: list[int] | None,
) -> DataType

Creates a generic array of data.

Parameters:

Name	Type	Description	Default
`dtype`	`DataType \| str`	The datatype of values in the array	required
`dimensions`	`int \| None`	The number of dimensions/axes in the array. Unlike specifying `shape`, there is no constraint on the size of each dimension.	required
`shape`	`list[int] \| None`	Constrain the size of each dimension in the array	required

Returns:

Name	Type	Description
`DataType`	`DataType`	A new array data type with the specified parameters.

Examples:

Create a three-dimensional array, where each dimension can have any number of elements:

ds.add_column("col1", types.Array("int32", dimensions=3))

Create a three-dimensional array, where each dimension has a known size:

ds.add_column("col2", types.Array(types.Float32(), shape=[50, 30, 768]))

# Fixed-size array
ds.add_column("features", deeplake.types.Array(
    "float32",
    shape=[512]  # Enforces size
))

# Variable-size array
ds.add_column("sequences", deeplake.types.Array(
    "int32",
    dimensions=1  # Allows any size
))
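
The difference between `dimensions` and `shape` can be made concrete with a small plain-Python check on nested lists. This is a sketch of what each constraint means, not Deep Lake's validation logic; `nested_shape` and `matches` are hypothetical helpers, and how Deep Lake reacts to a mismatch is not covered here.

```python
def nested_shape(value):
    """Return the shape of a rectangular nested list, e.g. [[1, 2], [3, 4]] -> [2, 2]."""
    shape = []
    while isinstance(value, list):
        shape.append(len(value))
        value = value[0] if value else None
    return shape


def matches(value, dimensions=None, shape=None):
    """Check a value against a `dimensions` or `shape` constraint."""
    actual = nested_shape(value)
    if shape is not None:
        return actual == list(shape)      # every axis must have the given size
    if dimensions is not None:
        return len(actual) == dimensions  # only the number of axes is fixed
    return True                           # no constraint given


print(matches([1, 2, 3], dimensions=1))         # True: one axis, any length
print(matches([1, 2, 3], shape=[512]))          # False: length is 3, not 512
print(matches([[1, 2], [3, 4]], dimensions=1))  # False: two axes
```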

Numeric Indexes¶

Deep Lake supports indexing numeric columns for faster lookup operations:

from deeplake.types import NumericIndex, Inverted
# Add numeric column and create an inverted index
ds.add_column("scores", "float32")
ds["scores"].create_index(NumericIndex(Inverted))

# Use with TQL for efficient filtering
results = ds.query("SELECT * WHERE CONTAINS(scores, 0.95)")

deeplake.types.Bytes ¶

Bytes() -> DataType

Creates a byte array value type. This is useful for storing raw binary data.

Returns:

Name	Type	Description
`DataType`	`DataType`	A new byte array data type.

Examples:

Create columns with byte array type:

ds.add_column("col1", types.Bytes)
ds.add_column("col2", "bytes")

Append raw binary data to a byte array column:

ds.append([{"col1": b"hello", "col2": b"world"}])

deeplake.types.BinaryMask ¶

BinaryMask(
    sample_compression: str | None = None,
    chunk_compression: str | None = None,
) -> Type

In a binary mask, each pixel value is a boolean indicating whether an object of a given class is present at that pixel.

NOTE: Since binary masks often contain large amounts of data, it is recommended to compress them using lz4.

Parameters:

Name	Type	Description	Default
`sample_compression`	`str \| None`	How to compress each row's value. supported values are `lz4`, `zstd`, and `null` (no compression).	`None`
`chunk_compression`	`str \| None`	Defines the compression algorithm for on-disk storage of mask data. supported values are `lz4`, `zstd`, and `null` (no compression).	`None`

Examples:

ds.add_column("col1", types.BinaryMask(sample_compression="lz4"))
ds.append([{"col1": np.zeros((512, 512, 5), dtype="bool")}])

# Basic binary mask
ds.add_column("masks", deeplake.types.BinaryMask())

# With compression
ds.add_column("masks", deeplake.types.BinaryMask(
    sample_compression="lz4"
))

deeplake.types.SegmentMask ¶

SegmentMask(
    dtype: DataType | str = "uint8",
    sample_compression: str | None = None,
    chunk_compression: str | None = None,
) -> Type

Segmentation masks are 2D representations of class labels, where a numerical class value is encoded in an array of the same shape as the image.

NOTE: Since segmentation masks often contain large amounts of data, it is recommended to compress them using lz4.

Parameters:

Name	Type	Description	Default
`sample_compression`	`str \| None`	How to compress each row's value. supported values are `lz4`, `zstd`, and `null` (no compression).	`None`
`chunk_compression`	`str \| None`	Defines the compression algorithm for on-disk storage of mask data. supported values are `lz4`, `zstd`, `png`, `nii`, `nii.gz`, and `null` (no compression).	`None`

Examples:

ds.add_column("col1", types.SegmentMask(sample_compression="lz4"))
ds.append([{"col1": np.zeros((512, 512, 3))}])

# Basic segmentation mask
ds.add_column("segmentation", deeplake.types.SegmentMask())

# With compression
ds.add_column("segmentation", deeplake.types.SegmentMask(
    dtype="uint8",
    sample_compression="lz4"
))
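
The relationship between the two mask types can be shown with a short plain-Python sketch: a segmentation mask holds one class id per pixel, while a binary mask holds one boolean per pixel per class. `to_binary_masks` is a hypothetical helper, and the class-first layout it returns is chosen for readability (the BinaryMask example above places the class axis last).

```python
def to_binary_masks(segment_mask, num_classes):
    """Split a 2D class-id mask into one boolean 2D mask per class."""
    return [
        [[pixel == class_id for pixel in row] for row in segment_mask]
        for class_id in range(num_classes)
    ]


segment_mask = [
    [0, 0, 1],
    [2, 1, 1],
]
masks = to_binary_masks(segment_mask, num_classes=3)
print(masks[1])  # [[False, False, True], [False, True, True]]
```

A segmentation mask is the more compact form when every pixel belongs to exactly one class; binary masks are needed when classes can overlap.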

deeplake.types.BoundingBox ¶

BoundingBox(
    dtype: DataType | str = "float32",
    format: str | None = None,
    bbox_type: str | None = None,
) -> Type

Stores an array of values specifying the bounding boxes of an image.

Parameters:

Name	Type	Description	Default
`dtype`	`DataType \| str`	The datatype of values (default float32)	`'float32'`
`format`	`str \| None`	The bounding box format. Possible values: `ccwh`, `ltwh`, `ltrb`, `unknown`	`None`
`bbox_type`	`str \| None`	The pixel type. Possible values: `pixel`, `fractional`	`None`

Examples:

ds.add_column("col1", types.BoundingBox())
ds.add_column("col2", types.BoundingBox(format="ltwh"))

# Basic bounding boxes
ds.add_column("boxes", deeplake.types.BoundingBox())

# With specific format
ds.add_column("boxes", deeplake.types.BoundingBox(
    format="ltwh"  # left, top, width, height
))
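
The formats differ only in how the same rectangle is written down. The sketch below converts between them in plain Python. It assumes `ccwh` means (center x, center y, width, height) and `ltrb` means (left, top, right, bottom), by analogy with the `ltwh` comment above; the function names are hypothetical.

```python
def ltwh_to_ltrb(box):
    """(left, top, width, height) -> (left, top, right, bottom)."""
    left, top, width, height = box
    return [left, top, left + width, top + height]


def ccwh_to_ltwh(box):
    """(center x, center y, width, height) -> (left, top, width, height)."""
    cx, cy, width, height = box
    return [cx - width / 2, cy - height / 2, width, height]


def fractional_to_pixel(box, image_width, image_height):
    """Scale horizontal values by image width and vertical values by image height."""
    a, b, c, d = box
    return [a * image_width, b * image_height, c * image_width, d * image_height]


print(ltwh_to_ltrb([10, 20, 30, 40]))                         # [10, 20, 40, 60]
print(ccwh_to_ltwh([25, 40, 30, 40]))                         # [10.0, 20.0, 30, 40]
print(fractional_to_pixel([0.25, 0.5, 0.5, 0.25], 200, 100))  # [50.0, 50.0, 100.0, 25.0]
```

Recording `format` and `bbox_type` on the column lets downstream code apply the right conversion instead of guessing from the values.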

deeplake.types.Point ¶

Point(dimensions: int = 2) -> Type

Point datatype for storing points, with the ability to visualize them.

Parameters:

Name	Type	Description	Default
`dimensions`	`int`	The number of dimensions of each point, for example 2 for 2D points or 3 for 3D points. Defaults to 2.	`2`

Examples:

ds.add_column("col1", types.Point())
ds.append([{"col1": [[1.0, 2.0], [0.0, 1.0]]}])

deeplake.types.Polygon ¶

Polygon() -> Type

Polygon datatype for storing polygons, with the ability to visualize them.

Examples:

ds.add_column("col1", deeplake.types.Polygon())
poly1 = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
poly2 = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
ds.append({"col1": [[poly1, poly2], [poly1, poly2]]})
print(ds[0]["col1"])
# Output: [[[1. 2.]
#          [3. 4.]
#          [5. 6.]]
#         [[1. 2.]
#          [3. 4.]
#          [5. 6.]]]
print(ds[1]["col1"])
# Output: [[[1. 2.]
#          [3. 4.]
#          [5. 6.]]
#         [[1. 2.]
#          [3. 4.]
#          [5. 6.]]]

deeplake.types.Video ¶

Video(compression: str = 'mp4') -> Type

Video datatype for storing videos.

Parameters:

Name	Type	Description	Default
`compression`	`str`	The compression format. Only H264 codec is supported at the moment.	`'mp4'`

Examples:

ds.add_column("video", types.Video(compression="mp4"))

with open("path/to/video.mp4", "rb") as f:
    bytes_data = f.read()
    ds.append([{"video": bytes_data}])

deeplake.types.Medical ¶

Medical(compression: str) -> Type

Medical datatype for storing medical images.

Available compressions:

nii
nii.gz
dcm

Parameters:

Name	Type	Description	Default
`compression`	`str`	How to compress each row's value. Possible values: `dcm`, `nii`, `nii.gz`	required

Examples:

ds.add_column("col1", types.Medical(compression="dcm"))

with open("path/to/dicom/file.dcm", "rb") as f:
    bytes_data = f.read()
    ds.append([{"col1": bytes_data}])

deeplake.types.Mesh ¶

Mesh() -> Type

Mesh datatype for storing 3D meshes.

Available compressions:

ply
stl

Examples:

ds.add_column("col1", types.Mesh())
with open("path/to/mesh/file.stl", "rb") as f:
    bytes_data = f.read()
    ds.append([{"col1": bytes_data}])

deeplake.types.Struct ¶

Struct(fields: dict[str, DataType | str | Type]) -> Type

Defines a custom datatype with specified keys.

See deeplake.types.Dict for a type that supports different key/value pairs per value.

Parameters:

Name	Type	Description	Default
`fields`	`dict[str, DataType \| str \| Type]`	A dict where the key is the name of the field, and the value is the datatype definition for it	required

Examples:

ds.add_column("col1", types.Struct({
   "field1": types.Int16(),
   "field2": "text",
}))

ds.append([{"col1": {"field1": 3, "field2": "a"}}])
print(ds[0]["col1"]["field1"]) # Output: 3

# Define fixed structure with specific types
ds.add_column("info", deeplake.types.Struct({
    "id": deeplake.types.Int64(),
    "name": "text",
    "score": deeplake.types.Float32()
}))

# Add data
ds.append([{
    "info": {
        "id": 1,
        "name": "sample",
        "score": 0.95
    }
}])

deeplake.types.Sequence ¶

Sequence(nested_type: DataType | str | Type) -> Type

Creates a sequence type that represents an ordered list of other data types.

A sequence maintains the order of its values, making it suitable for time-series data like videos (sequences of images).

Parameters:

Name	Type	Description	Default
`nested_type`	`DataType \| str \| Type`	The data type of the values in the sequence. Can be any data type, not just primitive types.	required

Returns:

Name	Type	Description
`Type`	`Type`	A new sequence data type.

Examples:

Create a sequence of images:

ds.add_column("col1", types.Sequence(types.Image(sample_compression="jpg")))

# Sequence of images (e.g., video frames)
ds.add_column("frames", deeplake.types.Sequence(
    deeplake.types.Image(sample_compression="jpeg")
))

# Sequence of embeddings
ds.add_column("token_embeddings", deeplake.types.Sequence(
    deeplake.types.Embedding(768)
))

# Add data
ds.append([{
    "frames": [frame1, frame2, frame3],  # List of images
    "token_embeddings": [emb1, emb2, emb3]  # List of embeddings
}])

deeplake.types.Link ¶

Link(type: DataType | Type) -> Type

A link to an external resource. The value returned will be a reference to the external resource rather than the raw data.

Link only supports the Bytes DataType and the Image, SegmentMask, Medical, and Audio Types.

Parameters:

Name	Type	Description	Default
`type`	`DataType \| Type`	The type of the linked data. Must be the Bytes DataType or one of the following Types: Image, SegmentMask, Medical, or Audio.	required

Examples:

ds.add_column("col1", types.Link(types.Image()))

Index Types¶

Deep Lake supports several index types for optimizing queries on different data types.

IndexType Enum¶

deeplake.types.IndexType ¶

Enumeration of available text/numeric/JSON/embeddings/embeddings matrix indexing types.

Attributes:

Name	Type	Description
`Inverted`	`IndexType`	An index that supports keyword lookup. Can be used with `CONTAINS(column, 'wanted_value')`.
`BM25`	`IndexType`	A BM25-based index of text data. Can be used with `BM25_SIMILARITY(column, 'search text')` in a TQL `ORDER BY` clause.
`Exact`	`IndexType`	An exact match index for text data.
`PooledQuantized`	`IndexType`	A pooled quantized index for 2D embeddings matrices. Can be used with `MAXSIM(column, query_embeddings)` for ColBERT-style maximum similarity search.
`Clustered`	`IndexType`	Clusters embeddings in the index to speed up search. This is the default index type for embeddings.
`ClusteredQuantized`	`IndexType`	Stores a binary quantized representation of the original embedding in the index rather than a full copy of the embedding. This slightly decreases accuracy of searches, while significantly improving query time.
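
The ColBERT-style maximum similarity mentioned for `PooledQuantized` can be sketched in plain Python: for each query vector, take its best dot product against any document vector, then sum those maxima. This follows the usual late-interaction definition; whether Deep Lake's `MAXSIM` applies any normalization or approximation is not stated here, and `dot` and `maxsim` are hypothetical helpers.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_embeddings, doc_embeddings):
    """Sum, over query vectors, of the best dot product against any document vector."""
    return sum(
        max(dot(q, d) for d in doc_embeddings)
        for q in query_embeddings
    )


query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 0.5]]
doc_b = [[0.5, 0.5]]

print(maxsim(query, doc_a))  # 1.5
print(maxsim(query, doc_b))  # 1.0
```

Both the query and the document are 2D matrices (one row per token), which is why this index type targets embedding matrices rather than single vectors.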

BM25 `class-attribute` ¶

BM25: IndexType

Clustered `class-attribute` ¶

Clustered: IndexType

ClusteredQuantized `class-attribute` ¶

ClusteredQuantized: IndexType

Exact `class-attribute` ¶

Exact: IndexType

Inverted `class-attribute` ¶

Inverted: IndexType

PooledQuantized `class-attribute` ¶

PooledQuantized: IndexType

__hash__ ¶

__hash__() -> int

__index__ ¶

__index__() -> int

__init__ ¶

__init__(value: int) -> None

__int__ ¶

__int__() -> int

__members__ `class-attribute` ¶

__members__: dict[str, IndexType]

name `property` ¶

name: str

value `property` ¶

value: int

Returns:

Name	Type	Description
`int`	`int`	The integer value of the index type.

Text Index Types¶

deeplake.types.TextIndex ¶

Represents a text column index type.

Used to create indexes on text columns for faster query performance. Supports inverted indexing (CONTAINS), BM25 similarity search, and exact matching.

__hash__ `class-attribute` ¶

__hash__: None = None

__init__ ¶

__init__(type: IndexType) -> None

deeplake.types.Inverted `module-attribute` ¶

Inverted: Inverted

A text index that supports keyword lookup.

This index can be used with CONTAINS(column, 'wanted_value').
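
To illustrate why an inverted index makes keyword lookup fast, here is a minimal pure-Python sketch of the general technique. It is an assumption-laden illustration rather than Deep Lake's implementation: the whitespace tokenizer and the helper names `build_inverted_index` and `contains` are hypothetical.

```python
from collections import defaultdict


def build_inverted_index(rows):
    """Map each lowercase token to the set of row ids that contain it."""
    index = defaultdict(set)
    for row_id, text in enumerate(rows):
        for token in text.lower().split():
            index[token].add(row_id)
    return index


def contains(index, keyword):
    """Return the sorted row ids whose text contains the keyword."""
    return sorted(index.get(keyword.lower(), set()))


rows = ["red apple pie", "green apple", "blue sky"]
index = build_inverted_index(rows)
apple_rows = contains(index, "apple")
```

A lookup is a single dictionary access, so its cost does not depend on how many rows lack the keyword.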

deeplake.types.BM25 `module-attribute` ¶

BM25: BM25

A BM25-based index of text data.

This index can be used with BM25_SIMILARITY(column, 'search text') in a TQL ORDER BY clause.
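
For readers unfamiliar with BM25, the sketch below scores documents with the commonly used Okapi BM25 formula in pure Python. It only illustrates the ranking idea: the tokenizer, the parameter defaults `k1=1.5` and `b=0.75`, and the function name `bm25_scores` are assumptions, and the exact variant Deep Lake uses may differ.

```python
import math
from collections import Counter


def bm25_scores(docs, query, k1=1.5, b=0.75):
    """Return one BM25 score per document for a whitespace-tokenized query."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturates via k1 and is normalized by length via b.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    "deep lake stores embeddings",
    "a lake in the mountains",
    "vector search over text",
]
scores = bm25_scores(docs, "lake embeddings")
# Highest score first, which mirrors ordering results by similarity.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

The first document matches both query terms and ranks highest; the third matches none and scores zero.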

deeplake.types.Exact `module-attribute` ¶

Exact: Exact

A text index that supports whole text lookup.

This index can be used with EQUALS(column, 'wanted_value').

Numeric Index Types¶

deeplake.types.NumericIndex ¶

Represents a numeric column index type.

Used to create indexes on numeric columns for faster query performance. Supports inverted indexing for CONTAINS operations.

__hash__ `class-attribute` ¶

__hash__: None = None

__init__ ¶

__init__(type: IndexType) -> None

JSON Index Types¶

deeplake.types.JsonIndex ¶

Represents a Dict column index type.

Used to create indexes on Dict columns for faster query performance. Supports inverted indexing for CONTAINS operations on JSON fields.

__hash__ `class-attribute` ¶

__hash__: None = None

__init__ ¶

__init__(type: IndexType) -> None

Embedding Index Types¶

deeplake.types.EmbeddingIndexType ¶

Represents an embedding index type.

__init__ ¶

__init__(type: IndexType) -> None

__init__(quantization: QuantizationType) -> None

__init__(type: IndexType | QuantizationType) -> None

deeplake.types.EmbeddingIndex ¶

EmbeddingIndex(
    type: IndexType | QuantizationType | None = None,
) -> EmbeddingIndexType

Creates an embedding index.

Parameters:

Name	Type	Description	Default
`type`	`IndexType \| QuantizationType \| None`	The index type for embeddings. One of: `deeplake.types.IndexType.Clustered` (the default clustered index), `deeplake.types.IndexType.ClusteredQuantized` (quantized clustered index), or `deeplake.types.QuantizationType.Binary` (binary quantization; maps to `ClusteredQuantized`).	`None`

Returns:

Name	Type	Description
`Type`	`EmbeddingIndexType`	EmbeddingIndexType.

Examples:

Create embedding columns with different index types:

# Using IndexType enum
ds.add_column("col1", types.Embedding(768, index_type=types.EmbeddingIndex(types.IndexType.ClusteredQuantized)))

# Using QuantizationType for backward compatibility
ds.add_column("col2", types.Embedding(768, index_type=types.EmbeddingIndex(types.QuantizationType.Binary)))
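
The quantized index is described as storing a binary representation of each embedding, trading a little accuracy for faster queries. One common way to do this, shown here as an assumption rather than as Deep Lake's actual scheme, keeps one sign bit per dimension and compares codes by Hamming distance. The helper names and sample vectors are hypothetical.

```python
def binarize(vector):
    """Quantize a float vector to one bit per dimension (1 if positive)."""
    bits = 0
    for x in vector:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits


def hamming(a, b):
    """Count the bit positions in which two codes differ."""
    return bin(a ^ b).count("1")


stored = {
    "a": [0.9, -0.2, 0.4, -0.7],
    "b": [-0.5, 0.3, -0.1, 0.8],
    "c": [0.7, -0.1, -0.3, -0.9],
}
# Each 4-dimensional float vector shrinks to a 4-bit integer code.
codes = {key: binarize(vec) for key, vec in stored.items()}

query_code = binarize([1.0, -0.5, 0.2, -0.3])
nearest_key = min(codes, key=lambda k: hamming(codes[k], query_code))
```

Magnitudes are discarded, so two vectors with the same sign pattern become indistinguishable; that is the source of the accuracy loss mentioned above.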

deeplake.types.EmbeddingsMatrixIndexType ¶

Represents a 2D embeddings matrix index type.

Used for ColBERT-style maximum similarity search on 2D embedding matrices. Supports pooled quantized indexing for efficient MAXSIM queries.

__init__ ¶

__init__() -> None

deeplake.types.EmbeddingsMatrixIndex ¶

EmbeddingsMatrixIndex() -> EmbeddingsMatrixIndexType

Creates an embeddings matrix index.
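
ColBERT-style maximum similarity is usually defined as follows: for each query vector, take the best dot product against any vector in the document matrix, then sum those maxima. The pure-Python sketch below computes that score exactly, without any quantization or pooling; the function names and sample matrices are hypothetical and are not part of the Deep Lake API.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(doc_matrix, query_matrix):
    """Sum, over query vectors, of the best dot product with any doc vector."""
    return sum(
        max(dot(q, d) for d in doc_matrix)
        for q in query_matrix
    )


query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]  # one matching vector per query vector
doc_b = [[0.5, 0.5]]              # a single vector that partially matches both

score_a = maxsim(doc_a, query)
score_b = maxsim(doc_b, query)
```

Here `doc_a` scores higher than `doc_b` because each query vector finds an exact match in it.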

Generic Index Wrapper¶

deeplake.types.Index ¶

Represents all available index types in Deep Lake. This is a polymorphic wrapper that can hold any specific index type.

__hash__ `class-attribute` ¶

__hash__: None = None

__init__ ¶

__init__(
    index_type: (
        TextIndex
        | EmbeddingIndexType
        | EmbeddingsMatrixIndexType
        | JsonIndex
        | NumericIndex
    ),
) -> None

# Assumes `ds` is an existing, writable Deep Lake dataset
import deeplake

# Create numeric index for efficient range queries
ds.add_column("age", deeplake.types.Int32())
ds["age"].create_index(
    deeplake.types.NumericIndex(deeplake.types.Inverted)
)

# Use in queries with comparison operators
results = ds.query("SELECT * WHERE age > 25")
results = ds.query("SELECT * WHERE age BETWEEN 18 AND 65")
results = ds.query("SELECT * WHERE age IN (25, 30, 35)")
