Column Classes

Deep Lake provides two column classes for different access levels:

Class       Description
Column      Full read-write access to column data
ColumnView  Read-only access to column data

Column Class

deeplake.Column

Bases: ColumnView

Provides read-write access to a column in a dataset. Column extends ColumnView with methods for modifying data, making it suitable for dataset creation and updates in ML workflows.

The Column class allows you to:

  • Read and write data using integer indices, slices, or lists of indices
  • Modify data asynchronously (set_async) so large updates can overlap with other work
  • Access and modify column metadata
  • Handle various data types common in ML: images, embeddings, labels, etc.

Examples:

Update training labels:

# Update single label
ds["labels"][0] = 1

# Update batch of labels
ds["labels"][0:32] = new_labels

# Async update for better performance
future = ds["labels"].set_async(slice(0, 32), new_labels)
future.wait()

Store image embeddings:

# Generate and store embeddings
embeddings = model.encode(images)
ds["embeddings"][0:len(embeddings)] = embeddings

Manage column metadata:

# Store preprocessing parameters
ds["images"].metadata["mean"] = [0.485, 0.456, 0.406]
ds["images"].metadata["std"] = [0.229, 0.224, 0.225]
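The mean and std values above are usually computed from the training images before being stored. A minimal pure-Python sketch, assuming pixels are given as (r, g, b) tuples already scaled to [0, 1]; `channel_stats` is a hypothetical helper, not part of Deep Lake:

```python
import statistics


def channel_stats(pixels):
    """Per-channel mean and population standard deviation.

    Hypothetical helper (not a Deep Lake API). `pixels` is a list of
    equal-length tuples, e.g. (r, g, b) values scaled to [0, 1].
    """
    channels = list(zip(*pixels))  # regroup values channel by channel
    means = [statistics.fmean(c) for c in channels]
    stds = [statistics.pstdev(c) for c in channels]
    return means, stds


pixels = [(0.0, 0.5, 1.0), (1.0, 0.5, 0.0)]
mean, std = channel_stats(pixels)
print(mean, std)

# With a writable Column, the results could then be stored as metadata:
# ds["images"].metadata["mean"] = mean
# ds["images"].metadata["std"] = std
```

For a real dataset you would accumulate statistics batch by batch rather than hold every pixel in memory; the list here only keeps the example short.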

__getitem__

__getitem__(index: int | slice | list | tuple) -> Any

Retrieve data from the column at the specified index or range.

Parameters:

index (int | slice | list | tuple, required): The position(s) to read. Can be:

  • int: Single item index
  • slice: Range of indices (e.g., 0:10)
  • list/tuple: Multiple specific indices

Returns:

Any: The data at the specified index or indices. Its type depends on the column's data type.

Examples:

# Get single item
image = column[0]

# Get range
batch = column[0:32]

# Get specific indices
items = column[[1, 5, 10]]
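To make the three index forms concrete, the sketch below reproduces which rows each one selects, using a plain Python list as a stand-in for a column. Note that `column[0:32]` is ordinary Python syntax for passing `slice(0, 32)`, the form used with get_async. `select_rows` is hypothetical; a real column returns data whose type depends on the column's data type, not necessarily a Python list.

```python
def select_rows(rows, index):
    """Select rows from a list the way the three index forms do.

    Hypothetical stand-in for column[index]; `rows` is a plain list.
    """
    if isinstance(index, int):
        return rows[index]                  # single item
    if isinstance(index, slice):
        return rows[index]                  # contiguous range
    if isinstance(index, (list, tuple)):
        return [rows[i] for i in index]     # specific positions
    raise TypeError(f"unsupported index type: {type(index).__name__}")


rows = [f"row{i}" for i in range(20)]
print(select_rows(rows, 0))
print(select_rows(rows, slice(0, 3)))
print(select_rows(rows, [1, 5, 10]))
```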

__setitem__

__setitem__(index: int | slice, value: Any) -> None

Set data in the column at the specified index or range.

Parameters:

index (int | slice, required): The position(s) to write. Can be:

  • int: Single item index
  • slice: Range of indices (e.g., 0:10)

value (Any, required): The data to store. Must match the column's data type.

Examples:

# Update single item
column[0] = new_image

# Update range
column[0:32] = new_batch
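When the values to write are much larger than a training batch, one option is to write them in fixed-size slices so each call handles a bounded amount of data. A minimal sketch: `write_in_chunks` is a hypothetical helper that relies only on slice assignment. It is tested here against a plain list and assumes a Column accepts the same `column[lo:hi] = chunk` form shown above.

```python
def write_in_chunks(column, values, start=0, chunk_size=32):
    """Write `values` into `column` starting at row `start`, one slice per chunk.

    Hypothetical helper. `column` is anything supporting slice assignment,
    e.g. a Column (assumed) or a plain list (used below).
    """
    for offset in range(0, len(values), chunk_size):
        chunk = values[offset:offset + chunk_size]
        lo = start + offset
        column[lo:lo + len(chunk)] = chunk  # same form as column[0:32] = new_batch
    return column


target = [0] * 10
write_in_chunks(target, [1, 2, 3, 4, 5], start=2, chunk_size=2)
print(target)
```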

get_async

get_async(index: int | slice | list | tuple) -> Future

Asynchronously retrieve data from the column. Useful for large datasets or when loading multiple items in ML pipelines.

Parameters:

index (int | slice | list | tuple, required): The position(s) to read. Can be:

  • int: Single item index
  • slice: Range of indices
  • list/tuple: Multiple specific indices

Returns:

Future: A Future object that resolves to the requested data.

Examples:

# Async batch load
future = column.get_async(slice(0, 32))
batch = future.result()

# Using with async/await
async def load_batch():
    batch = await column.get_async(slice(0, 32))
    return batch
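A common use of get_async in a training loop is prefetching: request the next batch before processing the current one, so loading overlaps with compute. The sketch assumes only that get_async returns an object with a .result() method, as in the example above. To keep it runnable without a dataset, `ListColumn` is a hypothetical stand-in backed by a list and a thread pool; it is not the Deep Lake class, and `num_rows` is passed in explicitly.

```python
from concurrent.futures import ThreadPoolExecutor


class ListColumn:
    """Hypothetical stand-in: get_async returns a concurrent.futures.Future."""

    def __init__(self, rows, executor):
        self._rows = rows
        self._executor = executor

    def get_async(self, index):
        return self._executor.submit(self._rows.__getitem__, index)


def prefetch_batches(column, num_rows, batch_size):
    """Yield batches in order while the next request is already in flight."""
    pending = None
    for start in range(0, num_rows, batch_size):
        stop = min(start + batch_size, num_rows)
        upcoming = column.get_async(slice(start, stop))  # start the next load early
        if pending is not None:
            yield pending.result()  # caller works on this batch while `upcoming` loads
        pending = upcoming
    if pending is not None:
        yield pending.result()


with ThreadPoolExecutor(max_workers=2) as pool:
    column = ListColumn(list(range(10)), pool)
    batches = list(prefetch_batches(column, num_rows=10, batch_size=4))
print(batches)
```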

set_async

set_async(index: int | slice, value: Any) -> FutureVoid

Asynchronously set data in the column. Useful for large updates or when modifying multiple items in ML pipelines.

Parameters:

index (int | slice, required): The position(s) to write. Can be:

  • int: Single item index
  • slice: Range of indices

value (Any, required): The data to store. Must match the column's data type.

Returns:

FutureVoid: A FutureVoid that completes when the update is finished.

Examples:

# Async batch update
future = column.set_async(slice(0, 32), new_batch)
future.wait()

# Using with async/await
async def update_batch():
    await column.set_async(slice(0, 32), new_batch)
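For large updates, several set_async calls can be started before waiting on any of them. A minimal sketch, assuming only that set_async returns an object with .wait() as shown above. `ListColumn` and `_WaitableFuture` are hypothetical stand-ins (a list plus a thread pool) that keep the example runnable. The slices are kept disjoint because this page does not say how overlapping concurrent writes behave.

```python
from concurrent.futures import ThreadPoolExecutor


class _WaitableFuture:
    """Adapts concurrent.futures.Future to the .wait() style used above."""

    def __init__(self, fut):
        self._fut = fut

    def wait(self):
        self._fut.result()  # blocks, and re-raises any error from the write


class ListColumn:
    """Hypothetical list-backed stand-in for a writable column."""

    def __init__(self, rows, executor):
        self.rows = rows
        self._executor = executor

    def set_async(self, index, value):
        def _write():
            self.rows[index] = value
        return _WaitableFuture(self._executor.submit(_write))


def set_batches_async(column, values, batch_size):
    """Start one set_async per disjoint slice, then wait for all of them."""
    futures = []
    for start in range(0, len(values), batch_size):
        chunk = values[start:start + batch_size]
        futures.append(column.set_async(slice(start, start + len(chunk)), chunk))
    for fut in futures:
        fut.wait()


with ThreadPoolExecutor(max_workers=4) as pool:
    column = ListColumn([0] * 10, pool)
    set_batches_async(column, list(range(10, 20)), batch_size=3)
print(column.rows)
```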

metadata property

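metadata: Metadata

Access the column's metadata. Unlike ColumnView.metadata, which returns a ReadOnlyMetadata, this Metadata object can also be written to, for example column.metadata["mean"] = [0.485, 0.456, 0.406].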

name property

name: str

Get the name of the column.

Returns:

str: The column name.

ColumnView Class

deeplake.ColumnView

Provides read-only access to a column in a dataset. ColumnView is designed for efficient data access in ML workflows, supporting both synchronous and asynchronous operations.

The ColumnView class allows you to:

  • Access column data using integer indices, slices, or lists of indices
  • Retrieve data asynchronously for better performance in ML pipelines
  • Access column metadata and properties
  • Get information about linked data if the column contains references

Examples:

Load image data from a column for training:

# Access a single image
image = ds["images"][0]

# Load a batch of images
batch = ds["images"][0:32]

# Async load for better performance
images_future = ds["images"].get_async(slice(0, 32))
images = images_future.result()

Access embeddings for similarity search:

# Get all embeddings
embeddings = ds["embeddings"][:]

# Get specific embeddings by indices
selected = ds["embeddings"][[1, 5, 10]]
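Once embeddings are loaded, a brute-force similarity search can rank them against a query and return row indices, which can then be passed back to the column as a list index. A pure-Python sketch under the assumption that each embedding is a non-zero sequence of floats; `cosine` and `top_k` are hypothetical helpers, and a vectorized implementation would be much faster for large columns.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query, embeddings, k):
    """Indices of the k embeddings most similar to `query`, best first."""
    scored = sorted(((cosine(query, e), i) for i, e in enumerate(embeddings)), reverse=True)
    return [i for _, i in scored[:k]]


embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
best = top_k([1.0, 0.0], embeddings, k=2)
print(best)
# selected = ds["embeddings"][best]  # fetch the matches with a list index
```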

Check column properties:

# Get column name
name = ds["images"].name

# Access metadata
if "mean" in ds["images"].metadata.keys():
    mean = ds["images"].metadata["mean"]

__getitem__

__getitem__(index: int | slice | list | tuple) -> Any

Retrieve data from the column at the specified index or range.

Parameters:

index (int | slice | list | tuple, required): The position(s) to read. Can be:

  • int: Single item index
  • slice: Range of indices (e.g., 0:10)
  • list/tuple: Multiple specific indices

Returns:

Any: The data at the specified index or indices. Its type depends on the column's data type.

Examples:

# Get single item
image = column[0]

# Get range
batch = column[0:32]

# Get specific indices
items = column[[1, 5, 10]]

get_async

get_async(index: int | slice | list | tuple) -> Future

Asynchronously retrieve data from the column. Useful for large datasets or when loading multiple items in ML pipelines.

Parameters:

index (int | slice | list | tuple, required): The position(s) to read. Can be:

  • int: Single item index
  • slice: Range of indices
  • list/tuple: Multiple specific indices

Returns:

Future: A Future object that resolves to the requested data.

Examples:

# Async batch load
future = column.get_async(slice(0, 32))
batch = future.result()

# Using with async/await
async def load_batch():
    batch = await column.get_async(slice(0, 32))
    return batch

metadata property

metadata: ReadOnlyMetadata

Access the column's metadata. Useful for storing statistics, preprocessing parameters, or other information about the column data.

Returns:

ReadOnlyMetadata: A ReadOnlyMetadata object for reading metadata.

Examples:

# Access preprocessing parameters
mean = column.metadata["mean"]
std = column.metadata["std"]

# Check available metadata
for key in column.metadata.keys():
    print(f"{key}: {column.metadata[key]}")
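Stored preprocessing parameters can be applied directly when reading data. A minimal sketch, assuming the metadata holds per-channel "mean" and "std" lists like those in the Column example, and relying only on the keys() and [] access shown above. A plain dict stands in for ReadOnlyMetadata, and `normalize_pixel` is a hypothetical helper.

```python
def normalize_pixel(pixel, metadata):
    """Normalize one pixel with per-channel mean/std read from metadata.

    Hypothetical helper. Falls back to identity (mean 0, std 1) for any
    key that is missing from the metadata.
    """
    keys = set(metadata.keys())
    mean = metadata["mean"] if "mean" in keys else [0.0] * len(pixel)
    std = metadata["std"] if "std" in keys else [1.0] * len(pixel)
    return [(p - m) / s for p, m, s in zip(pixel, mean, std)]


meta = {"mean": [0.5, 0.5, 0.5], "std": [0.5, 0.25, 1.0]}  # stand-in for column.metadata
print(normalize_pixel([1.0, 0.5, 0.0], meta))
print(normalize_pixel([0.2, 0.4], {}))  # no stored parameters: values pass through
```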

name property

name: str

Get the name of the column.

Returns:

str: The column name.

Class Comparison

Column

  • Provides read-write access
  • Can modify data
  • Can update metadata
  • Available in Dataset

import deeplake

# Get mutable column
ds = deeplake.open("s3://bucket/dataset")
column = ds["images"]

# Read data
image = column[0]
batch = column[0:100]

# Write data
column[0] = new_image
column[0:100] = new_batch

# Async operations
future = column.set_async(0, new_image)
future.wait()

ColumnView

  • Read-only access
  • Cannot modify data
  • Can read metadata
  • Available in ReadOnlyDataset and DatasetView

import deeplake

# Get read-only column
ds = deeplake.open_read_only("s3://bucket/dataset")
column = ds["images"]

# Read data
image = column[0]
batch = column[0:100]

# Async read
future = column.get_async(slice(0, 100))
batch = future.result()
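Because Column extends ColumnView, code that only reads through indexing works unchanged with either class. A sketch of such read-only code: `count_labels` is a hypothetical helper that reads a label column in slices and tallies the values. It assumes each slice is an iterable of hashable labels (with NumPy arrays, calling .tolist() first is a safe choice) and takes the row count as a parameter; a list stands in for the column below.

```python
from collections import Counter


def count_labels(column, num_rows, batch_size=1000):
    """Tally label values using only slice reads (Column or ColumnView)."""
    counts = Counter()
    for start in range(0, num_rows, batch_size):
        counts.update(column[start:min(start + batch_size, num_rows)])
    return counts


labels = [0, 1, 1, 2, 1, 0]  # stand-in for ds["labels"]
counts = count_labels(labels, num_rows=len(labels), batch_size=4)
print(counts.most_common())
```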

Examples

Data Access

# Direct indexing
single_item = column[0]
batch = column[0:100]
selected = column[[1, 5, 10]]

# Async data access 
future = column.get_async(slice(0, 1000))
data = future.result()
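Lists of indices such as [1, 5, 10] are often derived from another column, for example all rows with a particular label. A minimal sketch; `indices_where` is a hypothetical helper, and the `labels` list stands in for data read from a column.

```python
def indices_where(values, predicate):
    """Row positions whose value satisfies `predicate`."""
    return [i for i, v in enumerate(values) if predicate(v)]


labels = [0, 3, 1, 3, 2]  # e.g. data read with column[:]
idx = indices_where(labels, lambda v: v == 3)
print(idx)
# selected = column[idx]  # fetch only the matching rows with a list index
```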

Metadata

# Read metadata from any column type
name = column.name
metadata = column.metadata

# Update metadata (Column only)
column.metadata["mean"] = [0.485, 0.456, 0.406]
column.metadata["std"] = [0.229, 0.224, 0.225]