Column Classes¶

Deep Lake provides two column classes for different access levels:

Class	Description
Column	Full read-write access to column data
ColumnView	Read-only access to column data

Column Class¶

deeplake.Column ¶

Bases: ColumnView

Provides read-write access to a column in a dataset. Column extends ColumnView with methods for modifying data, making it suitable for dataset creation and updates in ML workflows.

The Column class allows you to:

Read and write data using integer indices, slices, or lists of indices
Modify data asynchronously for better performance
Access and modify column metadata
Handle various data types common in ML: images, embeddings, labels, etc.

Examples:

Update training labels:

# Update single label
ds["labels"][0] = 1

# Update batch of labels
ds["labels"][0:32] = new_labels

# Async update for better performance
future = ds["labels"].set_async(slice(0, 32), new_labels)
future.wait()

Store image embeddings:

# Generate and store embeddings
embeddings = model.encode(images)
ds["embeddings"][0:len(embeddings)] = embeddings

Manage column metadata:

# Store preprocessing parameters
ds["images"].metadata["mean"] = [0.485, 0.456, 0.406]
ds["images"].metadata["std"] = [0.229, 0.224, 0.225]

__getitem__ ¶

__getitem__(
    index: int | slice | list | tuple,
) -> ndarray | list | Dict | str | bytes | None

Retrieve data from the column at the specified index or range.

Parameters:

Name	Type	Description	Default
`index`	`int \| slice \| list \| tuple`	One of: `int` (single item index), `slice` (range of indices, e.g., `0:10`), or `list`/`tuple` (multiple specific indices).	required

Returns:

Type	Description
`ndarray \| list \| Dict \| str \| bytes \| None`	The data at the specified index/indices. Type depends on the column's data type.

Examples:

# Get single item
image = column[0]

# Get range
batch = column[0:32]

# Get specific indices
items = column[[1, 5, 10]]
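
The three index forms can be tried without a real dataset. The sketch below uses `ListColumnStub`, a hypothetical in-memory stand-in that is not part of Deep Lake and does not reproduce the types Deep Lake returns. It only illustrates which rows each index form selects.

```python
from typing import Any, Sequence, Union

Index = Union[int, slice, list, tuple]


class ListColumnStub:
    """Hypothetical in-memory stand-in for a column (not part of Deep Lake)."""

    def __init__(self, rows: Sequence[Any]) -> None:
        self._rows = list(rows)

    def __getitem__(self, index: Index) -> Any:
        if isinstance(index, (list, tuple)):
            # Multiple specific indices: one row per requested position.
            return [self._rows[i] for i in index]
        # int -> single row, slice -> contiguous range of rows.
        return self._rows[index]


column = ListColumnStub([f"row-{i}" for i in range(12)])

single = column[0]           # one row
batch = column[0:3]          # rows 0, 1 and 2 (the stop index is excluded)
picked = column[[1, 5, 10]]  # rows 1, 5 and 10, in the order requested
```

As with ordinary Python slicing, `0:3` covers three rows, so `column[0:32]` in the examples above requests 32 items.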

__setitem__ ¶

__setitem__(index: int | slice, value: Any) -> None

Set data in the column at the specified index or range.

Parameters:

Name	Type	Description	Default
`index`	`int \| slice`	One of: `int` (single item index) or `slice` (range of indices, e.g., `0:10`).	required
`value`	`Any`	The data to store. Must match the column's data type.	required

Examples:

# Update single item
column[0] = new_image

# Update range
column[0:32] = new_batch

get_async ¶

get_async(index: int | slice | list | tuple) -> Future

Asynchronously retrieve data from the column. Useful for large datasets or when loading multiple items in ML pipelines.

Parameters:

Name	Type	Description	Default
`index`	`int \| slice \| list \| tuple`	One of: `int` (single item index), `slice` (range of indices), or `list`/`tuple` (multiple specific indices).	required

Returns:

Name	Type	Description
`Future`	`Future`	A Future object that resolves to the requested data.

Examples:

# Async batch load
future = column.get_async(slice(0, 32))
batch = future.result()

# Using with async/await
async def load_batch():
    batch = await column.get_async(slice(0, 32))
    return batch
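
One common use of an asynchronous read is to request the next batch while the current one is still being processed. The sketch below shows that prefetch pattern under stated assumptions: `ThreadedColumnStub` is a hypothetical stand-in whose `get_async` returns a standard-library `concurrent.futures.Future`, and `prefetched_batches` and `num_rows` are illustrative names, not Deep Lake APIs. Only the `get_async(...)` and `.result()` calls mirror the documented interface.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Iterator, List, Optional


class ThreadedColumnStub:
    """Hypothetical stand-in: get_async runs the read on a worker thread."""

    def __init__(self, rows: List[int]) -> None:
        self._rows = rows
        self._pool = ThreadPoolExecutor(max_workers=2)

    def get_async(self, index: slice) -> Future:
        # Submit the (here trivial) read and return immediately.
        return self._pool.submit(self._rows.__getitem__, index)


def prefetched_batches(
    column: ThreadedColumnStub, num_rows: int, batch_size: int
) -> Iterator[List[int]]:
    """Yield batches in order while the following batch is already loading."""
    pending: Optional[Future] = None
    for start in range(0, num_rows, batch_size):
        # Request the next batch before handing out the previous one.
        upcoming = column.get_async(slice(start, start + batch_size))
        if pending is not None:
            yield pending.result()
        pending = upcoming
    if pending is not None:
        yield pending.result()


column = ThreadedColumnStub(list(range(10)))
batches = list(prefetched_batches(column, num_rows=10, batch_size=4))
```

The generator keeps at most two requests alive at a time, which bounds memory use while still hiding part of the load latency behind the consumer's work.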

metadata `property` ¶

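metadata: Metadata

Access the column's metadata. Unlike `ColumnView.metadata`, the returned `Metadata` object can also be modified.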

name `property` ¶

name: str

Get the name of the column.

Returns:

Name	Type	Description
`str`	`str`	The column name.

set_async ¶

set_async(index: int | slice, value: Any) -> FutureVoid

Asynchronously set data in the column. Useful for large updates or when modifying multiple items in ML pipelines.

Parameters:

Name	Type	Description	Default
`index`	`int \| slice`	One of: `int` (single item index) or `slice` (range of indices).	required
`value`	`Any`	The data to store. Must match the column's data type.	required

Returns:

Name	Type	Description
`FutureVoid`	`FutureVoid`	A FutureVoid that completes when the update is finished.

Examples:

# Async batch update
future = column.set_async(slice(0, 32), new_batch)
future.wait()

# Using with async/await
async def update_batch():
    await column.set_async(slice(0, 32), new_batch)
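
The benefit of `set_async` comes from doing other work between starting the write and waiting on it. The sketch below illustrates that ordering with hypothetical stand-ins: `WritableColumnStub` and `VoidFutureStub` are not Deep Lake classes, and the stub future exposes only the `wait()` method used in the examples above.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Any, Dict, List


class VoidFutureStub:
    """Hypothetical stand-in for FutureVoid: exposes only wait()."""

    def __init__(self, inner: Future) -> None:
        self._inner = inner

    def wait(self) -> None:
        # Block until the write has finished; re-raises any worker error.
        self._inner.result()


class WritableColumnStub:
    """Hypothetical in-memory stand-in for a writable column."""

    def __init__(self, rows: List[Any]) -> None:
        self.rows = list(rows)
        self._pool = ThreadPoolExecutor(max_workers=1)

    def _write(self, index: slice, value: List[Any]) -> None:
        self.rows[index] = value

    def set_async(self, index: slice, value: List[Any]) -> VoidFutureStub:
        return VoidFutureStub(self._pool.submit(self._write, index, value))


column = WritableColumnStub([0, 0, 0, 0])
new_labels = [1, 0, 1, 1]

# Start the write, then do unrelated work while it is in flight.
future = column.set_async(slice(0, 4), new_labels)
label_counts: Dict[int, int] = {}
for label in new_labels:
    label_counts[label] = label_counts.get(label, 0) + 1

# Block only at the point where the stored values must be visible.
future.wait()
```

Calling `wait()` immediately after `set_async` behaves like a synchronous write, so the gain depends on how much work fits between the two calls.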

ColumnView Class¶

deeplake.ColumnView ¶

Provides read-only access to a column in a dataset. ColumnView is designed for efficient data access in ML workflows, supporting both synchronous and asynchronous operations.

The ColumnView class allows you to:

Access column data using integer indices, slices, or lists of indices
Retrieve data asynchronously for better performance in ML pipelines
Access column metadata and properties
Get information about linked data if the column contains references

Examples:

Load image data from a column for training:

# Access a single image
image = ds["images"][0]

# Load a batch of images
batch = ds["images"][0:32]

# Async load for better performance
images_future = ds["images"].get_async(slice(0, 32))
images = images_future.result()

Access embeddings for similarity search:

# Get all embeddings
embeddings = ds["embeddings"][:]

# Get specific embeddings by indices
selected = ds["embeddings"][[1, 5, 10]]

Check column properties:

# Get column name
name = ds["images"].name

# Access metadata
if "mean" in ds["images"].metadata.keys():
    mean = ds["images"].metadata["mean"]

__getitem__ ¶

__getitem__(
    index: int | slice | list | tuple,
) -> ndarray | list | Dict | str | bytes | None

Retrieve data from the column at the specified index or range.

Parameters:

Name	Type	Description	Default
`index`	`int \| slice \| list \| tuple`	One of: `int` (single item index), `slice` (range of indices, e.g., `0:10`), or `list`/`tuple` (multiple specific indices).	required

Returns:

Type	Description
`ndarray \| list \| Dict \| str \| bytes \| None`	The data at the specified index/indices. Type depends on the column's data type.

Examples:

# Get single item
image = column[0]

# Get range
batch = column[0:32]

# Get specific indices
items = column[[1, 5, 10]]

get_async ¶

get_async(index: int | slice | list | tuple) -> Future

Asynchronously retrieve data from the column. Useful for large datasets or when loading multiple items in ML pipelines.

Parameters:

Name	Type	Description	Default
`index`	`int \| slice \| list \| tuple`	One of: `int` (single item index), `slice` (range of indices), or `list`/`tuple` (multiple specific indices).	required

Returns:

Name	Type	Description
`Future`	`Future`	A Future object that resolves to the requested data.

Examples:

# Async batch load
future = column.get_async(slice(0, 32))
batch = future.result()

# Using with async/await
async def load_batch():
    batch = await column.get_async(slice(0, 32))
    return batch

metadata `property` ¶

metadata: ReadOnlyMetadata

Access the column's metadata. Useful for storing statistics, preprocessing parameters, or other information about the column data.

Returns:

Name	Type	Description
`ReadOnlyMetadata`	`ReadOnlyMetadata`	A ReadOnlyMetadata object for reading metadata.

Examples:

# Access preprocessing parameters
mean = column.metadata["mean"]
std = column.metadata["std"]

# Check available metadata
for key in column.metadata.keys():
    print(f"{key}: {column.metadata[key]}")
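
Preprocessing parameters read from metadata are typically applied per channel as `(value - mean) / std`. The sketch below shows that arithmetic on a single RGB pixel. It assumes pixel values are already scaled to the range 0 to 1, uses a plain `dict` in place of `column.metadata`, and `normalize_pixel` is an illustrative helper, not a Deep Lake function.

```python
from typing import Dict, List, Sequence

# Plain dict standing in for column.metadata, with the values used on this page.
metadata: Dict[str, List[float]] = {
    "mean": [0.485, 0.456, 0.406],
    "std": [0.229, 0.224, 0.225],
}


def normalize_pixel(
    pixel: Sequence[float], meta: Dict[str, List[float]]
) -> List[float]:
    """Apply (value - mean) / std to each channel of one RGB pixel."""
    return [
        (value - mean) / std
        for value, mean, std in zip(pixel, meta["mean"], meta["std"])
    ]


# Channels chosen to sit at the mean, one std above it, and one std below it.
normalized = normalize_pixel([0.485, 0.680, 0.181], metadata)
# normalized is approximately [0.0, 1.0, -1.0]
```

Storing the parameters next to the data keeps the normalization used at training time and at inference time in one place.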

name `property` ¶

name: str

Get the name of the column.

Returns:

Name	Type	Description
`str`	`str`	The column name.

Class Comparison¶

Column¶

Provides read-write access
Can modify data
Can update metadata
Available in Dataset

# Get mutable column
ds = deeplake.open("s3://bucket/dataset")
column = ds["images"]

# Read data
image = column[0]
batch = column[0:100]

# Write data
column[0] = new_image
column[0:100] = new_batch

# Async operations
future = column.set_async(0, new_image)
future.wait()

ColumnView¶

Read-only access
Cannot modify data
Can read metadata
Available in ReadOnlyDataset and DatasetView

# Get read-only column
ds = deeplake.open_read_only("s3://bucket/dataset")
column = ds["images"]

# Read data
image = column[0]
batch = column[0:100]

# Async read
future = column.get_async(slice(0, 100))
batch = future.result()
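
Because `Column` extends `ColumnView`, code that needs to write can check which of the two it received before attempting an update. The sketch below illustrates that check with hypothetical stub classes that mirror only the inheritance relationship described on this page. With the real library the same test would presumably be `isinstance(column, deeplake.Column)`, which is an assumption not verified here, and `relabel_first` is an illustrative helper.

```python
from typing import Any, List


class ColumnViewStub:
    """Hypothetical stand-in for a read-only column."""

    def __init__(self, rows: List[Any]) -> None:
        self._rows = list(rows)

    def __getitem__(self, index: int) -> Any:
        return self._rows[index]


class ColumnStub(ColumnViewStub):
    """Hypothetical stand-in for a read-write column: adds item assignment."""

    def __setitem__(self, index: int, value: Any) -> None:
        self._rows[index] = value


def relabel_first(column: ColumnViewStub, value: Any) -> None:
    """Overwrite row 0, failing early with a clear message on read-only columns."""
    if not isinstance(column, ColumnStub):
        raise TypeError("column is read-only; open the dataset for writing")
    column[0] = value


writable = ColumnStub([0, 0, 0])
relabel_first(writable, 1)

read_only = ColumnViewStub([0, 0, 0])
try:
    relabel_first(read_only, 1)
    rejected = False
except TypeError:
    rejected = True
```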

Examples¶

Data Access¶

# Direct indexing
single_item = column[0]
batch = column[0:100]
selected = column[[1, 5, 10]]

# Async data access 
future = column.get_async(slice(0, 1000))
data = future.result()

Metadata¶

# Read metadata from any column type
name = column.name
metadata = column.metadata

# Update metadata (Column only)
column.metadata["mean"] = [0.485, 0.456, 0.406]
column.metadata["std"] = [0.229, 0.224, 0.225]
