Column Classes
Deep Lake provides two column classes for different access levels:
Class | Description |
---|---|
Column | Full read-write access to column data |
ColumnView | Read-only access to column data |
Column Class
deeplake.Column
Bases: ColumnView
Provides read-write access to a column in a dataset. Column extends ColumnView with methods for modifying data, making it suitable for dataset creation and updates in ML workflows.
The Column class allows you to:
- Read and write data using integer indices, slices, or lists of indices
- Modify data asynchronously for better performance
- Access and modify column metadata
- Handle various data types common in ML: images, embeddings, labels, etc.
Examples:
Update training labels:
# Update single label
ds["labels"][0] = 1
# Update batch of labels
ds["labels"][0:32] = new_labels
# Async update for better performance
future = ds["labels"].set_async(slice(0, 32), new_labels)
future.wait()
Store image embeddings:
# Generate and store embeddings
embeddings = model.encode(images)
ds["embeddings"][0:len(embeddings)] = embeddings
Manage column metadata:
# Store preprocessing parameters
ds["images"].metadata["mean"] = [0.485, 0.456, 0.406]
ds["images"].metadata["std"] = [0.229, 0.224, 0.225]
__getitem__
Retrieve data from the column at the specified index or range.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
index | int, slice, list, or tuple | Index or indices to read: a single integer, a slice, or a list or tuple of indices. | required |
Returns:
Type | Description |
---|---|
Any | The data at the specified index or indices. The type depends on the column's data type. |
Examples:
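A minimal sketch of the three index forms described above. `FakeColumn` is a hypothetical in-memory stand-in (not part of deeplake) so the snippet runs without a dataset; with a real dataset, `column` would be something like `ds["labels"]`, and the returned values would follow the column's data type rather than plain Python lists.

```python
class FakeColumn:
    """Hypothetical in-memory stand-in for a column; not part of deeplake."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __getitem__(self, index):
        # Mirror the documented index forms: int, slice, list, or tuple.
        if isinstance(index, (list, tuple)):
            return [self._rows[i] for i in index]
        return self._rows[index]


column = FakeColumn(["cat", "dog", "bird", "fish"])  # in practice: ds["labels"]

single = column[0]         # int: one row
batch = column[1:3]        # slice: a contiguous range of rows
selected = column[[0, 3]]  # list: specific rows by position
```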
__setitem__
Set data in the column at the specified index or range.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
index | int or slice | Row or rows to write: a single integer index or a slice. | required |
value | Any | The data to store. Must match the column's data type. | required |
Examples:
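One way to use slice assignment is to write a long sequence in fixed-size chunks. The helper below is a sketch, not part of the Deep Lake API: `write_in_batches` is a hypothetical name, and a plain Python list stands in for the column so the snippet runs on its own. It assumes only that the target supports `column[start:end] = values`, as documented above.

```python
def write_in_batches(column, values, batch_size=32):
    """Assign `values` to rows 0..len(values) of `column`, one slice at a time."""
    for start in range(0, len(values), batch_size):
        chunk = values[start:start + batch_size]
        # The slice length always matches the chunk length.
        column[start:start + len(chunk)] = chunk


labels = [0] * 100  # stand-in for ds["labels"] with 100 existing rows
write_in_batches(labels, list(range(70)), batch_size=32)
```

Rows beyond the written range are left untouched, so the helper suits partial updates of an existing column.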
get_async
Asynchronously retrieve data from the column. Useful for large datasets or when loading multiple items in ML pipelines.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
index | int, slice, list, or tuple | Index or indices to read: a single integer, a slice, or a list or tuple of indices. | required |
Returns:
Name | Type | Description |
---|---|---|
Future | Future | A Future object that resolves to the requested data. |
Examples:
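A common pattern with `get_async` is to request several batches up front and collect the results afterwards, so the reads can overlap. The sketch below assumes only the documented `get_async(index)` call and the `result()` method used elsewhere on this page. `FakeAsyncColumn` and `load_batches` are hypothetical names; the stand-in uses a thread pool so the snippet runs without a dataset.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeAsyncColumn:
    """Hypothetical stand-in for a column; not part of deeplake."""

    def __init__(self, rows):
        self._rows = list(rows)
        self._pool = ThreadPoolExecutor(max_workers=2)

    def get_async(self, index):
        # Return a future whose result() yields the requested rows.
        return self._pool.submit(self._rows.__getitem__, index)


def load_batches(column, num_rows, batch_size):
    """Request every batch up front, then yield the results in order."""
    futures = [
        column.get_async(slice(start, min(start + batch_size, num_rows)))
        for start in range(0, num_rows, batch_size)
    ]
    for future in futures:
        yield future.result()


column = FakeAsyncColumn(range(10))  # in practice: ds["images"]
batches = list(load_batches(column, num_rows=10, batch_size=4))
```

The row count is passed in explicitly here so the sketch does not rely on anything beyond the methods documented in this section.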
set_async
Asynchronously set data in the column. Useful for large updates or when modifying multiple items in ML pipelines.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
index | int or slice | Row or rows to write: a single integer index or a slice. | required |
value | Any | The data to store. Must match the column's data type. | required |
Returns:
Name | Type | Description |
---|---|---|
FutureVoid | FutureVoid | A FutureVoid that completes when the update is finished. |
Examples:
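Several writes can be started with `set_async` and then awaited together. The sketch below assumes only the documented `set_async(index, value)` call and the `wait()` method shown in the class examples. `FakeFutureVoid`, `FakeWritableColumn`, and `write_all` are hypothetical names used so the snippet runs without a dataset.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeFutureVoid:
    """Hypothetical stand-in exposing only wait(); not part of deeplake."""

    def __init__(self, future):
        self._future = future

    def wait(self):
        self._future.result()  # block until the write has finished


class FakeWritableColumn:
    """Hypothetical in-memory stand-in for a writable column."""

    def __init__(self, size):
        self.rows = [None] * size
        self._pool = ThreadPoolExecutor(max_workers=1)

    def set_async(self, index, value):
        # Schedule the write and hand back an object with wait().
        return FakeFutureVoid(self._pool.submit(self.rows.__setitem__, index, value))


def write_all(column, updates):
    """Start every update, then wait for all of them to complete."""
    pending = [column.set_async(index, value) for index, value in updates]
    for future in pending:
        future.wait()


column = FakeWritableColumn(size=6)  # in practice: ds["labels"]
write_all(column, [(0, 7), (slice(1, 3), [8, 9]), (slice(3, 6), [1, 2, 3])])
```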
name (property)
Get the name of the column.
Returns:
Name | Type | Description |
---|---|---|
str | str | The column name. |
ColumnView Class
deeplake.ColumnView
Provides read-only access to a column in a dataset. ColumnView is designed for efficient data access in ML workflows, supporting both synchronous and asynchronous operations.
The ColumnView class allows you to:
- Access column data using integer indices, slices, or lists of indices
- Retrieve data asynchronously for better performance in ML pipelines
- Access column metadata and properties
- Get information about linked data if the column contains references
Examples:
Load image data from a column for training:
# Access a single image
image = ds["images"][0]
# Load a batch of images
batch = ds["images"][0:32]
# Async load for better performance
images_future = ds["images"].get_async(slice(0, 32))
images = images_future.result()
Access embeddings for similarity search:
# Get all embeddings
embeddings = ds["embeddings"][:]
# Get specific embeddings by indices
selected = ds["embeddings"][[1, 5, 10]]
Check column properties:
# Get column name
name = ds["images"].name
# Access metadata
if "mean" in ds["images"].metadata.keys():
    mean = ds["images"].metadata["mean"]
__getitem__
Retrieve data from the column at the specified index or range.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
index | int, slice, list, or tuple | Index or indices to read: a single integer, a slice, or a list or tuple of indices. | required |
Returns:
Type | Description |
---|---|
Any | The data at the specified index or indices. The type depends on the column's data type. |
Examples:
get_async
Asynchronously retrieve data from the column. Useful for large datasets or when loading multiple items in ML pipelines.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
index | int, slice, list, or tuple | Index or indices to read: a single integer, a slice, or a list or tuple of indices. | required |
Returns:
Name | Type | Description |
---|---|---|
Future | Future | A Future object that resolves to the requested data. |
Examples:
metadata (property)
metadata: ReadOnlyMetadata
Access the column's metadata. Metadata typically holds statistics, preprocessing parameters, or other information about the column data.
Returns:
Name | Type | Description |
---|---|---|
ReadOnlyMetadata | ReadOnlyMetadata | A ReadOnlyMetadata object for reading metadata. |
Examples:
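Reading a metadata key that may be absent can be wrapped in a small helper. This is a sketch: `read_metadata` is a hypothetical name, and a plain dict stands in for `ds["images"].metadata`. It relies only on `keys()` and key lookup, both of which appear in the ColumnView examples above.

```python
def read_metadata(metadata, key, default=None):
    """Return metadata[key] if the key exists, otherwise `default`."""
    return metadata[key] if key in metadata.keys() else default


metadata = {"mean": [0.485, 0.456, 0.406]}  # stand-in for ds["images"].metadata

mean = read_metadata(metadata, "mean")
std = read_metadata(metadata, "std", default=[1.0, 1.0, 1.0])  # key is absent
```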
name (property)
Get the name of the column.
Returns:
Name | Type | Description |
---|---|---|
str | str | The column name. |
Class Comparison
Column
- Provides read-write access
- Can modify data
- Can update metadata
- Available in Dataset
# Get mutable column
ds = deeplake.open("s3://bucket/dataset")
column = ds["images"]
# Read data
image = column[0]
batch = column[0:100]
# Write data
column[0] = new_image
column[0:100] = new_batch
# Async operations
future = column.set_async(0, new_image)
future.wait()
ColumnView
- Read-only access
- Cannot modify data
- Can read metadata
- Available in ReadOnlyDataset and DatasetView
# Get read-only column
ds = deeplake.open_read_only("s3://bucket/dataset")
column = ds["images"]
# Read data
image = column[0]
batch = column[0:100]
# Async read
future = column.get_async(slice(0, 100))
batch = future.result()
Examples
Data Access
# Direct indexing
single_item = column[0]
batch = column[0:100]
selected = column[[1, 5, 10]]
# Async data access
future = column.get_async(slice(0, 1000))
data = future.result()