Miscellaneous

Metadata

Metadata provides key-value storage for datasets and columns.

Dataset Metadata

deeplake.Metadata

Bases: ReadOnlyMetadata

Writable access to dataset and column metadata for ML workflows.

Stores important information about datasets like:

  • Model parameters and hyperparameters
  • Preprocessing statistics
  • Data splits and fold definitions
  • Version and training information

Changes are persisted immediately without requiring commit().

Examples:

Storing model metadata:

ds.metadata["model_name"] = "resnet50"
ds.metadata["hyperparameters"] = {
    "learning_rate": 0.001,
    "batch_size": 32
}

Setting preprocessing stats:

ds["images"].metadata["mean"] = [0.485, 0.456, 0.406]
ds["images"].metadata["std"] = [0.229, 0.224, 0.225]
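
Because assignments are persisted immediately, it can be worth validating values before writing them. The following is a minimal sketch, not part of deeplake: `store_splits` is a hypothetical helper that checks split fractions before storing them. It relies only on item assignment, so a plain dict stands in for `ds.metadata` here.

```python
import math


def store_splits(metadata, train, val, test):
    """Validate split fractions, then write them with item assignment.

    `metadata` is any object supporting `metadata[key] = value`,
    such as `ds.metadata`. Hypothetical helper, not a deeplake API.
    """
    fractions = {"train_split": train, "val_split": val, "test_split": test}
    if any(f < 0 for f in fractions.values()):
        raise ValueError("split fractions must be non-negative")
    total = sum(fractions.values())
    if not math.isclose(total, 1.0, abs_tol=1e-9):
        raise ValueError(f"split fractions must sum to 1, got {total}")
    # Validation passed: write each key with its own assignment.
    for key, value in fractions.items():
        metadata[key] = value


# A plain dict stands in for ds.metadata in this sketch.
demo_metadata = {}
store_splits(demo_metadata, train=0.8, val=0.1, test=0.1)
```

Validating first means an invalid set of fractions is rejected before any key is written.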

__getitem__
__getitem__(key: str) -> Any

Gets metadata value for the given key.

Parameters:

Name  Type  Description               Default
key   str   Metadata key to retrieve  required

Returns:

Type  Description
Any   The stored metadata value

Examples:

mean = ds["images"].metadata["mean"]
std = ds["images"].metadata["std"]

__setitem__
__setitem__(key: str, value: Any) -> None

Sets metadata value for given key. Changes are persisted immediately.

Parameters:

Name   Type  Description          Default
key    str   Metadata key to set  required
value  Any   Value to store       required

Examples:

ds.metadata["train_split"] = 0.8
ds.metadata["val_split"] = 0.1
ds.metadata["test_split"] = 0.1

keys
keys() -> list[str]

Lists all available metadata keys.

Returns:

Type       Description
list[str]  List of metadata key names

Examples:

# Print all metadata
for key in metadata.keys():
    print(f"{key}: {metadata[key]}")

# Set dataset metadata
ds.metadata["description"] = "Training dataset"
ds.metadata["version"] = "1.0"
ds.metadata["params"] = {
    "image_size": 224,
    "mean": [0.485, 0.456, 0.406],
    "std": [0.229, 0.224, 0.225]
}

# Read dataset metadata
description = ds.metadata["description"]
params = ds.metadata["params"]

# List all metadata keys
for key in ds.metadata.keys():
    print(f"{key}: {ds.metadata[key]}")
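
Since the documented read operations are `keys()` and item access, a snapshot of all metadata can be built from those two alone. The sketch below assumes nothing else about the metadata object; `metadata_to_dict` is a hypothetical helper, and a plain dict stands in for `ds.metadata`.

```python
def metadata_to_dict(metadata):
    """Copy every key/value pair into a plain dict.

    Uses only `keys()` and item access, the two read operations
    documented for metadata objects. Hypothetical helper.
    """
    return {key: metadata[key] for key in metadata.keys()}


# A plain dict stands in for ds.metadata in this sketch.
demo_metadata = {"description": "Training dataset", "version": "1.0"}
snapshot = metadata_to_dict(demo_metadata)

# dict.get supplies a fallback for keys that were never set.
image_size = snapshot.get("image_size", 224)
```

The snapshot is a copy taken at call time, so later metadata writes are not reflected in it.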

Column Metadata

deeplake.ReadOnlyMetadata

Read-only access to dataset and column metadata for ML workflows.

Stores important information about datasets like:

  • Model parameters and hyperparameters
  • Preprocessing statistics (mean, std, etc.)
  • Data splits and fold definitions
  • Version and training information

Examples:

Accessing model metadata:

metadata = ds.metadata
model_name = metadata["model_name"]
model_params = metadata["hyperparameters"]

Reading preprocessing stats:

mean = ds["images"].metadata["mean"]
std = ds["images"].metadata["std"]
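
Preprocessing statistics like these are typically consumed as a per-channel `(value - mean) / std` normalization. The sketch below illustrates that arithmetic on a single pixel, assuming pixel values are already scaled to [0, 1]. `normalize_pixel` is a hypothetical helper, and a plain dict stands in for `ds["images"].metadata`.

```python
def normalize_pixel(pixel, mean, std):
    """Apply per-channel (value - mean) / std to one pixel."""
    if not (len(pixel) == len(mean) == len(std)):
        raise ValueError("pixel, mean and std must have the same number of channels")
    return [(p - m) / s for p, m, s in zip(pixel, mean, std)]


# A plain dict stands in for ds["images"].metadata in this sketch.
column_metadata = {"mean": [0.485, 0.456, 0.406], "std": [0.229, 0.224, 0.225]}

# A pixel equal to the channel means normalizes to zero in every channel.
normalized = normalize_pixel(
    [0.485, 0.456, 0.406], column_metadata["mean"], column_metadata["std"]
)
```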

__getitem__
__getitem__(key: str) -> Any

Gets metadata value for the given key.

Parameters:

Name  Type  Description               Default
key   str   Metadata key to retrieve  required

Returns:

Type  Description
Any   The stored metadata value

Examples:

mean = ds["images"].metadata["mean"]
std = ds["images"].metadata["std"]

keys
keys() -> list[str]

Lists all available metadata keys.

Returns:

Type       Description
list[str]  List of metadata key names

Examples:

# Print all metadata
for key in metadata.keys():
    print(f"{key}: {metadata[key]}")

# Set column metadata
ds["images"].metadata["mean"] = [0.485, 0.456, 0.406]
ds["images"].metadata["std"] = [0.229, 0.224, 0.225]
ds["labels"].metadata["class_names"] = ["cat", "dog", "bird"]

# Read column metadata
mean = ds["images"].metadata["mean"]
class_names = ds["labels"].metadata["class_names"]
ds.commit()  # Commits other pending changes; metadata writes are already persisted

Version Control

Version

deeplake.Version

An atomic change within deeplake.Dataset's history.

client_timestamp property
client_timestamp: datetime

When the version was created, according to the writer's local clock.

This timestamp is not guaranteed to be accurate, and deeplake.Version.timestamp should generally be used instead.

id property
id: str

The unique version identifier

message property
message: str | None

The description of the version provided at commit time.

open
open() -> ReadOnlyDataset

Fetches the dataset corresponding to the version

timestamp property
timestamp: datetime

The version timestamp.

This is based on the storage provider's clock, and so generally more accurate than deeplake.Version.client_timestamp.

# Get current version
version_id = ds.version

# Access specific version
version = ds.history[version_id]
print(f"Version: {version.id}")
print(f"Message: {version.message}")
print(f"Timestamp: {version.timestamp}")

# Open dataset at specific version
old_ds = version.open()
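
The two timestamps can be compared to estimate how far a writer's clock drifted from the storage provider's clock. This is a minimal sketch using only the documented `timestamp` and `client_timestamp` properties. `clock_skew` is a hypothetical helper, `SimpleNamespace` stands in for a `deeplake.Version`, and both timestamps are assumed to be either timezone-aware or both naive (mixing the two raises `TypeError` on subtraction).

```python
from datetime import datetime, timedelta, timezone
from types import SimpleNamespace


def clock_skew(version):
    """Return client_timestamp - timestamp for a version-like object."""
    return version.client_timestamp - version.timestamp


# SimpleNamespace stands in for a deeplake.Version in this sketch.
storage_time = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
demo_version = SimpleNamespace(
    id="demo",
    timestamp=storage_time,
    client_timestamp=storage_time + timedelta(seconds=3),
)

# Positive skew: the writer's clock ran ahead of the storage clock.
skew = clock_skew(demo_version)
```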

History

deeplake.History

The version history of a deeplake.Dataset.

__getitem__
__getitem__(offset: int) -> Version
__getitem__(version: str) -> Version
__getitem__(input: int | str) -> Version

Gets a version by integer offset into the history or by version id.

__iter__
__iter__() -> Iterator[Version]

Iterate over the history, starting at the initial version

__len__
__len__() -> int

The number of versions within the history

# View all versions
for version in ds.history:
    print(f"Version {version.id}: {version.message}")
    print(f"Created: {version.timestamp}")

# Get specific version
version = ds.history[version_id]

# Get version by index
first_version = ds.history[0]
latest_version = ds.history[-1]

Tagging

Tag

deeplake.Tag

Describes a tag within the dataset.

Tags are created using deeplake.Dataset.tag.

delete
delete() -> None

Deletes the tag from the dataset

id property
id: str

The unique identifier of the tag

name property
name: str

The name of the tag

open
open() -> DatasetView

Fetches the dataset corresponding to the tag

open_async
open_async() -> Future

Asynchronously fetches the dataset corresponding to the tag and returns a Future object.

rename
rename(new_name: str) -> None

Renames the tag within the dataset

version property
version: str

The version that has been tagged

# Create tag
ds.tag("v1.0")

# Access tagged version
tag = ds.tags["v1.0"]
print(f"Tag: {tag.name}")
print(f"Version: {tag.version}")

# Open dataset at tag
tagged_ds = tag.open()

# Rename tag
tag.rename("v1.0.0")

# Delete tag (do this last: the tag no longer exists afterwards)
tag.delete()

Tags

deeplake.Tags

Provides access to the tags within a dataset.

It is returned by the deeplake.Dataset.tags property.

__getitem__
__getitem__(name: str) -> Tag

Return a tag by name

__len__
__len__() -> int

The total number of tags in the dataset

names
names() -> list[str]

Return a list of tag names

# Create tag
ds.tag("v1.0")  # Tag current version
specific_version = ds.version
ds.tag("v2.0", version=specific_version)  # Tag specific version

# List all tags
for name in ds.tags.names():
    tag = ds.tags[name]
    print(f"Tag {tag.name} points to version {tag.version}")

# Check number of tags
num_tags = len(ds.tags)

# Access specific tag
tag = ds.tags["v1.0"]

# Common operations with tags
latest_ds = ds.tags["v2.0"].open()  # Open dataset at tag
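future = ds.tags["v1.0"].open_async()  # Async open: returns a Future
stable_ds = future.result()  # Assumes Future.result() blocks until the dataset is ready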

# Error handling
try:
    tag = ds.tags["non_existent"]
except deeplake.TagNotFoundError:
    print("Tag not found")
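
As an alternative to catching the exception, a lookup can first check `names()`. The sketch below assumes only the documented `names()` and `[]` operations. `find_tag` is a hypothetical helper, and `DemoTags` is a stand-in for `ds.tags`.

```python
from types import SimpleNamespace


def find_tag(tags, name):
    """Return tags[name] if the name is listed by tags.names(), else None."""
    if name in tags.names():
        return tags[name]
    return None


class DemoTags:
    """Minimal stand-in exposing the documented names(), [] and len()."""

    def __init__(self, tags):
        self._tags = dict(tags)

    def names(self):
        return list(self._tags)

    def __getitem__(self, name):
        return self._tags[name]

    def __len__(self):
        return len(self._tags)


demo_tags = DemoTags({"v1.0": SimpleNamespace(name="v1.0", version="abc123")})
found = find_tag(demo_tags, "v1.0")
missing = find_tag(demo_tags, "non_existent")
```

The check and the lookup are two separate calls, so a tag deleted in between would still raise. The try/except form avoids that gap.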

TagView

deeplake.TagView

Describes a read-only tag within the dataset.

Tags are created using deeplake.Dataset.tag.

id property
id: str

The unique identifier of the tag

name property
name: str

The name of the tag

open
open() -> DatasetView

Fetches the dataset corresponding to the tag

open_async
open_async() -> Future

Asynchronously fetches the dataset corresponding to the tag and returns a Future object.

version property
version: str

The version that has been tagged

# Open read-only dataset
ds = deeplake.open_read_only("s3://bucket/dataset")

# Access tag view
tag_view = ds.tags["v1.0"]
print(f"Tag: {tag_view.name}")
print(f"Version: {tag_view.version}")

# Open dataset at tag
tagged_ds = tag_view.open()

TagsView

deeplake.TagsView

Provides access to the tags within a dataset.

It is returned by the deeplake.Dataset.tags property on a deeplake.ReadOnlyDataset.

__getitem__
__getitem__(name: str) -> TagView

Return a tag by name

__len__
__len__() -> int

The total number of tags in the dataset

names
names() -> list[str]

Return a list of tag names

# Access read-only tags
tags_view = ds.tags

# List tag names
for name in tags_view.names():
    print(f"Found tag: {name}")

# Get specific tag
tag_view = tags_view["v1.0"]
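
Combining `names()` with item access gives a quick overview of which version each tag points to. This is a minimal sketch that uses only those two documented operations. `tag_versions` is a hypothetical helper, and `DemoTagsView` is a stand-in for `ds.tags`.

```python
from types import SimpleNamespace


def tag_versions(tags):
    """Map each tag name to the version it points to."""
    return {name: tags[name].version for name in tags.names()}


class DemoTagsView(dict):
    """Stand-in for a TagsView: a dict that also offers names()."""

    def names(self):
        return list(self.keys())


demo_view = DemoTagsView(
    {
        "v1.0": SimpleNamespace(name="v1.0", version="abc123"),
        "v2.0": SimpleNamespace(name="v2.0", version="def456"),
    }
)
versions_by_tag = tag_versions(demo_view)
```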