Miscellaneous¶
Metadata¶
Metadata provides key-value storage for datasets and columns.
Dataset Metadata¶
deeplake.Metadata¶
Bases: ReadOnlyMetadata
Writable access to dataset and column metadata for ML workflows.
Stores important information about datasets like:
- Model parameters and hyperparameters
- Preprocessing statistics
- Data splits and fold definitions
- Version and training information
Changes are persisted immediately without requiring commit().
Examples:
Storing model metadata:
ds.metadata["model_name"] = "resnet50"
ds.metadata["hyperparameters"] = {
    "learning_rate": 0.001,
    "batch_size": 32
}
Setting preprocessing stats:
ds["images"].metadata["mean"] = [0.485, 0.456, 0.406]
ds["images"].metadata["std"] = [0.229, 0.224, 0.225]
# Set dataset metadata
ds.metadata["description"] = "Training dataset"
ds.metadata["version"] = "1.0"
ds.metadata["params"] = {
    "image_size": 224,
    "mean": [0.485, 0.456, 0.406],
    "std": [0.229, 0.224, 0.225]
}
# Read dataset metadata
description = ds.metadata["description"]
params = ds.metadata["params"]
# List all metadata keys
for key in ds.metadata.keys():
    print(f"{key}: {ds.metadata[key]}")
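Only item access (`ds.metadata[key]`) and `keys()` appear in the examples above, so a lookup with a fallback value can be sketched using just those two operations. The helper name `get_or_default` is hypothetical, and a plain dict stands in for `ds.metadata` so that the snippet runs without a dataset.

```python
from typing import Any


def get_or_default(metadata: Any, key: str, default: Any = None) -> Any:
    """Return metadata[key] if the key is present, otherwise default.

    Relies only on keys() and item access, the two operations shown above.
    """
    if key in metadata.keys():
        return metadata[key]
    return default


# A plain dict stands in for ds.metadata (assumption for illustration only).
metadata = {"description": "Training dataset", "version": "1.0"}

description = get_or_default(metadata, "description")
split = get_or_default(metadata, "split", default="unspecified")
print(description, split)
```

With a real dataset, passing `ds.metadata` in place of the dict should behave the same way, as long as `keys()` and item access work as shown above.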
Column Metadata¶
deeplake.ReadOnlyMetadata¶
Read-only access to dataset and column metadata for ML workflows.
Stores important information about datasets like:
- Model parameters and hyperparameters
- Preprocessing statistics (mean, std, etc.)
- Data splits and fold definitions
- Version and training information
Examples:
Accessing model metadata:
metadata = ds.metadata
model_name = metadata["model_name"]
model_params = metadata["hyperparameters"]
Reading preprocessing stats:
# Set column metadata
ds["images"].metadata["mean"] = [0.485, 0.456, 0.406]
ds["images"].metadata["std"] = [0.229, 0.224, 0.225]
ds["labels"].metadata["class_names"] = ["cat", "dog", "bird"]
# Read column metadata
mean = ds["images"].metadata["mean"]
class_names = ds["labels"].metadata["class_names"]
ds.commit() # Commit the changes to the dataset
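One common use of the per-column `mean` and `std` values is channel-wise normalization. The following is a minimal sketch of that step, not part of the Deep Lake API: `normalize_pixel` is a hypothetical helper, and a plain dict stands in for `ds["images"].metadata`.

```python
from typing import List, Mapping, Sequence


def normalize_pixel(
    pixel: Sequence[float], stats: Mapping[str, Sequence[float]]
) -> List[float]:
    """Apply (value - mean) / std per channel using stats read from metadata."""
    mean, std = stats["mean"], stats["std"]
    if not (len(pixel) == len(mean) == len(std)):
        raise ValueError("pixel, mean, and std must have the same number of channels")
    return [(value - m) / s for value, m, s in zip(pixel, mean, std)]


# Plain dict standing in for ds["images"].metadata (assumption for illustration).
image_stats = {"mean": [0.485, 0.456, 0.406], "std": [0.229, 0.224, 0.225]}

# One RGB pixel with values already scaled to [0, 1].
normalized = normalize_pixel([0.485, 0.680, 0.181], image_stats)
print(normalized)
```

Keeping the statistics next to the column means the training code does not need to hard-code them.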
Version Control¶
Version¶
deeplake.Version¶
An atomic change within a deeplake.Dataset's history.
client_timestamp property¶
When the version was created, according to the writer's local clock.
This timestamp is not guaranteed to be accurate, and deeplake.Version.timestamp should generally be used instead.
timestamp property¶
The version timestamp.
This is based on the storage provider's clock, so it is generally more accurate than deeplake.Version.client_timestamp.
# Get current version
version_id = ds.version
# Access specific version
version = ds.history[version_id]
print(f"Version: {version.id}")
print(f"Message: {version.message}")
print(f"Timestamp: {version.timestamp}")
# Open dataset at specific version
old_ds = version.open()
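Because a version carries both a writer-side and a storage-side timestamp, the difference between them gives a rough estimate of the writer's clock skew. The sketch below assumes both properties are `datetime` values; `VersionStandIn` and `clock_skew` are hypothetical names used so the example runs without a dataset.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class VersionStandIn:
    """Hypothetical stand-in exposing the two timestamp properties of deeplake.Version."""

    client_timestamp: datetime
    timestamp: datetime


def clock_skew(version) -> timedelta:
    """Writer clock minus storage clock; positive means the writer's clock ran ahead."""
    return version.client_timestamp - version.timestamp


stored = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
version = VersionStandIn(
    client_timestamp=stored + timedelta(seconds=90),
    timestamp=stored,
)

skew = clock_skew(version)
if abs(skew) > timedelta(minutes=1):
    print(f"Writer clock differs from storage clock by {skew.total_seconds():.0f} s")
```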
History¶
deeplake.History¶
The version history of a deeplake.Dataset.
# View all versions
for version in ds.history:
    print(f"Version {version.id}: {version.message}")
    print(f"Created: {version.timestamp}")
# Get specific version
version = ds.history[version_id]
# Get version by index
first_version = ds.history[0]
latest_version = ds.history[-1]
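Iterating over the history also makes it possible to look up the state of a dataset as of a given moment. The sketch below picks the newest version whose `timestamp` is at or before a cutoff. `version_as_of` is a hypothetical helper, the `SimpleNamespace` objects are stand-ins for `deeplake.Version`, and `timestamp` is assumed to be a comparable `datetime`.

```python
from datetime import datetime, timezone
from types import SimpleNamespace
from typing import Iterable, Optional


def version_as_of(history: Iterable, cutoff: datetime) -> Optional[object]:
    """Return the newest version whose timestamp is at or before cutoff, or None."""
    candidates = [v for v in history if v.timestamp <= cutoff]
    # max() with default=None handles the case where no version qualifies.
    return max(candidates, key=lambda v: v.timestamp, default=None)


# Stand-ins for the versions yielded by ds.history (assumption for illustration).
history = [
    SimpleNamespace(id="a1", timestamp=datetime(2024, 1, 1, tzinfo=timezone.utc)),
    SimpleNamespace(id="b2", timestamp=datetime(2024, 2, 1, tzinfo=timezone.utc)),
    SimpleNamespace(id="c3", timestamp=datetime(2024, 3, 1, tzinfo=timezone.utc)),
]

match = version_as_of(history, datetime(2024, 2, 15, tzinfo=timezone.utc))
print(match.id if match is not None else "no version before cutoff")
```

The helper does not rely on the iteration order of the history, only on each version's `timestamp`.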
Tagging¶
Tag¶
deeplake.Tag¶
Describes a tag within the dataset.
Tags are created using deeplake.Dataset.tag.
open_async¶
Asynchronously fetches the dataset corresponding to the tag and returns a Future object.
# Create tag
ds.tag("v1.0")
# Access tagged version
tag = ds.tags["v1.0"]
print(f"Tag: {tag.name}")
print(f"Version: {tag.version}")
# Open dataset at tag
tagged_ds = tag.open()
# Rename tag
tag.rename("v1.0.0")
# Delete tag (looked up again under its new name)
ds.tags["v1.0.0"].delete()
Tags¶
deeplake.Tags¶
# Create tag
ds.tag("v1.0") # Tag current version
specific_version = ds.version
ds.tag("v2.0", version=specific_version) # Tag specific version
# List all tags
for name in ds.tags.names():
    tag = ds.tags[name]
    print(f"Tag {tag.name} points to version {tag.version}")
# Check number of tags
num_tags = len(ds.tags)
# Access specific tag
tag = ds.tags["v1.0"]
# Common operations with tags
latest_ds = ds.tags["v2.0"].open() # Open dataset at tag
stable_future = ds.tags["v1.0"].open_async() # Async open; returns a Future, not the dataset itself
# Error handling
try:
    tag = ds.tags["non_existent"]
except deeplake.TagNotFoundError:
    print("Tag not found")
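As an alternative to catching the error for each lookup, the list returned by `names()` can be checked up front. The following is a minimal sketch of that idea: `missing_tags` is a hypothetical helper, and `TagsStandIn` is a stand-in that only mimics the `names()` call shown above.

```python
from typing import Iterable, List


class TagsStandIn:
    """Hypothetical stand-in exposing names(), as ds.tags does in the listing above."""

    def __init__(self, names: Iterable[str]) -> None:
        self._names = list(names)

    def names(self) -> List[str]:
        return list(self._names)


def missing_tags(tags, required: Iterable[str]) -> List[str]:
    """Return the required tag names that do not exist, preserving their order."""
    existing = set(tags.names())
    return [name for name in required if name not in existing]


tags = TagsStandIn(["v1.0", "v2.0"])
absent = missing_tags(tags, ["v1.0", "v3.0"])
if absent:
    print(f"Missing tags: {', '.join(absent)}")
```

This reports every missing tag in one pass, which can be more convenient than handling the first failed lookup.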
TagView¶
deeplake.TagView¶
Describes a read-only tag within the dataset.
Tags are created using deeplake.Dataset.tag.
open_async¶
Asynchronously fetches the dataset corresponding to the tag and returns a Future object.
# Open read-only dataset
ds = deeplake.open_read_only("s3://bucket/dataset")
# Access tag view
tag_view = ds.tags["v1.0"]
print(f"Tag: {tag_view.name}")
print(f"Version: {tag_view.version}")
# Open dataset at tag
tagged_ds = tag_view.open()
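Since `open_async` returns a Future rather than a dataset, several tags can be requested first and the results collected afterwards. The sketch below illustrates that pattern with the standard library only. It assumes the returned Future offers a blocking `result()` method, which this page does not confirm. `fake_open_async` is a hypothetical stand-in for `tag_view.open_async()` that returns a string instead of a dataset.

```python
from concurrent.futures import Future, ThreadPoolExecutor


def fake_open_async(executor: ThreadPoolExecutor, tag_name: str) -> Future:
    """Hypothetical stand-in for open_async(); resolves to a placeholder string."""
    return executor.submit(lambda: f"dataset@{tag_name}")


with ThreadPoolExecutor(max_workers=2) as executor:
    # Start every request before waiting on any of them.
    pending = {name: fake_open_async(executor, name) for name in ("v1.0", "v2.0")}
    # result() blocks until the corresponding request has finished.
    opened = {name: future.result() for name, future in pending.items()}

print(opened)
```

Starting all requests before waiting lets the fetches overlap instead of running one after another.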
TagsView¶
deeplake.TagsView¶
Provides access to the tags within a dataset.
It is returned by the deeplake.Dataset.tags property on a deeplake.ReadOnlyDataset.