Metadata

Metadata provides key-value storage for datasets and columns.

Dataset Metadata

deeplake.Metadata

Bases: ReadOnlyMetadata

Writable access to dataset and column metadata for ML workflows.

Stores important information about datasets like:

  • Model parameters and hyperparameters
  • Preprocessing statistics
  • Data splits and fold definitions
  • Version and training information

Changes are persisted immediately without requiring commit().

Examples:

Storing model metadata:

ds.metadata["model_name"] = "resnet50"
ds.metadata["hyperparameters"] = {
    "learning_rate": 0.001,
    "batch_size": 32
}

Setting preprocessing stats:

ds["images"].metadata["mean"] = [0.485, 0.456, 0.406]
ds["images"].metadata["std"] = [0.229, 0.224, 0.225]

__getitem__
__getitem__(key: str) -> Any

Gets metadata value for the given key.

Parameters:

Name  Type  Description               Default
key   str   Metadata key to retrieve  required

Returns:

Type  Description
Any   The stored metadata value

Examples:

mean = ds["images"].metadata["mean"]
std = ds["images"].metadata["std"]

__setitem__
__setitem__(key: str, value: Any) -> None

Sets metadata value for given key. Changes are persisted immediately.

Parameters:

Name   Type  Description          Default
key    str   Metadata key to set  required
value  Any   Value to store       required

Examples:

ds.metadata["train_split"] = 0.8
ds.metadata["val_split"] = 0.1
ds.metadata["test_split"] = 0.1

__contains__
__contains__(key: str) -> bool

Checks if the metadata contains the given key.
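
Because `__contains__` backs Python's `in` operator, a membership test can guard a lookup. The following is a minimal sketch, not part of deeplake: `get_or_default` is a hypothetical helper, and a plain dict stands in for `ds.metadata` so the snippet runs without a dataset.

```python
from typing import Any


def get_or_default(metadata: Any, key: str, default: Any = None) -> Any:
    """Return metadata[key] if the key is present, otherwise `default`.

    Hypothetical helper (not a deeplake API). It relies only on the
    `in` operator (__contains__) and indexing (__getitem__).
    """
    return metadata[key] if key in metadata else default


# Assumption: a plain dict stands in for ds.metadata.
metadata = {"model_name": "resnet50"}

model_name = get_or_default(metadata, "model_name", "unknown")
optimizer = get_or_default(metadata, "optimizer", "unknown")
print(model_name, optimizer)
```

With a real dataset, `ds.metadata` or `ds["images"].metadata` would be passed in place of the dict.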

keys
keys() -> list[str]

Lists all available metadata keys.

Returns:

Type       Description
list[str]  List of metadata key names

Examples:

# Print all metadata
for key in metadata.keys():
    print(f"{key}: {metadata[key]}")

# Set dataset metadata
ds.metadata["description"] = "Training dataset"
ds.metadata["version"] = "1.0"
ds.metadata["params"] = {
    "image_size": 224,
    "mean": [0.485, 0.456, 0.406],
    "std": [0.229, 0.224, 0.225]
}

# Read dataset metadata
description = ds.metadata["description"]
params = ds.metadata["params"]

# List all metadata keys
for key in ds.metadata.keys():
    print(f"{key}: {ds.metadata[key]}")
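
The `keys()` and indexing pattern above can also be used to take a plain-dict snapshot of the metadata, for example to log it alongside a training run. This is a sketch under assumptions: `metadata_snapshot` is a hypothetical helper, a dict stands in for `ds.metadata`, and the stored values are assumed to be JSON-serializable (as the ones in these examples are).

```python
import json
from typing import Any


def metadata_snapshot(metadata: Any) -> dict:
    """Copy every key/value pair into a plain dict via keys() and indexing.

    Hypothetical helper (not a deeplake API).
    """
    return {key: metadata[key] for key in metadata.keys()}


# Assumption: a plain dict stands in for ds.metadata.
metadata = {
    "description": "Training dataset",
    "version": "1.0",
    "params": {"image_size": 224},
}

snapshot = metadata_snapshot(metadata)
as_json = json.dumps(snapshot, indent=2, sort_keys=True)
print(as_json)
```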

Column Metadata

deeplake.ReadOnlyMetadata

Read-only access to dataset and column metadata for ML workflows.

Stores important information about datasets like:

  • Model parameters and hyperparameters
  • Preprocessing statistics (mean, std, etc.)
  • Data splits and fold definitions
  • Version and training information

Examples:

Accessing model metadata:

metadata = ds.metadata
model_name = metadata["model_name"]
model_params = metadata["hyperparameters"]

Reading preprocessing stats:

mean = ds["images"].metadata["mean"]
std = ds["images"].metadata["std"]

__getitem__
__getitem__(key: str) -> Any

Gets metadata value for the given key.

Parameters:

Name  Type  Description               Default
key   str   Metadata key to retrieve  required

Returns:

Type  Description
Any   The stored metadata value

Examples:

mean = ds["images"].metadata["mean"]
std = ds["images"].metadata["std"]

__contains__
__contains__(key: str) -> bool

Checks if the metadata contains the given key.
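
A membership check is useful for validating that a dataset carries the metadata a pipeline expects before any values are read. A minimal sketch, assuming a hypothetical `missing_keys` helper and a dict standing in for a column's read-only metadata:

```python
from typing import Any, Iterable


def missing_keys(metadata: Any, required: Iterable[str]) -> list[str]:
    """Return the required keys that are absent from the metadata.

    Hypothetical helper (not a deeplake API); uses only the `in` operator.
    """
    return [key for key in required if key not in metadata]


# Assumption: a plain dict stands in for ds["images"].metadata.
image_metadata = {
    "mean": [0.485, 0.456, 0.406],
    "std": [0.229, 0.224, 0.225],
}

missing = missing_keys(image_metadata, ["mean", "std", "class_names"])
if missing:
    print(f"Missing metadata keys: {missing}")
```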

keys
keys() -> list[str]

Lists all available metadata keys.

Returns:

Type       Description
list[str]  List of metadata key names

Examples:

# Print all metadata
for key in metadata.keys():
    print(f"{key}: {metadata[key]}")

# Set column metadata
ds["images"].metadata["mean"] = [0.485, 0.456, 0.406]
ds["images"].metadata["std"] = [0.229, 0.224, 0.225]
ds["labels"].metadata["class_names"] = ["cat", "dog", "bird"]

# Read column metadata
mean = ds["images"].metadata["mean"]
class_names = ds["labels"].metadata["class_names"]

# Check if metadata key exists
if "mean" in ds["images"].metadata:
    print("Mean values are available")

# List all metadata keys for a column
print("Available metadata keys:")
for key in ds["images"].metadata.keys():
    print(f"  {key}: {ds['images'].metadata[key]}")

ds.commit()  # Commit other pending dataset changes; metadata writes are persisted immediately

Advanced Metadata Operations

# Dataset-level metadata operations
dataset_metadata = ds.metadata

# Check if key exists before accessing
if "training_config" in dataset_metadata:
    config = dataset_metadata["training_config"]
else:
    # Set default configuration
    dataset_metadata["training_config"] = {
        "epochs": 100,
        "batch_size": 32,
        "learning_rate": 0.001
    }

# List all dataset metadata
print("Dataset metadata:")
for key in dataset_metadata.keys():
    print(f"  {key}: {dataset_metadata[key]}")

# Column-level metadata operations
image_metadata = ds["images"].metadata

# Store preprocessing parameters
image_metadata["normalization"] = {
    "mean": [0.485, 0.456, 0.406],
    "std": [0.229, 0.224, 0.225]
}
image_metadata["resize_dimensions"] = [224, 224]

# Store data statistics
image_metadata["data_info"] = {
    "total_samples": len(ds),
    "channels": 3,
    "format": "RGB"
}
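
Stored preprocessing parameters are typically read back at training or inference time. The sketch below assumes the `normalization` layout stored above; `normalize_pixel` is a hypothetical helper and a dict stands in for `ds["images"].metadata`. It applies the usual per-channel `(value - mean) / std` transform to a single RGB pixel.

```python
from typing import Sequence


def normalize_pixel(
    pixel: Sequence[float],
    mean: Sequence[float],
    std: Sequence[float],
) -> list[float]:
    """Apply per-channel (value - mean) / std to one pixel.

    Hypothetical helper (not a deeplake API).
    """
    return [(v - m) / s for v, m, s in zip(pixel, mean, std)]


# Assumption: a plain dict stands in for ds["images"].metadata.
image_metadata = {
    "normalization": {
        "mean": [0.485, 0.456, 0.406],
        "std": [0.229, 0.224, 0.225],
    }
}

stats = image_metadata["normalization"]
# A pixel equal to the channel means normalizes to zero in every channel.
normalized = normalize_pixel([0.485, 0.456, 0.406], stats["mean"], stats["std"])
print(normalized)
```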

Read-Only Metadata Access

# Access metadata in read-only datasets
ro_ds = deeplake.open_read_only("s3://bucket/dataset")

# Read dataset metadata (read-only)
if "model_version" in ro_ds.metadata:
    version = ro_ds.metadata["model_version"]
    print(f"Model version: {version}")

# Read column metadata (read-only)
if "class_names" in ro_ds["labels"].metadata:
    classes = ro_ds["labels"].metadata["class_names"]
    print(f"Available classes: {classes}")

# List all available metadata keys
print("Dataset metadata keys:", ro_ds.metadata.keys())
print("Labels metadata keys:", ro_ds["labels"].metadata.keys())
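
Metadata read from a read-only dataset can be carried over to a writable one using only `keys()`, `in`, indexing, and item assignment. This is a sketch rather than a deeplake feature: `copy_metadata` is a hypothetical helper, and plain dicts stand in for `ro_ds.metadata` (source) and a writable dataset's `metadata` (target).

```python
from typing import Any, Iterable, Optional


def copy_metadata(
    source: Any,
    target: Any,
    keys: Optional[Iterable[str]] = None,
) -> list[str]:
    """Copy metadata entries from source to target and return the keys copied.

    Hypothetical helper (not a deeplake API). If `keys` is None, every key
    in the source is copied; otherwise only requested keys that exist.
    """
    if keys is None:
        selected = list(source.keys())
    else:
        selected = [key for key in keys if key in source]
    for key in selected:
        target[key] = source[key]
    return selected


# Assumption: plain dicts stand in for the two metadata objects.
source_metadata = {"model_version": "1.0", "class_names": ["cat", "dog", "bird"]}
target_metadata = {}

copied = copy_metadata(source_metadata, target_metadata, ["model_version", "owner"])
print(copied)
```

Only keys present in the source are copied, so the absent `owner` key is skipped rather than raising an error.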