Data Versioning

Deeplake datasets have built-in version control. Every mutation is tracked, and you can branch, tag, and merge just like with Git.

Managed tables expose the same versioning system: client.open_table() returns a deeplake.Dataset with full version control.

Set credentials first

export DEEPLAKE_API_KEY="your-token-here"
export DEEPLAKE_WORKSPACE="your-workspace"  # optional, defaults to "default"

from deeplake import Client

client = Client()
ds = client.open_table("my_table")

Commits

Every call to ds.commit() creates an immutable snapshot.

ds.append({"text": ["new row"]})
ds.commit()

# Current version ID
print(ds.version)
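
To check that a commit produced a new snapshot, one option is to compare ds.version before and after the call. The helper below is a minimal sketch: commit_rows is a hypothetical name, and it uses only the append(), commit(), and version members shown above. It assumes, without confirmation from this page, that ds.version changes after a commit.

```python
def commit_rows(ds, rows):
    """Append `rows`, commit, and return (version_before, version_after).

    `ds` is any object exposing append(), commit(), and a `version`
    attribute, as in the example above. Hypothetical helper, not part
    of the deeplake package.
    """
    before = ds.version
    ds.append(rows)
    ds.commit()
    return before, ds.version


# Usage against a table opened as shown earlier:
# before, after = commit_rows(ds, {"text": ["new row"]})
# print(before, after)
```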

History

Iterate over all versions from oldest to newest:

for version in ds.history:
    print(f"{version.id} ({version.timestamp})")

Access a specific version by ID or index:

# By index
first = ds.history[0]
latest = ds.history[-1]

# By version ID (version_id is a str, for example a value read earlier from ds.version)
version = ds.history[version_id]

# Open the dataset at that point in time
old_ds = version.open()
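
Because each history entry carries a timestamp, the history can also be searched by time rather than by ID or index. The following is a sketch only: version_as_of is a hypothetical helper, and it assumes that `when` and version.timestamp are both timezone-aware or both naive, since Python refuses to compare the two kinds.

```python
def version_as_of(history, when):
    """Return the last version in `history` with timestamp <= `when`, or None.

    `history` is any iterable of objects with a `timestamp` attribute,
    iterated oldest to newest as described above. Hypothetical helper.
    """
    match = None
    for version in history:
        if version.timestamp <= when:
            match = version
    return match


# Usage:
# version = version_as_of(ds.history, cutoff)  # cutoff is a datetime
# old_ds = version.open() if version is not None else None
```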

Version properties

| Property | Type | Description |
| --- | --- | --- |
| id | str | Unique version identifier |
| message | str \| None | Commit message |
| timestamp | datetime | Storage-provider timestamp |
| client_timestamp | datetime | Writer's local clock |
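
Since each version records both a storage-provider timestamp and the writer's local clock, the difference between the two gives a rough view of clock skew on the writing machine. This is a sketch: clock_skew_seconds is a hypothetical helper, and it assumes both properties are datetime objects of the same kind (both timezone-aware or both naive).

```python
def clock_skew_seconds(version):
    """Writer's clock minus storage clock, in seconds.

    A positive result means the writer's clock was ahead of the storage
    provider's when the version was recorded. Hypothetical helper.
    """
    return (version.client_timestamp - version.timestamp).total_seconds()


# Usage:
# for version in ds.history:
#     print(version.id, clock_skew_seconds(version))
```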

Branches

Create a branch to work on data in isolation, then merge back.

# Create a branch
ds.branch("experiment")

# Open the branch
branch_ds = ds.branches["experiment"].open()

# Add data on the branch
branch_ds.append({"text": ["experimental row"]})
branch_ds.commit()

List and inspect branches

# List all branch names
print(ds.branches.names())  # ["main", "experiment"]

# Number of branches
print(len(ds.branches))

# Branch details
branch = ds.branches["experiment"]
print(branch.name)       # "experiment"
print(branch.timestamp)  # creation time
print(branch.base)       # (parent_branch_id, parent_version_id)
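
Scripts that run more than once often need "open this branch, creating it if necessary" behavior. One way to sketch that uses only the calls shown above: ds.branches.names(), ds.branch(), and open(). The name open_or_create_branch is hypothetical. What ds.branch() does when the name already exists is not stated on this page, so the helper checks first.

```python
def open_or_create_branch(ds, name):
    """Open branch `name`, creating it from the current state if missing.

    Hypothetical helper built on ds.branches.names(), ds.branch(name),
    and ds.branches[name].open().
    """
    if name not in ds.branches.names():
        ds.branch(name)
    return ds.branches[name].open()


# Usage:
# branch_ds = open_or_create_branch(ds, "experiment")
```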

Rename and delete

ds.branches["experiment"].rename("exp-v2")
ds.branches["exp-v2"].delete()

Merge

Merge a branch back into main:

ds.branch("feature")
feature_ds = ds.branches["feature"].open()
feature_ds.append({"text": ["feature row"]})
feature_ds.commit()

# Merge into main
main_ds = ds.branches["main"].open()
main_ds.merge("feature")
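
The branch, commit, and merge steps above can be wrapped into a single function so that isolated changes always follow the same sequence. This is a sketch under assumptions: merge_rows_via_branch and its `target` parameter are hypothetical, the calls mirror the example above exactly, and conflict handling is not covered because this page does not describe it.

```python
def merge_rows_via_branch(ds, branch_name, rows, target="main"):
    """Append `rows` on a new branch, commit, then merge into `target`.

    Returns the opened target dataset. Hypothetical helper that repeats
    the sequence from the example above.
    """
    ds.branch(branch_name)
    work_ds = ds.branches[branch_name].open()
    work_ds.append(rows)
    work_ds.commit()

    target_ds = ds.branches[target].open()
    target_ds.merge(branch_name)
    return target_ds


# Usage:
# main_ds = merge_rows_via_branch(ds, "feature", {"text": ["feature row"]})
```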

Tags

Tags are named pointers to specific versions, useful for marking releases or checkpoints.

ds.commit()
ds.tag("v1.0")

Access tagged versions

tag = ds.tags["v1.0"]
print(tag.name)      # "v1.0"
print(tag.version)   # version ID it points to
print(tag.timestamp)

# Open the dataset at the tagged version
tagged_ds = tag.open()

List, rename, delete

# List all tags
for name in ds.tags.names():
    tag = ds.tags[name]
    print(f"{tag.name}: {tag.version}")

# Rename
ds.tags["v1.0"].rename("v1.0.0")

# Delete
ds.tags["v1.0.0"].delete()

Tag a specific version

version_id = ds.version
ds.tag("checkpoint", version=version_id)
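
For pipelines that may run repeatedly, it can help to create a tag only when it does not already exist. The sketch below combines ds.tags.names() with the two ds.tag() forms shown above. ensure_tag is a hypothetical name. How ds.tag() behaves for a duplicate name is not documented here, which is why the helper checks first and returns an existing tag unchanged.

```python
def ensure_tag(ds, name, version=None):
    """Return the tag called `name`, creating it first if it is missing.

    With version=None the tag is created via ds.tag(name); otherwise via
    ds.tag(name, version=version). An existing tag is returned as-is,
    even if it points at a different version. Hypothetical helper.
    """
    if name not in ds.tags.names():
        if version is None:
            ds.tag(name)
        else:
            ds.tag(name, version=version)
    return ds.tags[name]


# Usage:
# tag = ensure_tag(ds, "checkpoint", version=ds.version)
```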

Tag properties

| Property | Type | Description |
| --- | --- | --- |
| id | str | Unique tag identifier |
| name | str | Tag name |
| message | str | Tag message |
| version | str | Version ID the tag points to |
| timestamp | datetime | Creation timestamp |
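
The name and version properties make a reverse lookup possible: given a version ID, find every tag that points to it. A minimal sketch, with tags_for_version as a hypothetical helper name:

```python
def tags_for_version(ds, version_id):
    """Return the names of all tags whose `version` equals `version_id`.

    Hypothetical helper; uses only ds.tags.names() and tag.version.
    """
    return [
        name
        for name in ds.tags.names()
        if ds.tags[name].version == version_id
    ]


# Usage:
# print(tags_for_version(ds, ds.version))
```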

Read-only access

When opening a dataset in read-only mode, branches and tags return view objects (BranchView, TagView) that support open() but not rename() or delete().

import deeplake

ds = deeplake.open_read_only("al://workspace/dataset")

# Browse history and tags (read-only)
for version in ds.history:
    print(version.id, version.timestamp)

tagged_ds = ds.tags["v1.0"].open()
branch_ds = ds.branches["main"].open()
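
A read-only handle is enough for auditing which branches and tags exist. The sketch below collects them into a plain dictionary. snapshot_refs is a hypothetical helper, and it assumes that names() is available on the read-only branch and tag collections just as on the writable ones, which this page does not state explicitly.

```python
def snapshot_refs(ds):
    """Return {"branches": [...], "tags": {name: version_id}} for `ds`.

    Performs reads only, so it should work on a dataset opened with
    open_read_only(). Hypothetical helper.
    """
    return {
        "branches": sorted(ds.branches.names()),
        "tags": {name: ds.tags[name].version for name in ds.tags.names()},
    }


# Usage:
# ds = deeplake.open_read_only("al://workspace/dataset")
# print(snapshot_refs(ds))
```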
