Data Versioning¶
Deeplake datasets have built-in version control. Every mutation is tracked, and you can branch, tag, and merge just like with Git.
Managed tables expose the same versioning system — client.open_table() returns a deeplake.Dataset with full version control.
Set credentials first
Commits¶
Every call to ds.commit() creates an immutable snapshot.
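As a minimal sketch of that snapshot behavior, the hypothetical helper below appends a batch of rows, commits it, and returns the newest entry in ds.history. Passing a message string to commit() is an assumption (this page lists a message property on versions but does not show the call signature), and the helper name and arguments are illustrative only.

```python
def commit_rows(ds, rows, message=None):
    """Append `rows` to `ds`, commit, and return the newest version.

    `ds` is expected to behave like a writable deeplake.Dataset.
    """
    ds.append(rows)
    if message is None:
        ds.commit()
    else:
        # Assumption: commit() accepts the commit message as its first argument.
        ds.commit(message)
    # history is ordered oldest to newest, so the last entry is the new snapshot.
    return ds.history[-1]
```

Wrapping the append and the commit in one call keeps each batch tied to exactly one version, which makes it easier to find again later.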
History¶
Iterate over all versions from oldest to newest:
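A minimal sketch of such a loop, using only the version properties documented on this page. The helper name describe_history is hypothetical, and `ds` is assumed to be an already opened dataset.

```python
def describe_history(ds):
    """Return one summary line per version, oldest first."""
    lines = []
    for version in ds.history:
        # message may be None when a commit was made without one.
        message = version.message or "(no message)"
        lines.append(f"{version.id}  {version.timestamp}  {message}")
    return lines
```

Calling `print("\n".join(describe_history(ds)))` would then print the log in commit order.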
Access a specific version by ID or index:
# By index
first = ds.history[0]
latest = ds.history[-1]
# By version ID
version = ds.history[version_id]
# Open the dataset at that point in time
old_ds = version.open()
Version properties¶
| Property | Type | Description |
|---|---|---|
| id | str | Unique version identifier |
| message | str \| None | Commit message |
| timestamp | datetime | Storage-provider timestamp |
| client_timestamp | datetime | Writer's local clock |
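The timestamp property makes it possible to look up the dataset state as of a given moment. The sketch below is one way to do that with a plain scan over ds.history; latest_version_at is a hypothetical helper, and it assumes the cutoff has the same timezone awareness as version.timestamp (Python refuses to compare naive and aware datetimes).

```python
from datetime import datetime


def latest_version_at(ds, cutoff: datetime):
    """Return the newest version whose timestamp is <= cutoff, or None."""
    match = None
    # history iterates oldest to newest, so the last hit is the newest one.
    for version in ds.history:
        if version.timestamp <= cutoff:
            match = version
    return match
```

If a version is found, `latest_version_at(ds, cutoff).open()` opens the dataset at that point in time.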
Branches¶
Create a branch to work on data in isolation, then merge back.
# Create a branch
ds.branch("experiment")
# Open the branch
branch_ds = ds.branches["experiment"].open()
# Add data on the branch
branch_ds.append({"text": ["experimental row"]})
branch_ds.commit()
List and inspect branches¶
# List all branch names
print(ds.branches.names()) # ["main", "experiment"]
# Number of branches
print(len(ds.branches))
# Branch details
branch = ds.branches["experiment"]
print(branch.name) # "experiment"
print(branch.timestamp) # creation time
print(branch.base) # (parent_branch_id, parent_version_id)
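Combining names() with branch() gives a simple "open or create" pattern. This is a sketch built only from the calls shown above; the helper name open_branch is hypothetical.

```python
def open_branch(ds, name):
    """Open branch `name`, creating it first if it does not exist yet."""
    if name not in ds.branches.names():
        ds.branch(name)
    return ds.branches[name].open()
```

This makes scripts safe to rerun, since a second run reuses the existing branch instead of trying to create it again.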
Rename and delete¶
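A minimal sketch of renaming and deleting branches. It assumes writable branch objects expose rename() and delete() in the same way the tag examples on this page do (the read-only section notes that view objects lack them). The helper names and the guard that protects "main" are illustrative choices, not library behavior.

```python
def rename_branch(ds, old_name, new_name):
    """Rename a branch, refusing to overwrite an existing name."""
    if new_name in ds.branches.names():
        raise ValueError(f"branch {new_name!r} already exists")
    # Assumption: Branch.rename(new_name) mirrors Tag.rename(new_name).
    ds.branches[old_name].rename(new_name)


def delete_branch(ds, name, protected=("main",)):
    """Delete a branch unless it is listed in `protected`."""
    if name in protected:
        raise ValueError(f"refusing to delete protected branch {name!r}")
    # Assumption: Branch.delete() mirrors Tag.delete().
    ds.branches[name].delete()
```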
Merge¶
Merge a branch back into main:
ds.branch("feature")
feature_ds = ds.branches["feature"].open()
feature_ds.append({"text": ["feature row"]})
feature_ds.commit()
# Merge into main
main_ds = ds.branches["main"].open()
main_ds.merge("feature")
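The same steps can be wrapped in a small helper that checks both branch names before merging. This is a sketch using only the calls shown above; merge_branch and its target parameter are hypothetical names.

```python
def merge_branch(ds, source, target="main"):
    """Merge branch `source` into `target` and return the opened target."""
    names = ds.branches.names()
    for name in (source, target):
        if name not in names:
            raise KeyError(f"unknown branch {name!r}")
    target_ds = ds.branches[target].open()
    target_ds.merge(source)
    return target_ds
```

Validating the names first means a typo surfaces as a clear error before any dataset is opened.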
Tags¶
Tags are named pointers to specific versions — useful for marking releases or checkpoints.
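A minimal sketch of creating a tag only when it does not exist yet. The call ds.tag(name) is an assumption about the dataset API, since the creation example is not shown in this section; the helper name tag_once is hypothetical.

```python
def tag_once(ds, name):
    """Return the tag `name`, creating it on the current version if missing."""
    if name not in ds.tags.names():
        # Assumption: Dataset.tag(name) creates a tag at the current version.
        ds.tag(name)
    return ds.tags[name]
```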
Access tagged versions¶
tag = ds.tags["v1.0"]
print(tag.name) # "v1.0"
print(tag.version) # version ID it points to
print(tag.timestamp)
# Open the dataset at the tagged version
tagged_ds = tag.open()
List, rename, delete¶
# List all tags
for name in ds.tags.names():
    tag = ds.tags[name]
    print(f"{tag.name} → {tag.version}")
# Rename
ds.tags["v1.0"].rename("v1.0.0")
# Delete
ds.tags["v1.0.0"].delete()
Tag a specific version¶
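A minimal sketch of tagging an older version picked from ds.history. It assumes that tag() accepts a version keyword holding a version ID; that keyword and the helper name tag_version are assumptions and should be checked against the API reference.

```python
def tag_version(ds, name, index):
    """Tag the version at position `index` in ds.history; return its ID."""
    version = ds.history[index]
    # Assumption: tag() takes the target version ID via a `version` keyword.
    ds.tag(name, version=version.id)
    return version.id
```

For example, `tag_version(ds, "baseline", 0)` would mark the first commit under these assumptions.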
Tag properties¶
| Property | Type | Description |
|---|---|---|
| id | str | Unique tag identifier |
| name | str | Tag name |
| message | str | Tag message |
| version | str | Version ID the tag points to |
| timestamp | datetime | Creation timestamp |
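Because several tags can point at the same version, it can help to group tag names by version ID. The sketch below uses only names() and the version property; tags_by_version is a hypothetical helper.

```python
def tags_by_version(ds):
    """Map each tagged version ID to the list of tag names pointing at it."""
    grouped = {}
    for name in ds.tags.names():
        version_id = ds.tags[name].version
        grouped.setdefault(version_id, []).append(name)
    return grouped
```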
Read-only access¶
When opening a dataset in read-only mode, branches and tags return view objects (BranchView, TagView) that support open() but not rename() or delete().
import deeplake
ds = deeplake.open_read_only("al://workspace/dataset")
# Browse history and tags (read-only)
for version in ds.history:
    print(version.id, version.timestamp)
tagged_ds = ds.tags["v1.0"].open()
branch_ds = ds.branches["main"].open()
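Since open() is available on both view types, a single helper can resolve a name as either a tag or a branch. This is a sketch; open_ref is a hypothetical name, and checking tags before branches is an arbitrary design choice for the case where both share a name.

```python
def open_ref(ds, ref):
    """Open `ref` as a tag if one exists, otherwise as a branch."""
    if ref in ds.tags.names():
        return ds.tags[ref].open()
    if ref in ds.branches.names():
        return ds.branches[ref].open()
    raise KeyError(f"no tag or branch named {ref!r}")
```

It only calls names() and open(), so it should behave the same on writable and read-only datasets.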
Next steps¶
- Tables: CRUD operations and column types
- Search: vector, BM25, and hybrid search
- Training with Data Lineage: versioning + reproducible training
- Training Reproducibility with W&B: track experiments across versions