Data Versioning¶

Deeplake datasets have built-in version control. Every mutation is tracked, and you can branch, tag, and merge just like with Git.

Managed tables expose the same versioning system. client.open_table() returns a deeplake.Dataset with full version control.

Set credentials first

export DEEPLAKE_API_KEY="your-token-here"
export DEEPLAKE_WORKSPACE="your-workspace"  # optional, defaults to "default"

from deeplake import Client

client = Client()
ds = client.open_table("my_table")

Commits¶

Every call to ds.commit() creates an immutable snapshot.

ds.append({"text": ["new row"]})
ds.commit()

# Current version ID
print(ds.version)

History¶

Iterate over all versions from oldest to newest:

for version in ds.history:
    print(f"{version.id} ({version.timestamp})")

Access a specific version by ID or index:

# By index
first = ds.history[0]
latest = ds.history[-1]

# By version ID (here, the ID of the current version)
version_id = ds.version
version = ds.history[version_id]

# Open the dataset at that point in time
old_ds = version.open()
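Because ds.history iterates from oldest to newest and every version carries a timestamp, a point-in-time lookup can be written as a simple scan. The following is a minimal sketch rather than part of the Deeplake API: version_at_or_before is a hypothetical helper, and the SimpleNamespace objects are stand-ins so the snippet runs without a workspace. With a real table you would pass ds.history and call open() on the result.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def version_at_or_before(history, cutoff):
    """Return the newest version whose timestamp is <= cutoff, or None.

    Hypothetical helper: relies only on iteration over the history and the
    `timestamp` attribute described on this page.
    """
    match = None
    for version in history:
        if version.timestamp <= cutoff:
            match = version  # keep the most recent qualifying version seen so far
    return match


# Stand-in versions (not real Deeplake objects) so the example is runnable.
fake_history = [
    SimpleNamespace(id="v-a", timestamp=datetime(2024, 1, 1, tzinfo=timezone.utc)),
    SimpleNamespace(id="v-b", timestamp=datetime(2024, 2, 1, tzinfo=timezone.utc)),
    SimpleNamespace(id="v-c", timestamp=datetime(2024, 3, 1, tzinfo=timezone.utc)),
]

cutoff = datetime(2024, 2, 15, tzinfo=timezone.utc)
found = version_at_or_before(fake_history, cutoff)
print(found.id)  # v-b
```

Python refuses to compare timezone-aware and naive datetimes, so the cutoff should match whichever kind the library returns for timestamp.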

Version properties¶

Property	Type	Description
`id`	`str`	Unique version identifier
`message`	`str | None`	Commit message
`timestamp`	`datetime`	Storage-provider timestamp
`client_timestamp`	`datetime`	Writer's local clock
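As an illustration of how these properties fit together, the sketch below formats one history entry as a log line and reports the difference between the writer's clock and the storage-provider timestamp. describe_version is a hypothetical helper, not a Deeplake function, and the SimpleNamespace object is a stand-in with made-up values so the snippet runs on its own.

```python
from datetime import datetime
from types import SimpleNamespace


def describe_version(version):
    """Format one history entry as a single log line.

    Hypothetical helper: uses only the `id`, `message`, `timestamp`, and
    `client_timestamp` properties listed in the table above.
    """
    message = version.message if version.message is not None else "(no message)"
    # Positive skew means the writer's clock was ahead of the storage provider.
    skew = version.client_timestamp - version.timestamp
    return (
        f"{version.id} {version.timestamp.isoformat()} "
        f"skew={skew.total_seconds():+.1f}s {message}"
    )


# Stand-in version (not a real Deeplake object) so the example is runnable.
fake_version = SimpleNamespace(
    id="v-a",
    message=None,
    timestamp=datetime(2024, 1, 1, 12, 0, 0),
    client_timestamp=datetime(2024, 1, 1, 12, 0, 2),
)
line = describe_version(fake_version)
print(line)  # v-a 2024-01-01T12:00:00 skew=+2.0s (no message)
```

With a real table, the same helper could be applied to each entry of ds.history.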

Branches¶

Create a branch to work on data in isolation, then merge back.

# Create a branch
ds.branch("experiment")

# Open the branch
branch_ds = ds.branches["experiment"].open()

# Add data on the branch
branch_ds.append({"text": ["experimental row"]})
branch_ds.commit()

List and inspect branches¶

# List all branch names
print(ds.branches.names())  # ["main", "experiment"]

# Number of branches
print(len(ds.branches))

# Branch details
branch = ds.branches["experiment"]
print(branch.name)       # "experiment"
print(branch.timestamp)  # creation time
print(branch.base)       # (parent_branch_id, parent_version_id)
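The base pair makes it possible to ask which branches were forked from a given version. The sketch below is a hedged illustration: branches_forked_from is a hypothetical helper, FakeBranches is a stand-in for ds.branches, and the IDs are made up. It assumes only names(), lookup by name, and the base tuple shown above; treating a missing base as None is a defensive assumption for a root branch.

```python
from types import SimpleNamespace


class FakeBranches(dict):
    """Stand-in for ds.branches: a dict that also offers names()."""

    def names(self):
        return list(self)


def branches_forked_from(branches, version_id):
    """Return the names of branches whose parent version is `version_id`."""
    forked = []
    for name in branches.names():
        base = branches[name].base
        # Defensive assumption: a root branch may have no parent at all.
        if base is not None and base[1] == version_id:
            forked.append(name)
    return forked


# Made-up branch data so the example runs without a workspace.
fake_branches = FakeBranches(
    main=SimpleNamespace(name="main", base=None),
    experiment=SimpleNamespace(name="experiment", base=("branch-main", "v-a")),
    feature=SimpleNamespace(name="feature", base=("branch-main", "v-b")),
)

print(branches_forked_from(fake_branches, "v-a"))  # ['experiment']
```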

Rename and delete¶

ds.branches["experiment"].rename("exp-v2")
ds.branches["exp-v2"].delete()

Merge¶

Merge a branch back into main:

ds.branch("feature")
feature_ds = ds.branches["feature"].open()
feature_ds.append({"text": ["feature row"]})
feature_ds.commit()

# Merge into main
main_ds = ds.branches["main"].open()
main_ds.merge("feature")
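The branch, commit, and merge steps above can be wrapped in one function so that every isolated change follows the same sequence. This is a sketch under stated assumptions, not a Deeplake API: apply_on_branch is a hypothetical helper that uses only the calls shown on this page, and a MagicMock stands in for the dataset so the snippet runs without credentials.

```python
from unittest.mock import MagicMock


def apply_on_branch(ds, branch_name, mutate):
    """Create a branch, run `mutate` on it, commit, then merge into main.

    Hypothetical helper built from the calls shown above:
    ds.branch(), ds.branches[name].open(), commit(), and merge().
    """
    ds.branch(branch_name)
    branch_ds = ds.branches[branch_name].open()
    mutate(branch_ds)  # caller-supplied changes, applied in isolation
    branch_ds.commit()
    main_ds = ds.branches["main"].open()
    main_ds.merge(branch_name)
    return main_ds


# Stand-in dataset that only records calls (not a real deeplake.Dataset).
fake_ds = MagicMock()
rows = {"text": ["feature row"]}
apply_on_branch(fake_ds, "feature", lambda branch_ds: branch_ds.append(rows))

# The mock returns the same object for every opened branch, so its call log
# shows the order of operations performed on opened datasets.
opened = fake_ds.branches["feature"].open()
steps = [name for name, args, kwargs in opened.mock_calls]
print(steps)  # ['append', 'commit', 'merge']
```

If mutate raises, the branch is left unmerged, which keeps main untouched. Whether to delete the branch in that case is a choice left to the caller.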

Tags¶

Tags are named pointers to specific versions, which makes them useful for marking releases or checkpoints.

ds.commit()
ds.tag("v1.0")

Access tagged versions¶

tag = ds.tags["v1.0"]
print(tag.name)      # "v1.0"
print(tag.version)   # version ID it points to
print(tag.timestamp)

# Open the dataset at the tagged version
tagged_ds = tag.open()

List, rename, delete¶

# List all tags
for name in ds.tags.names():
    tag = ds.tags[name]
    print(f"{tag.name} → {tag.version}")

# Rename
ds.tags["v1.0"].rename("v1.0.0")

# Delete
ds.tags["v1.0.0"].delete()

Tag a specific version¶

version_id = ds.version
ds.tag("checkpoint", version=version_id)

Tag properties¶

Property	Type	Description
`id`	`str`	Unique tag identifier
`name`	`str`	Tag name
`message`	`str`	Tag message
`version`	`str`	Version ID the tag points to
`timestamp`	`datetime`	Creation timestamp
`query`	`str | None`	The TQL query string if the tag was created from a query view, `None` otherwise
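Since each tag records the version ID it points to, a reverse lookup from a version to its tags is straightforward. The sketch below is illustrative only: tags_pointing_at is a hypothetical helper, FakeTags is a stand-in for ds.tags, and the IDs are made up. It assumes only names(), lookup by name, and the version property from the table above.

```python
from types import SimpleNamespace


class FakeTags(dict):
    """Stand-in for ds.tags: a dict that also offers names()."""

    def names(self):
        return list(self)


def tags_pointing_at(tags, version_id):
    """Return the sorted names of all tags whose `version` equals `version_id`."""
    return sorted(name for name in tags.names() if tags[name].version == version_id)


# Made-up tag data so the example runs without a workspace.
fake_tags = FakeTags({
    "v1.0": SimpleNamespace(name="v1.0", version="v-a"),
    "checkpoint": SimpleNamespace(name="checkpoint", version="v-a"),
    "v2.0": SimpleNamespace(name="v2.0", version="v-b"),
})

print(tags_pointing_at(fake_tags, "v-a"))  # ['checkpoint', 'v1.0']
```

With a real table, the call would be tags_pointing_at(ds.tags, ds.version) to list the tags on the current version.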

Read-only access¶

To browse history, tags, and branches without making changes, open the table and read from ds.history, ds.tags, and ds.branches directly:

# Browse history and tags
for version in ds.history:
    print(version.id, version.timestamp)

tagged_ds = ds.tags["v1.0"].open()
branch_ds = ds.branches["main"].open()

When accessed this way, branches and tags return view objects (BranchView, TagView) that support open() but not rename() or delete().

Next steps¶

Tables: CRUD operations and column types
Search: vector, BM25, and hybrid search
Training with Data Lineage: versioning + reproducible training
Training Reproducibility with W&B: track experiments across versions