
Transactions & Consistency

Overview

Beginning with 4.0, Deep Lake is an eventually consistent database with atomic transactions.

Atomic Transactions

When you make changes to your dataset, such as appending data or adding columns, those changes are visible only to you until you commit them with deeplake.Dataset.commit().

Once you commit your changes, they will be visible to anyone opening the dataset. Other users will never see a partially completed transaction. They either see all your changes or none of them.
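The all-or-nothing behavior can be illustrated with a small toy model in plain Python. This is only a sketch of the idea, not Deep Lake's implementation: the ToyDataset class and its writer_view() and reader_view() methods are hypothetical names invented for this example.

```python
class ToyDataset:
    """Toy model (not the Deep Lake API): staged changes stay private until commit()."""

    def __init__(self):
        self._committed = []  # rows visible to every reader
        self._staged = []     # rows visible only to the writer

    def append(self, row):
        # Changes are staged, not published.
        self._staged.append(row)

    def commit(self):
        # Publish every staged row in a single step, then clear the staging area.
        self._committed = self._committed + self._staged
        self._staged = []

    def writer_view(self):
        return self._committed + self._staged

    def reader_view(self):
        return list(self._committed)


ds = ToyDataset()
ds.append("row-1")
ds.append("row-2")
before_commit = ds.reader_view()   # other users see none of the changes
writer_sees = ds.writer_view()     # the writer sees its own staged rows
ds.commit()
after_commit = ds.reader_view()    # other users now see all of the changes
```

Readers in this model observe either the state before the commit or the state after it, never a mix of the two.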

Eventually Consistent

Although commits are applied atomically, other clients working with the dataset may not see your commits immediately. They will eventually see them.

This can happen for several reasons, including:

  • Some storage systems are themselves eventually consistent, with a delay between when data is written and when it can be read
  • For performance reasons, Deep Lake does not constantly poll the dataset for changes after opening
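The second reason can be pictured as a reader that pins the version it saw when it opened the dataset. The sketch below is a toy model in plain Python. ToyStore, ToyReader, and refresh() are hypothetical names used only for illustration and are not part of the Deep Lake API.

```python
class ToyStore:
    """Toy model: an append-only list of immutable dataset snapshots."""

    def __init__(self):
        self.versions = [()]  # version 0 is the empty dataset

    def commit(self, rows):
        # Each commit publishes a complete new snapshot.
        self.versions.append(self.versions[-1] + tuple(rows))


class ToyReader:
    """Toy model: a client that does not poll for changes after opening."""

    def __init__(self, store):
        self.store = store
        self.refresh()

    def refresh(self):
        # Pin the newest version that exists right now.
        self.version = len(self.store.versions) - 1

    def rows(self):
        return self.store.versions[self.version]


store = ToyStore()
reader = ToyReader(store)        # opened before the commit below
store.commit(["row-1"])
stale = reader.rows()            # still the snapshot from open time
reader.refresh()
fresh = reader.rows()            # the commit is visible after a refresh
```

The reader never sees a partial commit. It only sees a complete snapshot that may be older than the latest one.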

Lockless Conflict Resolution

Deep Lake uses no read or write locks as part of the eventual consistency model. This means that you can read and write to the dataset at the same time, and you will never be blocked by another user.

This also means, however, that conflicts between writes must be resolved automatically. For example, if two users concurrently update the same row, or one user deletes a row while another updates it, Deep Lake must decide which change to keep.

Deep Lake uses the following resolution logic:

  1. Newly added rows are always added. The rows added within a particular commit keep their relative order as a block, but rows from other commits may appear before or after that block in an arbitrary order.
  2. Deleted rows are always deleted. If a row is deleted in one transaction and updated in another, the row will remain deleted.
  3. Conflicting updates are resolved arbitrarily. If two transactions concurrently update the same value within a row, the winner is effectively random.

In cases where there is a random winner, such as the relative order of added rows or final value in an update, the final result is consistent -- the same winner will be chosen every time the dataset is opened.
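The three rules and the stable-winner property can be sketched as a small merge function. This is a toy model under stated assumptions, not Deep Lake's algorithm: the Txn class, the merge() function, and the choice to break ties by sorting on a transaction id are all hypothetical. The text above says only that the winner is arbitrary but always the same.

```python
from dataclasses import dataclass, field


@dataclass
class Txn:
    txn_id: str                                   # hypothetical unique commit id
    appended: list = field(default_factory=list)  # new rows, in insertion order
    deleted: set = field(default_factory=set)     # ids of rows to delete
    updates: dict = field(default_factory=dict)   # (row_id, column) -> new value


def merge(base, txns):
    """Merge concurrent transactions into base rows {row_id: {column: value}}.

    Returns (surviving_rows, appended_rows).
    """
    # Assumption: process transactions in a fixed order (sorted by id) so the
    # result does not depend on the order in which commits arrive.
    ordered = sorted(txns, key=lambda t: t.txn_id)
    rows = {rid: dict(cols) for rid, cols in base.items()}

    # Rule 3: for conflicting updates, the last transaction in the fixed order wins.
    for t in ordered:
        for (rid, col), value in t.updates.items():
            if rid in rows:
                rows[rid][col] = value

    # Rule 2: deletes always win, even over updates to the same row.
    for t in ordered:
        for rid in t.deleted:
            rows.pop(rid, None)

    # Rule 1: every appended row is kept; each commit's rows stay together as a block.
    appended = [row for t in ordered for row in t.appended]
    return rows, appended


base = {0: {"label": "cat"}, 1: {"label": "dog"}}
a = Txn("a", appended=["a1", "a2"], updates={(0, "label"): "lion", (1, "label"): "wolf"})
b = Txn("b", appended=["b1"], deleted={1}, updates={(0, "label"): "tiger"})

rows, appended = merge(base, [a, b])
rows_again, appended_again = merge(base, [b, a])  # same result in either arrival order
```

In this sketch, row 1 stays deleted even though transaction a updated it, both transactions' appended rows survive as intact blocks, and the conflicting update to row 0 resolves to the same value no matter which commit arrives first.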

Next Steps