Dataset APIs
Dataset Management
Datasets can be created, loaded, and managed through static factory methods in the deeplake
module.
deeplake.create
create(
url: str,
creds: dict[str, str] | None = None,
token: str | None = None,
schema: SchemaTemplate | None = None,
) -> Dataset
Creates a new dataset at the given URL.
To open an existing dataset, use deeplake.open.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
url | str | The URL of the dataset. Supported protocols include file://, s3://, azure://, and al:// (see the examples below). A URL without a protocol is assumed to be a file:// URL. | required |
creds | dict, str | Credentials used to access the dataset at the given URL, e.g. cloud provider keys or a managed creds_key. | None |
token | str | Activeloop token, used for fetching credentials to the dataset at path if it is a Deep Lake dataset. This is optional, tokens are normally autogenerated. | None |
schema | dict | The initial schema to use for the dataset. | None |
Examples:
>>> import deeplake
>>> from deeplake import types
>>>
>>> # Create a dataset in your local filesystem:
>>> ds = deeplake.create("directory_path")
>>> ds.add_column("id", types.Int32())
>>> ds.add_column("url", types.Text())
>>> ds.add_column("embedding", types.Embedding(768))
>>> ds.commit()
>>> ds.summary()
Dataset(columns=(id,url,embedding), length=0)
+---------+-------------------------------------------------------+
| column | type |
+---------+-------------------------------------------------------+
| id | kind=generic, dtype=int32 |
+---------+-------------------------------------------------------+
| url | text |
+---------+-------------------------------------------------------+
|embedding|kind=embedding, dtype=array(dtype=float32, shape=[768])|
+---------+-------------------------------------------------------+
>>> # Create dataset in your app.activeloop.ai organization:
>>> ds = deeplake.create("al://organization_id/dataset_name")
>>> # Create a dataset stored in your cloud using specified credentials:
>>> ds = deeplake.create("s3://mybucket/my_dataset",
...     creds={"aws_access_key_id": ..., ...})
>>> # Create dataset stored in your cloud using app.activeloop.ai managed credentials.
>>> ds = deeplake.create("s3://mybucket/my_dataset",
...     creds={"creds_key": "managed_creds_key"}, org_id="my_org_id")
>>> # Create dataset stored in your cloud using app.activeloop.ai managed credentials.
>>> ds = deeplake.create("azure://bucket/path/to/dataset")
Raises:
Type | Description |
---|---|
ValueError | If a dataset already exists at the given URL. |
deeplake.create_async
create_async(
url: str,
creds: dict[str, str] | None = None,
token: str | None = None,
schema: SchemaTemplate | None = None,
) -> Future
Asynchronously creates a new dataset at the given URL.
See deeplake.create for more information.
To open an existing dataset, use deeplake.open_async.
Examples:
>>> import deeplake
>>> from deeplake import types
>>>
>>> # Asynchronously create a dataset in your local filesystem:
>>> ds = await deeplake.create_async("directory_path")
>>> ds.add_column("id", types.Int32())
>>> ds.add_column("url", types.Text())
>>> ds.add_column("embedding", types.Embedding(768))
>>> await ds.commit_async()
>>> ds.summary()  # Example of usage in an async context
>>> # Alternatively, create a dataset using .result().
>>> future_ds = deeplake.create_async("directory_path")
>>> ds = future_ds.result() # Blocks until the dataset is created
>>> # Create a dataset in your app.activeloop.ai organization:
>>> ds = await deeplake.create_async("al://organization_id/dataset_name")
>>> # Create a dataset stored in your cloud using specified credentials:
>>> ds = await deeplake.create_async("s3://mybucket/my_dataset",
...     creds={"aws_access_key_id": ..., ...})
>>> # Create dataset stored in your cloud using app.activeloop.ai managed credentials.
>>> ds = await deeplake.create_async("s3://mybucket/my_dataset",
...     creds={"creds_key": "managed_creds_key"}, org_id="my_org_id")
>>> # Create dataset stored in your cloud using app.activeloop.ai managed credentials.
>>> ds = await deeplake.create_async("azure://bucket/path/to/dataset")
Raises:
Type | Description |
---|---|
ValueError | If a dataset already exists at the given URL (raised when the future is awaited). |
deeplake.open
open(
url: str,
creds: dict[str, str] | None = None,
token: str | None = None,
) -> Dataset
Opens an existing dataset, potentially for modifying its content.
See deeplake.open_read_only for opening the dataset in read-only mode.
To create a new dataset, use deeplake.create.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
url | str | The URL of the dataset. Supported protocols include file://, s3://, azure://, and al:// (see the examples below). A URL without a protocol is assumed to be a file:// URL. | required |
creds | dict, str | Credentials used to access the dataset at the given URL, e.g. cloud provider keys or a managed creds_key. | None |
token | str | Activeloop token, used for fetching credentials to the dataset at path if it is a Deep Lake dataset. This is optional, tokens are normally autogenerated. | None |
Examples:
>>> # Load dataset managed by Deep Lake.
>>> ds = deeplake.open("al://organization_id/dataset_name")
>>> # Load dataset stored in your cloud using your own credentials.
>>> ds = deeplake.open("s3://bucket/my_dataset",
...     creds={"aws_access_key_id": ..., ...})
deeplake.open_async
open_async(
url: str,
creds: dict[str, str] | None = None,
token: str | None = None,
) -> Future
Asynchronously opens an existing dataset, potentially for modifying its content.
See deeplake.open for opening the dataset synchronously.
Examples:
>>> # Asynchronously load dataset managed by Deep Lake using await.
>>> ds = await deeplake.open_async("al://organization_id/dataset_name")
>>> # Asynchronously load dataset stored in your cloud using your own credentials.
>>> ds = await deeplake.open_async("s3://bucket/my_dataset",
...     creds={"aws_access_key_id": ..., ...})
deeplake.open_read_only
open_read_only(
url: str,
creds: dict[str, str] | None = None,
token: str | None = None,
) -> ReadOnlyDataset
Opens an existing dataset in read-only mode.
See deeplake.open for opening datasets for modification.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
url | str | The URL of the dataset. A URL without a protocol is assumed to be a file:// URL. | required |
creds | dict, str | Credentials used to access the dataset at the given URL, e.g. cloud provider keys or a managed creds_key. | None |
token | str | Activeloop token to authenticate the user. | None |
Examples:
>>> ds = deeplake.open_read_only("directory_path")
>>> ds.summary()
Dataset(columns=(id,url,embedding), length=0)
+---------+-------------------------------------------------------+
| column | type |
+---------+-------------------------------------------------------+
| id | kind=generic, dtype=int32 |
+---------+-------------------------------------------------------+
| url | text |
+---------+-------------------------------------------------------+
|embedding|kind=embedding, dtype=array(dtype=float32, shape=[768])|
+---------+-------------------------------------------------------+
deeplake.open_read_only_async
open_read_only_async(
url: str,
creds: dict[str, str] | None = None,
token: str | None = None,
) -> Future
Asynchronously opens an existing dataset in read-only mode.
See deeplake.open_async for opening datasets for modification and deeplake.open_read_only for sync open.
Examples:
>>> # Asynchronously open a dataset in read-only mode:
>>> ds = await deeplake.open_read_only_async("directory_path")
deeplake.delete
Deletes an existing dataset.
Warning
This operation is irreversible. All data will be lost.
If other processes are writing to the dataset while it is being deleted, data inconsistency may result. Use this operation with caution.
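A minimal usage sketch (the URL is illustrative; delete is assumed to take a dataset URL like the other module-level functions):
>>> deeplake.delete("s3://mybucket/my_dataset")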
deeplake.copy
copy(
src: str,
dst: str,
src_creds: dict[str, str] | None = None,
dst_creds: dict[str, str] | None = None,
token: str | None = None,
) -> None
Copies the dataset at the source URL to the destination URL.
NOTE: Currently private due to potential issues in file timestamp handling
Parameters:
Name | Type | Description | Default |
---|---|---|---|
src | str | The URL of the source dataset. | required |
dst | str | The URL of the destination dataset. | required |
src_creds | dict, str | Credentials used to access the source dataset. | None |
dst_creds | dict, str | Credentials used to access the destination dataset. | None |
token | str | Activeloop token, used for fetching credentials to the dataset at path if it is a Deep Lake dataset. This is optional, tokens are normally autogenerated. | None |
Examples:
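A minimal sketch (URLs and credentials are illustrative):
>>> deeplake.copy("s3://bucket/src_dataset", "s3://bucket/dst_dataset",
...     src_creds={"aws_access_key_id": ..., ...},
...     dst_creds={"aws_access_key_id": ..., ...})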
deeplake.like
like(
src: DatasetView,
dest: str,
creds: dict[str, str] | None = None,
token: str | None = None,
) -> Dataset
Creates a new dataset by copying the source dataset's structure to a new location.
Note
No data is copied.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
src | DatasetView | The dataset to copy the structure from. | required |
dest | str | The URL to create the new dataset at. | required |
creds | dict, str | Credentials used to access the destination URL, if applicable. | None |
token | str | Activeloop token, used for fetching credentials to the dataset at path if it is a Deep Lake dataset. This is optional, tokens are normally autogenerated. | None |
Examples:
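A minimal sketch (URLs are illustrative):
>>> ds = deeplake.like(src=deeplake.open("s3://bucket/src_dataset"),
...     dest="s3://bucket/empty_dataset")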
deeplake.from_parquet
from_parquet(url: str) -> ReadOnlyDataset
Opens a Parquet dataset in the deeplake format.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
url | str | The URL of the Parquet dataset. If no protocol is specified, it is assumed to be a file:// URL. | required |
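A minimal sketch (the path is illustrative):
>>> ds = deeplake.from_parquet("path/to/dataset.parquet")
>>> ds.summary()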
deeplake.connect
connect(
src: str,
dest: str | None = None,
org_id: str | None = None,
creds_key: str | None = None,
token: str | None = None,
) -> Dataset
Connects an existing dataset to your app.activeloop.ai account.
Either dest or org_id is required, but not both.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
src | str | The URL to the existing dataset. | required |
dest | str | Desired Activeloop URL for the dataset entry, e.g. al://my_org/dataset. | None |
org_id | str | The id of the organization to store the dataset under. The dataset name will be based on the source dataset's name. | None |
creds_key | str | The creds_key of the managed credentials that will be used to access the source path. If not set, the organization's default credentials are used. | None |
token | str | Activeloop token used to fetch the managed credentials. | None |
Examples:
>>> ds = deeplake.connect("s3://bucket/path/to/dataset",
...     "al://my_org/dataset", creds_key="my_key")
>>> # Connect the dataset as al://my_org/dataset
>>> ds = deeplake.connect("s3://bucket/path/to/dataset",
...     org_id="my_org")
deeplake.disconnect
Disconnects the dataset from your Activeloop account.
See deeplake.connect.
Note
Does not delete the stored data; it only removes the connection from the Activeloop organization.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
url | str | The URL of the dataset. | required |
token | str | Activeloop token to authenticate the user. | None |
Examples:
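A minimal sketch (the URL is illustrative):
>>> deeplake.disconnect("al://my_org/dataset_name")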
deeplake.convert
Copies the v3 dataset at src into a new dataset in the v4 format.
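A minimal sketch, assuming convert takes source and destination URLs like deeplake.copy (URLs are illustrative):
>>> deeplake.convert("s3://bucket/v3_dataset", "s3://bucket/v4_dataset")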
deeplake.Dataset
Bases: DatasetView
Datasets are the primary data structure used in DeepLake. They are used to store and manage data for searching, training, and evaluation.
Unlike deeplake.ReadOnlyDataset, instances of Dataset can be modified.
__getitem__
__getitem__(offset: int) -> Row
__getitem__(range: slice) -> RowRange
__getitem__(column: str) -> Column
Returns a subset of data from the Dataset.
The result will depend on the type of value passed to the [] operator.
- int: The zero-based offset of the single row to return. Returns a deeplake.Row
- slice: A slice specifying the range of rows to return. Returns a deeplake.RowRange
- str: A string specifying the column to return all values from. Returns a deeplake.Column
Examples:
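A minimal sketch of the three access modes:
>>> row = ds[0]          # deeplake.Row
>>> rows = ds[100:200]   # deeplake.RowRange
>>> ids = ds["id"]       # deeplake.Column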
__getstate__
Returns a dict that can be pickled and used to restore this dataset.
Note
Pickling a dataset does not copy the dataset, it only saves attributes that can be used to restore the dataset.
__iter__
__iter__() -> Iterator[Row]
__setstate__
Restores dataset from a pickled state.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
arg0 | dict | The pickled state used to restore the dataset. | required |
add_column
add_column(
name: str,
dtype: DataType | str | Type | type | Callable,
format: DataFormat | None = None,
) -> None
Add a new column to the dataset.
Any existing rows in the dataset will have a None value for the new column.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the column. | required |
dtype | DataType, str, Type, type, or Callable | The type of the column: a deeplake.types type, a type name string, a Python type, or a callable returning a type. | required |
format | DataFormat | The format of the column, if applicable. Only required when the dtype is deeplake.types.DataType. | None |
Examples:
>>> ds.add_column("images", deeplake.types.Image(dtype=deeplake.types.UInt8(), sample_compression="jpeg"))
>>> ds.add_column("embedding", deeplake.types.Embedding(dtype=deeplake.types.Float32(), dimensions=768))
Raises:
Type | Description |
---|---|
ColumnAlreadyExistsError | If a column with the same name already exists. |
append
append(data: DatasetView) -> None
append(
data: (
list[dict[str, Any]] | dict[str, Any] | DatasetView
)
) -> None
Adds data to the dataset.
The data can be in a variety of formats:
- A list of dictionaries, each value in the list is a row, with the dicts containing the column name and its value for the row.
- A dictionary, the keys are the column names and the values are array-like (list or numpy.array) objects corresponding to the column values.
- A DatasetView that was generated through any mechanism.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | list[dict[str, Any]], dict[str, Any], DatasetView | The data to insert into the dataset. | required |
Examples:
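A minimal sketch of both input formats, assuming a dataset whose schema has id and url columns:
>>> ds.append([{"id": 1, "url": "https://example.com/a"},
...            {"id": 2, "url": "https://example.com/b"}])
>>> ds.append({"id": [3, 4], "url": ["https://example.com/c", "https://example.com/d"]})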
Raises:
Type | Description |
---|---|
ColumnMissingAppendValueError | If any column is missing from the input data. |
UnevenColumnsError | If the input data columns are not the same length. |
InvalidTypeDimensions | If the input data does not match the column's dimensions. |
batches
batches(
batch_size: int, drop_last: bool = False
) -> Prefetcher
Returns a deeplake.Prefetcher for this DatasetView.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch_size | int | Number of rows in each batch. | required |
drop_last | bool | Whether to drop the final batch if it is incomplete. | False |
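A minimal sketch (the batch size and consumer are illustrative):
>>> for batch in ds.batches(batch_size=2000, drop_last=True):
...     process(batch)  # process is a hypothetical consumer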
commit
Atomically commits changes you have made to the dataset. After commit, other users will see your changes to the dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
message | str | A message to store in history describing the changes made in the version. | None |
Examples:
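A minimal sketch:
>>> ds.commit()
>>> ds.commit("Added a new column")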
commit_async
commit_async(message: str | None = None) -> FutureVoid
Asynchronously commits changes you have made to the dataset.
See deeplake.Dataset.commit for more information.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
message | str | A message to store in history describing the changes made in the commit. | None |
Examples:
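A minimal sketch:
>>> await ds.commit_async()     # in an async context
>>> ds.commit_async().wait()    # blocking alternative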
created_time
property
When the dataset was created. The value is auto-generated at creation time.
delete
Deletes a row from the dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
offset | int | The offset of the row within the dataset to delete. | required |
description
instance-attribute
The description of the dataset. Setting the value will immediately persist the change without requiring a commit().
name
instance-attribute
The name of the dataset. Setting the value will immediately persist the change without requiring a commit().
pull
Pulls any new history from the dataset at the passed url into this dataset.
Similar to deeplake.Dataset.push but the other direction.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
url | str | The URL of the dataset to pull history from. | required |
creds | dict[str, str], None | Optional credentials needed to connect to the dataset. | None |
token | str, None | Optional deeplake token. | None |
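A minimal sketch of pull and its counterpart push (the URL is illustrative):
>>> ds.pull("s3://bucket/remote_dataset")   # fetch new history from the remote copy
>>> ds.push("s3://bucket/remote_dataset")   # send local history to the remote copy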
pull_async
pull_async(
url: str,
creds: dict[str, str] | None = None,
token: str | None = None,
) -> FutureVoid
Asynchronously pulls any new history from the dataset at the passed url into this dataset.
Similar to deeplake.Dataset.push_async but the other direction.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
url | str | The URL of the dataset to pull history from. | required |
creds | dict[str, str], None | Optional credentials needed to connect to the dataset. | None |
token | str, None | Optional deeplake token. | None |
push
Pushes any new history from this dataset to the dataset at the given URL.
Similar to deeplake.Dataset.pull but the other direction.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
url | str | The URL of the destination dataset. | required |
creds | dict[str, str], None | Optional credentials needed to connect to the dataset. | None |
token | str, None | Optional deeplake token. | None |
push_async
push_async(
url: str,
creds: dict[str, str] | None = None,
token: str | None = None,
) -> FutureVoid
Asynchronously pushes any new history from this dataset to the dataset at the given URL.
Similar to deeplake.Dataset.pull_async but the other direction.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
url | str | The URL of the destination dataset. | required |
creds | dict[str, str], None | Optional credentials needed to connect to the dataset. | None |
token | str, None | Optional deeplake token. | None |
pytorch
Returns a PyTorch torch.utils.data.Dataset wrapper around this dataset.
By default, no transformations are applied and each row is returned as a dict with keys of column names.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
transform | Callable[[Any], Any] | A custom function to apply to each sample before returning it. | None |
Raises:
Type | Description |
---|---|
ImportError | If pytorch is not installed. |
Examples:
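A minimal sketch (train_step is a hypothetical consumer):
>>> from torch.utils.data import DataLoader
>>> loader = DataLoader(ds.pytorch(), batch_size=32, shuffle=True)
>>> for batch in loader:
...     train_step(batch)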
remove_column
Removes an existing column from the dataset.
rename_column
Renames an existing column in the dataset.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the column to rename. | required |
new_name | str | The new name for the column. | required |
Examples:
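A minimal sketch:
>>> ds.rename_column("old_name", "new_name")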
Raises:
Type | Description |
---|---|
ColumnDoesNotExistsError | If a column with the specified name does not exist. |
ColumnAlreadyExistsError | If a column with the specified new name already exists. |
rollback
Reverts any in-progress changes to the dataset you have made. Does not revert any changes that have been committed.
rollback_async
rollback_async() -> FutureVoid
Asynchronously reverts any in-progress changes to the dataset you have made. Does not revert any changes that have been committed.
summary
Prints a summary of the dataset.
Examples:
>>> ds.summary()
Dataset(columns=(id,title,embedding), length=51611356)
+---------+-------------------------------------------------------+
| column | type |
+---------+-------------------------------------------------------+
| id | kind=generic, dtype=int32 |
+---------+-------------------------------------------------------+
| title | text |
+---------+-------------------------------------------------------+
|embedding|kind=embedding, dtype=array(dtype=float32, shape=[768])|
+---------+-------------------------------------------------------+
tag
tag(name: str, version: str | None = None) -> Tag
Tags a version of the dataset. If no version is given, the current version is tagged.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The name of the tag. | required |
version | str, None | The version of the dataset to tag. | None |
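A minimal sketch (the tag name is illustrative):
>>> ds.tag("v1.0")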
tensorflow
Returns a TensorFlow tensorflow.data.Dataset wrapper around this DatasetView.
Raises:
Type | Description |
---|---|
ImportError | If TensorFlow is not installed. |
Examples:
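A minimal sketch (take is a standard tensorflow.data.Dataset method):
>>> tds = ds.tensorflow()
>>> for sample in tds.take(5):
...     print(sample)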
deeplake.ReadOnlyDataset
Bases: DatasetView
A read-only view of a dataset. Unlike deeplake.Dataset, instances of ReadOnlyDataset cannot be modified.
__getitem__
__getitem__(offset: int) -> RowView
__getitem__(range: slice) -> RowRangeView
__getitem__(column: str) -> ColumnView
__getitem__(
input: int | slice | str,
) -> RowView | RowRangeView | ColumnView
Returns a subset of data from the dataset.
The result will depend on the type of value passed to the [] operator.
- int: The zero-based offset of the single row to return. Returns a deeplake.RowView
- slice: A slice specifying the range of rows to return. Returns a deeplake.RowRangeView
- str: A string specifying the column to return all values from. Returns a deeplake.ColumnView
Examples:
__iter__
__iter__() -> Iterator[RowView]
batches
batches(
batch_size: int, drop_last: bool = False
) -> Prefetcher
Returns a deeplake.Prefetcher for this DatasetView.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch_size | int | Number of rows in each batch. | required |
drop_last | bool | Whether to drop the final batch if it is incomplete. | False |
created_time
property
When the dataset was created. The value is auto-generated at creation time.
push
Pushes any history from this dataset to the dataset at the given URL.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
url | str | The URL of the destination dataset. | required |
creds | dict[str, str], None | Optional credentials needed to connect to the dataset. | None |
token | str, None | Optional deeplake token. | None |
push_async
push_async(
url: str,
creds: dict[str, str] | None = None,
token: str | None = None,
) -> FutureVoid
Asynchronously pushes any history from this dataset to the dataset at the given URL.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
url | str | The URL of the destination dataset. | required |
creds | dict[str, str], None | Optional credentials needed to connect to the dataset. | None |
token | str, None | Optional deeplake token. | None |
pytorch
Returns a PyTorch torch.utils.data.Dataset wrapper around this dataset.
By default, no transformations are applied and each row is returned as a dict with keys of column names.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
transform | Callable[[Any], Any] | A custom function to apply to each sample before returning it. | None |
Raises:
Type | Description |
---|---|
ImportError | If pytorch is not installed. |
Examples:
query
query(query: str) -> DatasetView
Executes the given TQL query against the dataset and returns the results as a deeplake.DatasetView.
Examples:
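A minimal sketch (the TQL string is illustrative):
>>> view = ds.query("SELECT * WHERE id > 100 LIMIT 10")
>>> for row in view:
...     print(row)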
query_async
query_async(query: str) -> Future
Asynchronously executes the given TQL query against the dataset and returns a future that will resolve into a deeplake.DatasetView.
Examples:
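A minimal sketch (the TQL string is illustrative):
>>> future = ds.query_async("SELECT * WHERE id > 100")
>>> view = future.result()   # blocking; or, in an async context: view = await future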
summary
Prints a summary of the dataset.
Examples:
>>> ds.summary()
Dataset(columns=(id,title,embedding), length=51611356)
+---------+-------------------------------------------------------+
| column | type |
+---------+-------------------------------------------------------+
| id | kind=generic, dtype=int32 |
+---------+-------------------------------------------------------+
| title | text |
+---------+-------------------------------------------------------+
|embedding|kind=embedding, dtype=array(dtype=float32, shape=[768])|
+---------+-------------------------------------------------------+
tensorflow
Returns a TensorFlow tensorflow.data.Dataset wrapper around this DatasetView.
Raises:
Type | Description |
---|---|
ImportError | If TensorFlow is not installed. |
Examples:
deeplake.DatasetView
A DatasetView is a dataset-like structure. It has a defined schema and contains data which can be queried.
__getitem__
__getitem__(offset: int) -> RowView
__getitem__(range: slice) -> RowRangeView
__getitem__(column: str) -> ColumnView
__getitem__(
input: int | slice | str,
) -> RowView | RowRangeView | ColumnView
Returns a subset of data from the DatasetView.
The result will depend on the type of value passed to the [] operator.
- int: The zero-based offset of the single row to return. Returns a deeplake.RowView
- slice: A slice specifying the range of rows to return. Returns a deeplake.RowRangeView
- str: A string specifying the column to return all values from. Returns a deeplake.ColumnView
Examples:
__iter__
__iter__() -> Iterator[RowView]
batches
batches(
batch_size: int, drop_last: bool = False
) -> Prefetcher
Returns a deeplake.Prefetcher for this DatasetView.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
batch_size | int | Number of rows in each batch. | required |
drop_last | bool | Whether to drop the final batch if it is incomplete. | False |
pytorch
Returns a PyTorch torch.utils.data.Dataset wrapper around this dataset.
By default, no transformations are applied and each row is returned as a dict with keys of column names.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
transform | Callable[[Any], Any] | A custom function to apply to each sample before returning it. | None |
Raises:
Type | Description |
---|---|
ImportError | If pytorch is not installed. |
Examples:
query
query(query: str) -> DatasetView
Executes the given TQL query against the dataset and returns the results as a deeplake.DatasetView.
Examples:
query_async
query_async(query: str) -> Future
Asynchronously executes the given TQL query against the dataset and returns a future that will resolve into a deeplake.DatasetView.
Examples:
summary
Prints a summary of the dataset.
Examples:
>>> ds.summary()
Dataset(columns=(id,title,embedding), length=51611356)
+---------+-------------------------------------------------------+
| column | type |
+---------+-------------------------------------------------------+
| id | kind=generic, dtype=int32 |
+---------+-------------------------------------------------------+
| title | text |
+---------+-------------------------------------------------------+
|embedding|kind=embedding, dtype=array(dtype=float32, shape=[768])|
+---------+-------------------------------------------------------+
tensorflow
Returns a TensorFlow tensorflow.data.Dataset wrapper around this DatasetView.
Raises:
Type | Description |
---|---|
ImportError | If TensorFlow is not installed. |
Examples:
deeplake.Column
Bases: ColumnView
Provides mutable access to a column in a dataset.
deeplake.ColumnView
Provides access to a column in a dataset.
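A minimal sketch, assuming columns support row indexing like the dataset itself (hypothetical usage):
>>> col = ds["embedding"]
>>> first = col[0]       # value at row 0
>>> chunk = col[10:20]   # values for a range of rows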
deeplake.Row
Provides mutable access to a particular row in a dataset.
get_async
get_async(column: str) -> Future
Asynchronously retrieves data for the specified column and returns a Future object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
column | str | The name of the column to retrieve data for. | required |
Returns:
Name | Type | Description |
---|---|---|
Future | Future | A Future object that will resolve to the value containing the column data. |
Examples:
>>> future = row.get_async("column_name")
>>> column = future.result() # Blocking call to get the result when it's ready.
Notes
- The Future will resolve asynchronously, meaning the method will not block execution while the data is being retrieved.
- You can either wait for the result using future.result() (a blocking call) or use the Future in an await expression.
set_async
set_async(column: str, value: Any) -> FutureVoid
Asynchronously sets a value for the specified column and returns a FutureVoid object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
column | str | The name of the column to update. | required |
value | Any | The value to set for the column. | required |
Returns:
Name | Type | Description |
---|---|---|
FutureVoid | FutureVoid | A FutureVoid object that will resolve when the operation is complete. |
Examples:
>>> future_void = row.set_async("column_name", new_value)
>>> future_void.wait() # Blocks until the operation is complete.
Notes
- The method sets the value asynchronously and immediately returns a FutureVoid.
- You can either block and wait for the operation to complete using
wait()
or await the FutureVoid object in an asynchronous context.
deeplake.RowView
Provides access to a particular row in a dataset.
get_async
get_async(column: str) -> Future
Asynchronously retrieves data for the specified column and returns a Future object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
column | str | The name of the column to retrieve data for. | required |
Returns:
Name | Type | Description |
---|---|---|
Future | Future | A Future object that will resolve to the value containing the column data. |
Examples:
>>> future = row_view.get_async("column_name")
>>> column = future.result() # Blocking call to get the result when it's ready.
Notes
- The Future will resolve asynchronously, meaning the method will not block execution while the data is being retrieved.
- You can either wait for the result using future.result() (a blocking call) or use the Future in an await expression.
deeplake.RowRange
Provides mutable access to a row range in a dataset.
get_async
get_async(column: str) -> Future
Asynchronously retrieves data for the specified column and returns a Future object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
column | str | The name of the column to retrieve data for. | required |
Returns:
Name | Type | Description |
---|---|---|
Future | Future | A Future object that will resolve to the value containing the column data. |
Examples:
>>> future = row_range.get_async("column_name")
>>> column = future.result() # Blocking call to get the result when it's ready.
Notes
- The Future will resolve asynchronously, meaning the method will not block execution while the data is being retrieved.
- You can either wait for the result using future.result() (a blocking call) or use the Future in an await expression.
set_async
set_async(column: str, value: Any) -> FutureVoid
Asynchronously sets a value for the specified column and returns a FutureVoid object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
column | str | The name of the column to update. | required |
value | Any | The value to set for the column. | required |
Returns:
Name | Type | Description |
---|---|---|
FutureVoid | FutureVoid | A FutureVoid object that will resolve when the operation is complete. |
Examples:
>>> future_void = row_range.set_async("column_name", new_value)
>>> future_void.wait() # Blocks until the operation is complete.
Notes
- The method sets the value asynchronously and immediately returns a FutureVoid.
- You can either block and wait for the operation to complete using wait() or await the FutureVoid object in an asynchronous context.
deeplake.RowRangeView
Provides access to a row range in a dataset.
get_async
get_async(column: str) -> Future
Asynchronously retrieves data for the specified column and returns a Future object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
column | str | The name of the column to retrieve data for. | required |
Returns:
Name | Type | Description |
---|---|---|
Future | Future | A Future object that will resolve to the value containing the column data. |
Examples:
>>> future = row_range_view.get_async("column_name")
>>> column = future.result() # Blocking call to get the result when it's ready.
Notes
- The Future will resolve asynchronously, meaning the method will not block execution while the data is being retrieved.
- You can either wait for the result using future.result() (a blocking call) or use the Future in an await expression.
deeplake.Future
A future that represents a value that will be resolved in the future.
Once the Future is resolved, it will hold the result, and you can retrieve it using either a blocking call (result()) or via asynchronous mechanisms (await).
The future will resolve automatically even if you do not explicitly wait for it.
Methods:
Name | Description |
---|---|
result | Blocks until the Future is resolved and returns the object. |
__await__ | Awaits the future asynchronously and returns the object once it's ready. |
is_completed | Returns True if the Future is already resolved, False otherwise. |
__await__
is_completed
Checks if the Future has been resolved.
Returns:
Name | Type | Description |
---|---|---|
bool | bool | True if the Future is resolved, False otherwise. |
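A minimal sketch (the query is illustrative):
>>> future = ds.query_async("SELECT * LIMIT 5")
>>> view = future.result()   # block until resolved
>>> # or, inside a coroutine:
>>> view = await future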
deeplake.FutureVoid
A future that represents the completion of an operation that returns no result.
The future will resolve automatically to None, even if you do not explicitly wait for it.
Methods:
Name | Description |
---|---|
wait | Blocks until the FutureVoid is resolved and then returns None. |
__await__ | Awaits the FutureVoid asynchronously and returns None once it's ready. |
is_completed | Returns True if the FutureVoid is already resolved, False otherwise. |
__await__
is_completed
Checks if the FutureVoid has been resolved.
Returns:
Name | Type | Description |
---|---|---|
bool | bool | True if the FutureVoid is resolved, False otherwise. |