Querying Data

Deeplake provides a powerful query language called TQL (Table Query Language) that allows you to query datasets in a SQL-like manner.

See the TQL syntax documentation for the full language reference.

Single-Dataset Query

deeplake.Dataset.query

query(query: str) -> DatasetView

Executes the given TQL query against the dataset and returns the results as a deeplake.DatasetView.

Examples:

>>> result = ds.query("select * where category == 'active'")
>>> for row in result:
...     print("Id is: ", row["id"])

deeplake.Dataset.query_async

query_async(query: str) -> Future

Asynchronously executes the given TQL query against the dataset and returns a Future that resolves to a deeplake.DatasetView.

Examples:

>>> future = ds.query_async("select * where category == 'active'")
>>> result = future.result()  # blocks until the query finishes
>>> for row in result:
...     print("Id is: ", row["id"])
>>> # or, inside a coroutine, use the Future in an await expression
>>> future = ds.query_async("select * where category == 'active'")
>>> result = await future
>>> for row in result:
...     print("Id is: ", row["id"])
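
Because the returned Future can be awaited, several queries can be started before any of them is awaited. The following is a minimal sketch of that pattern, not Deeplake's own code: `StubDataset` and `run_all` are hypothetical names, and the stub only stands in for a real dataset so the example runs without one. It assumes, as stated above, that the object returned by `query_async` can be used in an await expression.

```python
import asyncio


class StubDataset:
    """Hypothetical stand-in for a dataset; query_async returns an awaitable."""

    def query_async(self, query):
        async def _run():
            await asyncio.sleep(0)  # pretend to do some work
            return f"rows for: {query}"

        # ensure_future needs a running event loop, so call it from a coroutine.
        return asyncio.ensure_future(_run())


async def run_all(ds, queries):
    # Start every query before awaiting any of them so they can overlap.
    futures = [ds.query_async(q) for q in queries]
    # Await in submission order; results come back in the same order.
    return [await f for f in futures]


results = asyncio.run(run_all(StubDataset(), [
    "select * where category == 'active'",
    "select * where category == 'archived'",
]))
```

With a real dataset, `ds` would be the opened dataset and each element of `results` would be a DatasetView rather than a string.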

Cross-Dataset Query

deeplake.query

query(query: str, token: str | None = None) -> DatasetView

Executes the given TQL query and returns a DatasetView.

Compared to deeplake.Dataset.query, this version of query can join multiple datasets together or query a single dataset without opening it first.

Examples:

>>> r = deeplake.query("select * from \"al://my_org/dataset\" where id > 30")
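
The dataset path in the FROM clause is wrapped in double quotes, which is why the example above escapes them inside a Python string. A small sketch of building the same query text without escapes follows; `build_query` is a hypothetical helper, not part of Deeplake, and it does no validation of its inputs.

```python
def build_query(dataset_url, condition):
    """Return a TQL string that names its dataset in a FROM clause."""
    # Single-quoted f-string, so the double quotes around the path need no escaping.
    return f'select * from "{dataset_url}" where {condition}'


query = build_query("al://my_org/dataset", "id > 30")
# The resulting string could then be passed on, e.g. deeplake.query(query).
```

Plain string formatting like this is only suitable for trusted values, since the inputs are pasted into the query text as-is.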

deeplake.query_async

query_async(query: str, token: str | None = None) -> Future

Asynchronously executes the given TQL query and returns a Future that resolves to a DatasetView.

Examples:

>>> future = deeplake.query_async("select * from \"al://my_org/dataset\" where category == 'active'")
>>> result = future.result()  # blocks until the query finishes
>>> for row in result:
...     print("Id is: ", row["id"])
>>> # or, inside a coroutine, use the Future in an await expression
>>> future = deeplake.query_async("select * from \"al://my_org/dataset\" where category == 'active'")
>>> result = await future
>>> for row in result:
...     print("Id is: ", row["id"])

Custom TQL Functions

deeplake.tql.register_function

register_function(function: Callable) -> None

Registers the given function with TQL so that it can be used in queries. TQL exchanges data with Python functions as numpy.ndarray values: a registered function should accept its arguments as numpy arrays and return a numpy array.

Examples:

>>> def next_number(a):
...     return a + 1
>>>
>>> deeplake.tql.register_function(next_number)
>>>
>>> r = ds.query("SELECT * WHERE next_number(column_name) > 10")
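
Since a registered function takes numpy arrays and returns one, it can be checked on a plain array before it is registered. The sketch below uses a hypothetical function, `clip_to_unit`, written elementwise so that it behaves the same whether it receives a single value or a batch (which of the two TQL passes is not stated here). The registration and query lines are left as comments because they need a real dataset, and `score` is an assumed column name.

```python
import numpy as np


def clip_to_unit(a):
    """Clamp every value into [0, 1]; takes and returns a numpy array."""
    return np.clip(a, 0.0, 1.0)


# Check the function on an ordinary array first.
sample = np.array([-0.5, 0.25, 1.5])
clipped = clip_to_unit(sample)

# With Deeplake available, registration would follow the pattern shown above:
# deeplake.tql.register_function(clip_to_unit)
# r = ds.query("SELECT * WHERE clip_to_unit(score) > 0.5")  # 'score' is assumed
```

Testing on a small array like this catches shape or dtype mistakes before they surface inside a query.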