Tensor Query Language
This page describes the Tensor Query Language (TQL), an SQL-like language used for Querying in Activeloop Platform
as well as in ds.query
in our Python API. To use queries, install deeplake with pip install deeplake[enterprise]
.
Syntax
SELECT
TQL supports only SELECT
statement. Every TQL expression starts with SELECT *
. TQL supports only *
which means to select all tensors.
The common syntax for select statement is the following:
SELECT * [FROM string] [WHERE expression] [LIMIT number [OFFSET number]] [ORDER BY expression [ASC/DESC]]
Each part of the SELECT
statement can be omitted.
FROM
expression is allowed, but it does not have any effect on the query, because for now TQL queries are run on a specific dataset,
so the FROM
is known from the context
WHERE
WHERE
expression is used to filter the samples in the dataset by conditions. The conditions should be convertible to boolean.
Any expression which outputs a number will be converted to boolean with non-zero values taken as True
. If the expression is not convertible to boolean,
such as strings, json objects and arrays, the query will print the corresponding error.
ORDER BY
ORDER BY
expression orders the output of the query by the given criteria. The criteria can be any expression output of which can be ordered.
The ordered outputs are either scalar numbers or strings. In addition it can also be json, which contains number or string.
ORDER BY
statement optionally accepts ASC/DESC
keywords specifying whether the ordering should be ascending or descending.
It is ascending by default.
LIMIT OFFSET
LIMIT
and OFFSET
expressions are used to limit the output of the query by index, as in SQL.
Expressions
TQL supports any comparison operator (==, !=, <, <=, >=
) where the left side is a tensor and the right side is a known value.
The value can be numeric scalar or array as well as string value.
String literal should be provided within single quotes ('
) and can be used on class_label
, json
and text
tensors.
For class labels it will get corresponding numeric value from the class_names list and do numeric comparison.
For json and text it will do string comparison. The left side of the expression
can be indexed (subscripted) if the tensor is multidimensional array or json. Jsons support indexing by string, e.g. index_meta['id'] == 'some_id'
.
Jsons can also be indexed by number if the underlying data is array.
Numeric multidimensional tensors can be indexed by numbers, e.g. categories[0] == 1
as well as Python style slicing and
multidimensional indexing, such as boxes[:2]
. This last expression returns array containing the third elements of the initial
two dimensional array boxes.
TQL supports logical operators - AND
, OR
and NOT
. These operators can be used to combine boolean expressions.
For example,
labels == 0 OR labels == 1
From SQL we also support the following two keywords:
BETWEEN
labels BETWEEN 0 and 5
IN
labels in ARRAY[0, 2, 4, 6, 8]
Functions
There are predefined functions which can be used in WHERE
expression as well as in ORDER BY
expressions:
CONTAINS
- checks if the given tensor contains given value -CONTAINS(categories, 'person')
RANDOM
- returns random number. May be used inORDER BY
to shuffle the output -ORDER BY RANDOM()
SHAPE
- returns the shape array of the given tensor -SHAPE(boxes)
ALL
- takes an array of booleans and returns single boolean,True
if all elements of the input array areTrue
ALL_STRICT
- same asALL
with one difference.ALL
returnsTrue
on empty array, whileALL_STRICT
returnFalse
ANY
- takes an array of booleans and returns single boolean,True
if any of the elements int the input array isTrue
LOGICAL_AND
- takes two boolean arrays, does element wise logical and, returns the result array. This will returnFalse
if the input arrays have different sizes.LOGICAL_OR
- takes two boolean arrays, does element wise logical or, returns the result array. This will returnFalse
if the input arrays have different sizes.
UNION, INTERSECT, EXCEPT
Query can contain multiple SELECT
statements, combined by one of the set operations - UNION
, INTERSECT
and EXCEPT
.
Examples
Querying for images containing 0 in MNIST Train Dataset with ds.query
.
>>> import deeplake
>>> ds = deeplake.load("hub://activeloop/mnist-train")
>>> result = ds.query("select * where labels == 0")
>>> len(result)
5923
Querying for samples with car
or motorcycle
in categories
of COCO Train Dataset.
>>> import deeplake
>>> ds = deeplake.load("hub://activeloop/coco-train")
>>> result = ds.query("(select * where contains(categories, 'car')) union (select * where contains(categories, 'motorcycle'))")
>>> len(result)
14376