deeplake.core.tensor

Tensor

class deeplake.core.tensor.Tensor
__len__()

Returns the length of the primary axis of the tensor. Accounts for indexing into the tensor object.

Examples

>>> len(tensor)
0
>>> tensor.extend(np.zeros((100, 10, 10)))
>>> len(tensor)
100
>>> len(tensor[5:10])
5

Returns

The current length of this tensor.

Return type

int

__setitem__(item: Union[int, slice], value: Any)

Update samples with new values.

Example

>>> tensor.append(np.zeros((10, 10)))
>>> tensor.shape
(1, 10, 10)
>>> tensor[0] = np.zeros((3, 3))
>>> tensor.shape
(1, 3, 3)

_check_compatibility_with_htype(htype)

Checks if the tensor is compatible with the given htype. Raises an error if not compatible.

property _config

Returns a summary of the configuration of the tensor.

_linked_sample()

Returns the linked sample at the given index. This is only applicable for tensors of link[] htype and can only be used for exactly one sample.

>>> ds.abc[0]._linked_sample().path
'https://picsum.photos/200/300'

_pop(index: List[int])

Removes elements at the given indices. index must be sorted in descending order.
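The descending-order requirement can be illustrated with a plain Python list standing in for the tensor. This is a minimal sketch, not Deep Lake's implementation, and pop_descending is a hypothetical helper. Removing the highest index first leaves the lower positions unchanged, so every remaining index still refers to the intended element.

```python
def pop_descending(items, indices):
    """Remove the elements at `indices` from a copy of `items`.

    `indices` must be sorted in descending order, mirroring the
    requirement stated above. Hypothetical helper on a plain list,
    not a Deep Lake function.
    """
    if list(indices) != sorted(indices, reverse=True):
        raise ValueError("indices must be sorted in descending order")
    result = list(items)
    for i in indices:
        # Deleting from the back first keeps earlier positions stable.
        del result[i]
    return result


samples = ["a", "b", "c", "d", "e"]
remaining = pop_descending(samples, [3, 1])
print(remaining)
```

With ascending indices, each deletion would shift the later elements down by one, and the second index would then point at the wrong element.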

append(sample: Union[Sample, ndarray, int, float, bool, dict, list, str, integer, floating, bool_])

Appends a single sample (row) to the end of the tensor.

Examples

Numpy input:

>>> len(tensor)
0
>>> tensor.append(np.zeros((28, 28, 1)))
>>> len(tensor)
1

File input:

>>> len(tensor)
0
>>> tensor.append(deeplake.read("path/to/file"))
>>> len(tensor)
1

Parameters

sample (InputSample) – The data to append to the tensor. A Sample can be generated by deeplake.read(); see the examples above.

property base_htype

Base htype of the tensor.

Example

>>> ds.create_tensor("video_seq", htype="sequence[video]", sample_compression="mp4")
>>> ds.video_seq.htype
sequence[video]
>>> ds.video_seq.base_htype
video

clear()

Deletes all samples from the tensor.

create_vdb_index(id: str = 'hnsw_1', distance: Union[DistanceType, str] = DistanceType.COSINE_SIMILARITY, additional_params: Optional[Dict[str, int]] = None)

Create similarity search index for embedding tensor or inverted index for text tensor.

Parameters
  • id (str) – Unique identifier for the index. Defaults to hnsw_1. For an inverted index, use an identifier such as inverted_index1.

  • distance (DistanceType, str) – Distance metric to be used for similarity search. Possible values are “l2_norm”, “cosine_similarity”. Defaults to DistanceType.COSINE_SIMILARITY.

  • additional_params (Optional[Dict[str, int]]) –

    Additional parameters for the index.

    • Structure of additional params for an HNSW index:

      “M”

      Increasing this value will increase the index build time and memory usage but will improve the search accuracy. Defaults to 16.

      “efConstruction”

      Defaults to 200.

      “partitions”

      If tensors contain more than 45M samples, it is recommended to use partitions to create the index. Defaults to 1.

    • Structure of additional params for an inverted index:

      “bloom_filter_size”

      Size of the bloom filter. Defaults to 100000.

      “segment_size”

      Size of the segment in MB. Defaults to 25.

Example

>>> ds = deeplake.load("./test/my_embedding_ds")
>>> # create cosine_similarity index on embedding tensor
>>> ds.embedding.create_vdb_index(id="hnsw_1", distance=DistanceType.COSINE_SIMILARITY)
>>> # create cosine_similarity index on embedding tensor with additional params
>>> ds.embedding.create_vdb_index(id="hnsw_1", distance=DistanceType.COSINE_SIMILARITY, additional_params={"M": 32, "partitions": 1, 'efConstruction': 200})
>>> # create inverted index on text tensor
>>> ds.text.create_vdb_index(id="inverted_index1")
>>> # create inverted index on text tensor with additional params
>>> ds.text.create_vdb_index(id="inverted_index1", additional_params={"bloom_filter_size": 1000000, "segment_size": 50})

Notes

Index creation is supported for embedding tensors and text tensors.

Raises

Exception – If the tensor is not an embedding tensor or text tensor.

Returns

Returns the index object.

Return type

Index
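For readers unfamiliar with the two distance values named above, the following sketch computes both quantities with their standard textbook definitions in plain Python. The function names and sample vectors are illustrative assumptions. How the index evaluates these metrics internally is not described here.

```python
import math


def l2_norm(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero, equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


query = [1.0, 0.0]
same_direction = [2.0, 0.0]
orthogonal = [0.0, 3.0]

# Same direction, different magnitude: similarity is maximal, distance is not zero.
print(cosine_similarity(query, same_direction), l2_norm(query, same_direction))
# Perpendicular vectors: similarity is zero.
print(cosine_similarity(query, orthogonal))
```

Cosine similarity ignores vector magnitude, while the L2 distance does not, which is why the choice of metric usually depends on whether the embeddings are normalized.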

creds_key()

Return path data. Only applicable for linked tensors.

data(aslist: bool = False, fetch_chunks: bool = False) → Any

Returns data in the tensor in a format based on the tensor’s base htype.

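  • If tensor has text base htype, returns dict with dict[“value”] same as text().
  • If tensor has json base htype, returns dict with dict[“value”] same as dict().
  • If tensor has list base htype, returns dict with dict[“value”] same as list().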
  • For video tensors, returns a dict with keys “frames”, “timestamps” and “sample_info”:

    • Value of dict[“frames”] will be same as numpy().

    • Value of dict[“timestamps”] will be same as timestamps corresponding to the frames.

    • Value of dict[“sample_info”] will be same as sample_info.

  • For class_label tensors, returns a dict with keys “value” and “text”.

    • Value of dict[“value”] will be same as numpy().

    • Value of dict[“text”] will be list of class labels as strings.

  • For image or dicom tensors, returns dict with keys “value” and “sample_info”.

    • Value of dict[“value”] will be same as numpy().

    • Value of dict[“sample_info”] will be same as sample_info.

  • For all else, returns dict with key “value” with value same as numpy().

dict(fetch_chunks: bool = False)

Return json data. Only applicable for tensors with ‘json’ base htype.

property dtype: Optional[dtype]

Dtype of the tensor.

extend(samples: Union[ndarray, Sequence[Union[Sample, ndarray, int, float, bool, dict, list, str, integer, floating, bool_]], Tensor], progressbar: bool = False, ignore_errors: bool = False)

Extends the end of the tensor by appending multiple elements from a sequence. Accepts a sequence (e.g. a list) or a single numpy array (the first axis in the array is treated as the row axis).

Example

Numpy input:

>>> len(tensor)
0
>>> tensor.extend(np.zeros((100, 28, 28, 1)))
>>> len(tensor)
100

File input:

>>> len(tensor)
0
>>> tensor.extend([
        deeplake.read("path/to/image1"),
        deeplake.read("path/to/image2"),
    ])
>>> len(tensor)
2

Parameters
  • samples (np.ndarray, Sequence, Sequence[Sample]) – The data to add to the tensor. The length should be equal to the number of samples to add.

  • progressbar (bool) – Specifies whether a progressbar should be displayed while extending.

  • ignore_errors (bool) – Skip samples that cause errors while extending, if set to True.

Raises

TensorDtypeMismatchError – Dtype for array must be equal to or castable to this tensor’s dtype.
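The effect of ignore_errors can be pictured with a plain list in place of the tensor. This is a hedged sketch: extend_with_skips and its validate argument are hypothetical, and the sketch does not model what Deep Lake does with samples that were already added when an error is raised.

```python
def extend_with_skips(target, samples, validate, ignore_errors=False):
    """Append each validated sample to `target`.

    When `ignore_errors` is True, samples that fail validation are
    skipped; otherwise the first failure is re-raised. Returns the
    number of skipped samples. Hypothetical helper, not a Deep Lake API.
    """
    skipped = 0
    for sample in samples:
        try:
            target.append(validate(sample))
        except (TypeError, ValueError):
            if not ignore_errors:
                raise
            skipped += 1
    return skipped


rows = []
skipped = extend_with_skips(rows, ["1", "2", "oops", "4"], int, ignore_errors=True)
print(rows, skipped)
```

Here int plays the role of the dtype check: the string "oops" cannot be cast, so it is skipped rather than aborting the whole call.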

property hidden: bool

Whether this tensor is a hidden tensor.

property htype

Htype of the tensor.

property info: Info

Returns the information about the tensor. The info can also be set by the user.

Returns

Information about the tensor.

Return type

Info

Example

>>> # update info
>>> ds.images.info.update(large=True, gray=False)
>>> # get info
>>> ds.images.info
{'large': True, 'gray': False}
>>> ds.images.info = {"complete": True}
>>> ds.images.info
{'complete': True}

invalidate_libdeeplake_dataset()

Invalidates the libdeeplake dataset object.

property is_dynamic: bool

True if the samples in this tensor have differing shapes.

property is_link

Whether this tensor is a link tensor.

property is_sequence

Whether this tensor is a sequence tensor.

list(fetch_chunks: bool = False)

Return list data. Only applicable for tensors with ‘list’ or ‘tag’ base htype.

property meta

Metadata of the tensor.

modified_samples(target_id: Optional[str] = None, return_indexes: Optional[bool] = False)

Returns a slice of the tensor with only those elements that were modified/added. By default the modifications are calculated relative to the previous commit made, but this can be changed by providing a target id.

Parameters
  • target_id (str, optional) – The commit id or branch name to calculate the modifications relative to. Defaults to None.

  • return_indexes (bool, optional) – If True, returns the indexes of the modified elements. Defaults to False.

Returns

If return_indexes is False, a new tensor with only the modified elements. If return_indexes is True, a tuple of that tensor and the list of indexes of the modified elements.

Return type

Union[Tensor, Tuple[Tensor, List[int]]]

Raises

TensorModifiedError – If a target id is passed which is not an ancestor of the current commit.

property ndim: int

Number of dimensions of the tensor.

property num_samples: int

Returns the length of the primary axis of the tensor. Ignores any applied indexing and returns the total length.
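The difference between len() and num_samples can be sketched with a small list-backed view. ListView is a hypothetical stand-in written for illustration, not a Deep Lake class, and it models only a single level of slicing.

```python
class ListView:
    """Hypothetical stand-in for an indexed tensor, backed by a list."""

    def __init__(self, data, index=slice(None)):
        self._data = data
        self._index = index

    def __getitem__(self, index):
        # Only one level of slicing is modelled in this sketch.
        return ListView(self._data, index)

    def __len__(self):
        # Length of the view: accounts for the applied slice.
        return len(range(*self._index.indices(len(self._data))))

    @property
    def num_samples(self):
        # Total length: ignores the applied slice.
        return len(self._data)


view = ListView(list(range(100)))[5:10]
print(len(view), view.num_samples)
```

The sliced view reports a length of 5, matching the len(tensor[5:10]) example for __len__, while the total sample count stays at 100.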

numpy(aslist=False, fetch_chunks=False) → Union[ndarray, List[ndarray]]

Computes the contents of the tensor in numpy format.

Parameters
  • aslist (bool) – If True, a list of np.ndarrays will be returned. Helpful for dynamic tensors. If False, a single np.ndarray will be returned unless the samples are dynamically shaped, in which case an error is raised.

  • fetch_chunks (bool) –

    If True, full chunks will be retrieved from the storage, otherwise only required bytes will be retrieved. This will always be True even if specified as False in the following cases:

    • The tensor is ChunkCompressed.

    • The chunk which is being accessed has more than 128 samples.

Raises
  • DynamicTensorNumpyError – If reading a dynamically-shaped array slice without aslist=True.

  • ValueError – If the tensor is a link and the credentials are not populated.

Returns

A numpy array containing the data represented by this tensor.

Note

For tensors of htype polygon, aslist is always True.

path(aslist: bool = True, fetch_chunks: bool = False)

Return path data. Only applicable for linked tensors.

Parameters
  • aslist (bool) – Returns links in a list if True.

  • fetch_chunks (bool) – If True, full chunks will be retrieved from the storage, otherwise only required bytes will be retrieved.

Returns

A list or numpy array of links.

Return type

Union[np.ndarray, List]

Raises

Exception – If the tensor is not a linked tensor.

play()

Plays a video sample, either inline in a Jupyter notebook or in a web browser. The video is streamed directly from storage. This method fails for incompatible htypes.

Example

>>> ds = deeplake.load("./test/my_video_ds")
>>> # play third sample (index 2)
>>> ds.videos[2].play()

Note

Video streaming is not yet supported on Colab.

pop(index: Optional[Union[int, List[int]]] = None)

Removes element(s) at the given index / indices.

property sample_indices

Returns all the indices pointed to by this tensor in the dataset view.

property sample_info: Union[Dict, List[Dict]]

Returns info about particular samples in a tensor. Returns a dict for a single sample, otherwise a list of dicts. The data in each returned dict depends on the tensor’s htype and the sample itself.

Example

>>> ds.videos[0].sample_info
{'duration': 400400, 'fps': 29.97002997002997, 'timebase': 3.3333333333333335e-05, 'shape': [400, 360, 640, 3], 'format': 'mp4', 'filename': '../deeplake/tests/dummy_data/video/samplemp4.mp4', 'modified': False}
>>> ds.images[:2].sample_info
[{'exif': {'Software': 'Google'}, 'shape': [900, 900, 3], 'format': 'jpeg', 'filename': '../deeplake/tests/dummy_data/images/cat.jpeg', 'modified': False}, {'exif': {}, 'shape': [495, 750, 3], 'format': 'jpeg', 'filename': '../deeplake/tests/dummy_data/images/car.jpg', 'modified': False}]

property shape: Tuple[Optional[int], ...]

Get the shape of this tensor. Length is included.

Example

>>> tensor.append(np.zeros((10, 10)))
>>> tensor.append(np.zeros((10, 15)))
>>> tensor.shape
(2, 10, None)

Returns

Tuple where each value is either None (if that axis is dynamic) or an int (if that axis is fixed).

Return type

tuple

Note

If you don’t want None in the output shape or want the lower/upper bound shapes, use shape_interval instead.

property shape_interval: ShapeInterval

Returns a ShapeInterval object that describes this tensor’s shape more accurately. Length is included.

Example

>>> tensor.append(np.zeros((10, 10)))
>>> tensor.append(np.zeros((10, 15)))
>>> tensor.shape_interval
ShapeInterval(lower=(2, 10, 10), upper=(2, 10, 15))
>>> str(tensor.shape_interval)
'(2, 10, 10:15)'

Returns

Object containing lower and upper properties.

Return type

ShapeInterval

Note

If you are expecting a tuple, use shape instead.
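The relationship between the per-sample shapes, the lower and upper bounds, and the None entries can be sketched in plain Python. The helper names are assumptions made for illustration and this is not how Deep Lake computes these values internally. The sketch also assumes every sample has the same number of dimensions.

```python
def shape_bounds(sample_shapes):
    """Return (lower, upper) per-axis bounds, with the sample count first."""
    n = len(sample_shapes)
    lower = (n,) + tuple(min(axis) for axis in zip(*sample_shapes))
    upper = (n,) + tuple(max(axis) for axis in zip(*sample_shapes))
    return lower, upper


def collapsed_shape(sample_shapes):
    """Return a tuple with None for every axis whose bounds differ."""
    lower, upper = shape_bounds(sample_shapes)
    return tuple(lo if lo == hi else None for lo, hi in zip(lower, upper))


# Shapes of the two samples appended in the examples above.
shapes = [(10, 10), (10, 15)]
lower, upper = shape_bounds(shapes)
print(lower, upper)
print(collapsed_shape(shapes))
```

An axis is reported as None exactly when its lower and upper bounds differ, which is why the interval form carries strictly more information than the tuple form.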

shapes()

Get the shapes of all the samples in the tensor.

Returns

List of shapes of all the samples in the tensor.

Return type

np.ndarray

summary()

Prints a summary of the tensor.

text(fetch_chunks: bool = False)

Return text data. Only applicable for tensors with ‘text’ base htype.

property timestamps: ndarray

Returns timestamps (in seconds) for video sample as numpy array.

Example

>>> # Return timestamps for all frames of first video sample
>>> ds.videos[0].timestamps.shape
(400,)
>>> # Return timestamps for 5th to 10th frame of first video sample
>>> ds.videos[0, 5:10].timestamps
array([0.2002    , 0.23356667, 0.26693332, 0.33366665, 0.4004    ],
      dtype=float32)

tobytes() → bytes

Returns the bytes of the tensor.

  • Only works for a single sample of tensor.

  • If the tensor is uncompressed, this returns the bytes of the numpy array.

  • If the tensor is sample compressed, this returns the compressed bytes of the sample.

  • If the tensor is chunk compressed, this raises an error.

Returns

The bytes of the tensor.

Return type

bytes

Raises

ValueError – If the tensor has multiple samples.

property verify

Whether linked data will be verified when samples are added. Applicable only to tensors with htype link[htype].