deeplake.core.tensor

Tensor

class deeplake.core.tensor.Tensor

__len__()

Returns the length of the primary axis of the tensor. Accounts for indexing into the tensor object.

Examples

>>> len(tensor)
0
>>> tensor.extend(np.zeros((100, 10, 10)))
>>> len(tensor)
100
>>> len(tensor[5:10])
5

Returns: The current length of this tensor.
Return type: int
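
The effect of indexing on the reported length can be reproduced with plain Python slicing arithmetic. The sketch below is only a stand-in for a tensor's primary axis; `view_length` is a hypothetical helper and makes no Deep Lake calls.

```python
def view_length(num_samples: int, index: slice) -> int:
    """Length of a sliced view over `num_samples` items (hypothetical helper)."""
    # slice.indices clamps the slice to the available range, the same way
    # slicing a Python sequence does.
    start, stop, step = index.indices(num_samples)
    return len(range(start, stop, step))


full = view_length(100, slice(None))      # the whole primary axis
window = view_length(100, slice(5, 10))   # mirrors len(tensor[5:10])
print(full, window)
```

The unsliced case corresponds to what num_samples reports, since that property ignores any applied indexing.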

__setitem__(item: Union[int, slice], value: Any)

Update samples with new values.

Example

>>> tensor.append(np.zeros((10, 10)))
>>> tensor.shape
(1, 10, 10)
>>> tensor[0] = np.zeros((3, 3))
>>> tensor.shape
(1, 3, 3)

_check_compatibility_with_htype(htype): Checks if the tensor is compatible with the given htype. Raises an error if not compatible.

property _config: Returns a summary of the configuration of the tensor.

_linked_sample()

Returns the linked sample at the given index. This is only applicable for tensors of link[] htype and can only be used for exactly one sample.

>>> ds.abc[0]._linked_sample().path
'https://picsum.photos/200/300'

_pop(index: List[int]): Removes elements at the given indices. index must be sorted in descending order.
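
A plausible reason for the descending-order requirement is that removing an element shifts every later element down by one, so deleting from the highest index first keeps the remaining indices valid. The sketch below shows this with a plain Python list; `pop_descending` is a hypothetical helper that illustrates the ordering rule and is not Deep Lake's implementation.

```python
def pop_descending(items: list, indices: list) -> list:
    """Remove `indices` from a copy of `items`; indices must be descending."""
    if indices != sorted(indices, reverse=True):
        raise ValueError("indices must be sorted in descending order")
    result = list(items)
    for i in indices:
        # Earlier (higher) deletions never move the elements at lower indices.
        del result[i]
    return result


samples = ["a", "b", "c", "d", "e"]
print(pop_descending(samples, [3, 1]))  # removes "d" first, then "b"
```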

append(sample: Union[Sample, ndarray, int, float, bool, dict, list, str, integer, floating, bool_])

Appends a single sample to the end of the tensor. The sample can be an array, a scalar value, or the return value of deeplake.read(), which can be used to load files. See the examples below.

Examples

Numpy input:

>>> len(tensor)
0
>>> tensor.append(np.zeros((28, 28, 1)))
>>> len(tensor)
1

File input:

>>> len(tensor)
0
>>> tensor.append(deeplake.read("path/to/file"))
>>> len(tensor)
1

Parameters: sample (InputSample) – The data to append to the tensor. Sample objects are generated by deeplake.read(). See the examples above.

property base_htype

Base htype of the tensor.

Example

>>> ds.create_tensor("video_seq", htype="sequence[video]", sample_compression="mp4")
>>> ds.video_seq.htype
sequence[video]
>>> ds.video_seq.base_htype
video

clear(): Deletes all samples from the tensor

creds_key(): Return path data. Only applicable for linked tensors

data(aslist: bool = False, fetch_chunks: bool = False) → Any

Returns data in the tensor in a format based on the tensor’s base htype.

For tensors with text base htype, returns a dict with dict["value"] = Tensor.text().
For tensors with json base htype, returns a dict with dict["value"] = Tensor.dict().
For tensors with list base htype, returns a dict with dict["value"] = Tensor.list().
For video tensors, returns a dict with keys "frames", "timestamps" and "sample_info":
- Value of dict["frames"] will be same as numpy().
- Value of dict["timestamps"] will be same as timestamps corresponding to the frames.
- Value of dict["sample_info"] will be same as sample_info.
For class_label tensors, returns a dict with keys "value" and "text":
- Value of dict["value"] will be same as numpy().
- Value of dict["text"] will be list of class labels as strings.
For image or dicom tensors, returns a dict with keys "value" and "sample_info":
- Value of dict["value"] will be same as numpy().
- Value of dict["sample_info"] will be same as sample_info.
For all other tensors, returns a dict with key "value", whose value is the same as numpy().
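
The rules above amount to a lookup from base htype to the set of keys in the returned dict. The sketch below encodes only that mapping; `data_keys` is a hypothetical helper written for illustration and does not call Deep Lake.

```python
def data_keys(base_htype: str) -> tuple:
    """Keys expected in the dict returned by data(), per the rules above."""
    if base_htype == "video":
        return ("frames", "timestamps", "sample_info")
    if base_htype == "class_label":
        return ("value", "text")
    if base_htype in ("image", "dicom"):
        return ("value", "sample_info")
    # text, json, list and every other base htype expose a single "value" key.
    return ("value",)


for htype in ("text", "video", "class_label", "image", "bbox"):
    print(htype, data_keys(htype))
```

A helper like this can be handy when writing code that consumes data() generically and needs to know which keys to expect before reading them.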

dict(fetch_chunks: bool = False): Return json data. Only applicable for tensors with ‘json’ base htype.

property dtype: Optional[dtype]: Dtype of the tensor.

extend(samples: Union[ndarray, Sequence[Union[Sample, ndarray, int, float, bool, dict, list, str, integer, floating, bool_]], Tensor], progressbar: bool = False, ignore_errors: bool = False)

Extends the end of the tensor by appending multiple elements from a sequence. Accepts a sequence, a single batched numpy array, or a sequence of deeplake.read() outputs, which can be used to load files. See the examples below.

Example

Numpy input:

>>> len(tensor)
0
>>> tensor.extend(np.zeros((100, 28, 28, 1)))
>>> len(tensor)
100

File input:

>>> len(tensor)
0
>>> tensor.extend([
        deeplake.read("path/to/image1"),
        deeplake.read("path/to/image2"),
    ])
>>> len(tensor)
2

Parameters

samples (np.ndarray, Sequence, Sequence[Sample]) – The data to add to the tensor. The length should be equal to the number of samples to add.
progressbar (bool) – Specifies whether a progressbar should be displayed while extending.
ignore_errors (bool) – Skip samples that cause errors while extending, if set to True.

Raises

TensorDtypeMismatchError – Dtype for array must be equal to or castable to this tensor’s dtype.

property hidden: bool: Whether this tensor is a hidden tensor.

property htype: Htype of the tensor.

property info: Info

Returns the information about the tensor. Users can also set the info of a tensor.

Returns: Information about the tensor.
Return type: Info

Example

>>> # update info
>>> ds.images.info.update(large=True, gray=False)
>>> # get info
>>> ds.images.info
{'large': True, 'gray': False}

>>> ds.images.info = {"complete": True}
>>> ds.images.info
{'complete': True}

invalidate_libdeeplake_dataset(): Invalidates the libdeeplake dataset object.

property is_dynamic: bool: True if the samples in this tensor have unequal shapes.

property is_link: Whether this tensor is a link tensor.

property is_sequence: Whether this tensor is a sequence tensor.

list(fetch_chunks: bool = False): Return list data. Only applicable for tensors with ‘list’ or ‘tag’ base htype.

property meta: Metadata of the tensor.

modified_samples(target_id: Optional[str] = None, return_indexes: Optional[bool] = False)

Returns a slice of the tensor with only those elements that were modified/added. By default the modifications are calculated relative to the previous commit made, but this can be changed by providing a target id.

Parameters

target_id (str, optional) – The commit id or branch name to calculate the modifications relative to. Defaults to None.
return_indexes (bool, optional) – If True, returns the indexes of the modified elements. Defaults to False.

Returns

If return_indexes is False, a new tensor with only the modified elements. If return_indexes is True, a tuple containing that tensor and the indexes of the modified elements.

Return type

Union[Tensor, Tuple[Tensor, List[int]]]

Raises

TensorModifiedError – If a target id is passed which is not an ancestor of the current commit.

property ndim: int: Number of dimensions of the tensor.

property num_samples: int: Returns the length of the primary axis of the tensor. Ignores any applied indexing and returns the total length.

numpy(aslist=False, fetch_chunks=False) → Union[ndarray, List[ndarray]]

Computes the contents of the tensor in numpy format.

Parameters

aslist (bool) – If True, a list of np.ndarrays will be returned. Helpful for dynamic tensors. If False, a single np.ndarray will be returned unless the samples are dynamically shaped, in which case an error is raised.
fetch_chunks (bool) –
If True, full chunks will be retrieved from the storage, otherwise only required bytes will be retrieved. This will always be True even if specified as False in the following cases:
- The tensor is ChunkCompressed.
- The chunk which is being accessed has more than 128 samples.

Raises

DynamicTensorNumpyError – If reading a dynamically-shaped array slice without aslist=True.
ValueError – If the tensor is a link and the credentials are not populated.

Returns

A numpy array containing the data represented by this tensor.

Note

For tensors of htype polygon, aslist is always True.
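
The fetch_chunks override described above can be read as a simple boolean rule. The sketch below is one reading of that description; `effective_fetch_chunks` and its parameters are hypothetical names, and the real decision is made inside Deep Lake.

```python
# Threshold taken from the fetch_chunks description above.
FULL_CHUNK_SAMPLE_THRESHOLD = 128


def effective_fetch_chunks(
    requested: bool, chunk_compressed: bool, samples_in_chunk: int
) -> bool:
    """Whether a full chunk is fetched, given the caller's fetch_chunks value."""
    return (
        requested
        or chunk_compressed
        or samples_in_chunk > FULL_CHUNK_SAMPLE_THRESHOLD
    )


print(effective_fetch_chunks(False, False, 128))
print(effective_fetch_chunks(False, False, 129))
print(effective_fetch_chunks(False, True, 1))
```

Under this reading, passing fetch_chunks=False only has an effect for tensors that are not chunk compressed and whose accessed chunk holds at most 128 samples.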

path(aslist: bool = True, fetch_chunks: bool = False)

Return path data. Only applicable for linked tensors.

Parameters

aslist (bool) – Returns links in a list if True.
fetch_chunks (bool) – If True, full chunks will be retrieved from the storage, otherwise only required bytes will be retrieved.

Returns

A list or numpy array of links.

Return type

Union[np.ndarray, List]

Raises

Exception – If the tensor is not a linked tensor.

play()

Plays a video sample, either in a Jupyter notebook or in a web browser. The video is streamed directly from storage. This method fails for incompatible htypes.

Example

>>> ds = deeplake.load("./test/my_video_ds")
>>> # play the sample at index 2 (the third sample)
>>> ds.videos[2].play()

Note

Video streaming is not yet supported on Colab.

pop(index: Optional[Union[int, List[int]]] = None): Removes element(s) at the given index / indices.

property sample_indices: Returns all the indices pointed to by this tensor in the dataset view.

property sample_info: Union[Dict, List[Dict]]

Returns info about particular samples in a tensor. Returns a dict for a single sample, otherwise a list of dicts. The data in the returned dict depends on the tensor's htype and on the sample itself.

Example

>>> ds.videos[0].sample_info
{'duration': 400400, 'fps': 29.97002997002997, 'timebase': 3.3333333333333335e-05, 'shape': [400, 360, 640, 3], 'format': 'mp4', 'filename': '../deeplake/tests/dummy_data/video/samplemp4.mp4', 'modified': False}
>>> ds.images[:2].sample_info
[{'exif': {'Software': 'Google'}, 'shape': [900, 900, 3], 'format': 'jpeg', 'filename': '../deeplake/tests/dummy_data/images/cat.jpeg', 'modified': False}, {'exif': {}, 'shape': [495, 750, 3], 'format': 'jpeg', 'filename': '../deeplake/tests/dummy_data/images/car.jpg', 'modified': False}]

property shape: Tuple[Optional[int], ...]

Get the shape of this tensor. Length is included.

Example

>>> tensor.append(np.zeros((10, 10)))
>>> tensor.append(np.zeros((10, 15)))
>>> tensor.shape
(2, 10, None)

Returns: Tuple where each value is either None (if that axis is dynamic) or an int (if that axis is fixed).
Return type: tuple

Note

If you don’t want None in the output shape or want the lower/upper bound shapes, use shape_interval instead.
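
The way fixed and dynamic axes combine into a single tuple can be sketched from the per-sample shapes alone. `combined_shape` below is a hypothetical helper, not part of Deep Lake, and it assumes at least one sample and that all samples have the same number of dimensions.

```python
from typing import Optional, Sequence, Tuple


def combined_shape(
    sample_shapes: Sequence[Tuple[int, ...]]
) -> Tuple[Optional[int], ...]:
    """Length first, then each axis as an int if fixed or None if it varies."""
    if not sample_shapes:
        raise ValueError("need at least one sample shape")
    # zip(*shapes) groups the sizes axis by axis across all samples.
    per_axis = tuple(
        sizes[0] if len(set(sizes)) == 1 else None
        for sizes in zip(*sample_shapes)
    )
    return (len(sample_shapes),) + per_axis


print(combined_shape([(10, 10), (10, 15)]))  # (2, 10, None)
```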

property shape_interval: ShapeInterval

Returns a ShapeInterval object that describes this tensor’s shape more accurately. Length is included.

Example

>>> tensor.append(np.zeros((10, 10)))
>>> tensor.append(np.zeros((10, 15)))
>>> tensor.shape_interval
ShapeInterval(lower=(2, 10, 10), upper=(2, 10, 15))
>>> str(tensor.shape_interval)
(2, 10, 10:15)

Returns: Object containing lower and upper properties.
Return type: ShapeInterval

Note

If you are expecting a tuple, use shape instead.
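
The lower and upper bounds, and the compact string form shown in the example, can be derived from the per-sample shapes. The helpers below (`shape_bounds`, `format_interval`) are hypothetical and only mimic the output shown above; they assume at least one sample and equal dimensionality across samples.

```python
from typing import Sequence, Tuple


def shape_bounds(
    sample_shapes: Sequence[Tuple[int, ...]]
) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Return (lower, upper) bounds with the number of samples as first entry."""
    if not sample_shapes:
        raise ValueError("need at least one sample shape")
    n = len(sample_shapes)
    axes = list(zip(*sample_shapes))  # sizes grouped per axis
    lower = (n,) + tuple(min(sizes) for sizes in axes)
    upper = (n,) + tuple(max(sizes) for sizes in axes)
    return lower, upper


def format_interval(lower: Tuple[int, ...], upper: Tuple[int, ...]) -> str:
    """Render fixed axes as a number and varying axes as 'low:high'."""
    parts = [str(lo) if lo == hi else f"{lo}:{hi}" for lo, hi in zip(lower, upper)]
    return "(" + ", ".join(parts) + ")"


lower, upper = shape_bounds([(10, 10), (10, 15)])
print(lower, upper)
print(format_interval(lower, upper))
```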

shapes()

Get the shapes of all the samples in the tensor.

Returns: Array containing the shapes of all the samples in the tensor.
Return type: np.ndarray

summary(): Prints a summary of the tensor.

text(fetch_chunks: bool = False): Return text data. Only applicable for tensors with ‘text’ base htype.

property timestamps: ndarray

Returns timestamps (in seconds) for video sample as numpy array.

Example

>>> # Return timestamps for all frames of first video sample
>>> ds.videos[0].timestamps.shape
(400,)
>>> # Return timestamps for frames 5 through 9 (zero-indexed) of the first video sample
>>> ds.videos[0, 5:10].timestamps
array([0.2002    , 0.23356667, 0.26693332, 0.33366665, 0.4004    ],
      dtype=float32)

tobytes() → bytes

Returns the bytes of the tensor.

Only works for a single sample of tensor.
If the tensor is uncompressed, this returns the bytes of the numpy array.
If the tensor is sample compressed, this returns the compressed bytes of the sample.
If the tensor is chunk compressed, this raises an error.

Returns: The bytes of the tensor.
Return type: bytes
Raises: ValueError – If the tensor has multiple samples.

property verify: Whether linked data will be verified when samples are added. Applicable only to tensors with htype link[htype].