deeplake.core.tensor
Tensor
- class deeplake.core.tensor.Tensor
- __len__()
Returns the length of the primary axis of the tensor. Accounts for indexing into the tensor object.
Examples
>>> len(tensor)
0
>>> tensor.extend(np.zeros((100, 10, 10)))
>>> len(tensor)
100
>>> len(tensor[5:10])
5
- Returns
The current length of this tensor.
- Return type
int
- __setitem__(item: Union[int, slice], value: Any)
Update samples with new values.
Example
>>> tensor.append(np.zeros((10, 10)))
>>> tensor.shape
(1, 10, 10)
>>> tensor[0] = np.zeros((3, 3))
>>> tensor.shape
(1, 3, 3)
- _check_compatibility_with_htype(htype)
Checks if the tensor is compatible with the given htype. Raises an error if not compatible.
- property _config
Returns a summary of the configuration of the tensor.
- _linked_sample()
Returns the linked sample at the given index. This is only applicable for tensors of link[] htype and can only be used for exactly one sample.
>>> linked_sample = ds.abc[0]._linked_sample()
>>> linked_sample.path
'https://picsum.photos/200/300'
- _pop(index: List[int])
Removes elements at the given indices. index must be sorted in descending order.
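The descending-order requirement matches how positional deletion behaves in general: removing an element shifts everything after it one slot to the left, so working from the highest index down keeps the remaining indices valid. The sketch below illustrates this on a plain Python list. `pop_many` is a hypothetical helper written for this illustration and is not part of Deep Lake.

```python
def pop_many(items, indices):
    """Remove the elements at `indices` from `items` in place.

    `indices` must be sorted in strictly descending order, mirroring the
    requirement documented for `_pop`. Hypothetical helper, illustration only.
    """
    if any(a <= b for a, b in zip(indices, indices[1:])):
        raise ValueError("indices must be sorted in descending order")
    for i in indices:
        # Deleting a higher index never moves the elements at lower indices,
        # so every remaining index still points at the intended element.
        del items[i]
    return items


samples = ["a", "b", "c", "d", "e"]
result = pop_many(samples, [4, 2, 0])
print(result)  # ['b', 'd']
```

Deleting in ascending order instead would require adjusting each later index by the number of elements already removed.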
- append(sample: Union[Sample, ndarray, int, float, bool, dict, list, str, integer, floating, bool_])
Appends a single sample (row) to the end of the tensor.
Examples
Numpy input:
>>> len(tensor)
0
>>> tensor.append(np.zeros((28, 28, 1)))
>>> len(tensor)
1
File input:
>>> len(tensor)
0
>>> tensor.append(deeplake.read("path/to/file"))
>>> len(tensor)
1
- Parameters
sample (InputSample) – The data to append to the tensor. Sample is generated by deeplake.read(). See the above examples.
- property base_htype
Base htype of the tensor.
Example
>>> ds.create_tensor("video_seq", htype="sequence[video]", sample_compression="mp4")
>>> ds.video_seq.htype
sequence[video]
>>> ds.video_seq.base_htype
video
- clear()
Deletes all samples from the tensor.
- create_vdb_index(id: str = 'hnsw_1', distance: Union[DistanceType, str] = DistanceType.COSINE_SIMILARITY, additional_params: Optional[Dict[str, int]] = None)
Creates a similarity search index for an embedding tensor, or an inverted index for a text tensor.
- Parameters
id (str) – Unique identifier for the index. Defaults to hnsw_1 or inverted_index1.
distance (DistanceType, str) – Distance metric to be used for similarity search. Possible values are "l2_norm", "cosine_similarity". Defaults to DistanceType.COSINE_SIMILARITY.
additional_params (Optional[Dict[str, int]]) – Additional parameters for the index.
- Structure of additional params used for HNSW index:
  - "M": Increasing this value will increase the index build time and memory usage but will improve the search accuracy. Defaults to 16.
  - "efConstruction": Defaults to 200.
  - "partitions": If tensors contain more than 45M samples, it is recommended to use partitions to create the index. Defaults to 1.
- Structure of additional params used for Inverted index:
  - "bloom_filter_size": Size of the bloom filter. Defaults to 100000.
  - "segment_size": Size of the segment in MB. Defaults to 25.
Example
>>> ds = deeplake.load("./test/my_embedding_ds")
>>> # create cosine_similarity index on embedding tensor
>>> ds.embedding.create_vdb_index(id="hnsw_1", distance=DistanceType.COSINE_SIMILARITY)
>>> # create cosine_similarity index on embedding tensor with additional params
>>> ds.embedding.create_vdb_index(id="hnsw_1", distance=DistanceType.COSINE_SIMILARITY, additional_params={"M": 32, "partitions": 1, "efConstruction": 200})
>>> # create inverted index on text tensor
>>> ds.text.create_vdb_index(id="inverted_index1")
>>> # create inverted index on text tensor with additional params
>>> ds.text.create_vdb_index(id="inverted_index1", additional_params={"bloom_filter_size": 1000000, "segment_size": 50})
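Because additional_params only needs the keys you want to override, it can help to see the effective values before building an index. The sketch below merges user-supplied values over the defaults listed in the parameter description. `with_defaults` and the two constant names are hypothetical and exist only for this illustration; how the library validates or applies the values internally is not covered here.

```python
# Defaults as listed in the parameter description for create_vdb_index.
HNSW_DEFAULTS = {"M": 16, "efConstruction": 200, "partitions": 1}
INVERTED_DEFAULTS = {"bloom_filter_size": 100000, "segment_size": 25}


def with_defaults(defaults, additional_params=None):
    """Return the documented defaults overridden by user-supplied values.

    Hypothetical helper: it only makes the effective parameter values
    explicit and does not call into Deep Lake.
    """
    return {**defaults, **(additional_params or {})}


hnsw_params = with_defaults(HNSW_DEFAULTS, {"M": 32})
inverted_params = with_defaults(INVERTED_DEFAULTS)
print(hnsw_params)      # {'M': 32, 'efConstruction': 200, 'partitions': 1}
print(inverted_params)  # {'bloom_filter_size': 100000, 'segment_size': 25}
```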
Notes
Index creation is supported for embedding tensors and text tensors.
- Raises
Exception – If the tensor is not an embedding tensor or text tensor.
- Returns
Returns the index object.
- Return type
- creds_key()
Returns the credentials key associated with a linked sample. Only applicable for linked tensors.
- data(aslist: bool = False, fetch_chunks: bool = False) → Any
Returns data in the tensor in a format based on the tensor’s base htype.
- If tensor has text base htype
  Returns dict with dict["value"] = Tensor.text()
- If tensor has json base htype
  Returns dict with dict["value"] = Tensor.dict()
- If tensor has list base htype
  Returns dict with dict["value"] = Tensor.list()
- For video tensors, returns a dict with keys "frames", "timestamps" and "sample_info":
  - Value of dict["frames"] will be same as numpy().
  - Value of dict["timestamps"] will be same as timestamps corresponding to the frames.
  - Value of dict["sample_info"] will be same as sample_info.
- For class_label tensors, returns a dict with keys "value" and "text".
  - Value of dict["value"] will be same as numpy().
  - Value of dict["text"] will be list of class labels as strings.
- For image or dicom tensors, returns dict with keys "value" and "sample_info".
  - Value of dict["value"] will be same as numpy().
  - Value of dict["sample_info"] will be same as sample_info.
- For all else, returns dict with key "value" with value same as numpy().
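When writing code that consumes data(), it can be convenient to know up front which keys to expect for a given base htype. The lookup below restates the rules for data() as a small function. `expected_data_keys` is a hypothetical name used only for this sketch; it does not call Deep Lake and covers only the htypes named in the description.

```python
def expected_data_keys(base_htype):
    """Return the dict keys that data() is documented to produce.

    Hypothetical helper that only restates the documented rules.
    """
    if base_htype == "video":
        return ("frames", "timestamps", "sample_info")
    if base_htype == "class_label":
        return ("value", "text")
    if base_htype in ("image", "dicom"):
        return ("value", "sample_info")
    # text, json, list and every other base htype expose a single "value" key.
    return ("value",)


print(expected_data_keys("video"))        # ('frames', 'timestamps', 'sample_info')
print(expected_data_keys("class_label"))  # ('value', 'text')
print(expected_data_keys("json"))         # ('value',)
```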
- dict(fetch_chunks: bool = False)
Return json data. Only applicable for tensors with ‘json’ base htype.
- property dtype: Optional[dtype]
Dtype of the tensor.
- extend(samples: Union[ndarray, Sequence[Union[Sample, ndarray, int, float, bool, dict, list, str, integer, floating, bool_]], Tensor], progressbar: bool = False, ignore_errors: bool = False)
Extends the end of the tensor by appending multiple elements from a sequence. Accepts a sequence (e.g. a list) or a single numpy array (the first axis in the array is treated as the row axis).
Example
Numpy input:
>>> len(tensor)
0
>>> tensor.extend(np.zeros((100, 28, 28, 1)))
>>> len(tensor)
100
File input:
>>> len(tensor)
0
>>> tensor.extend([
...     deeplake.read("path/to/image1"),
...     deeplake.read("path/to/image2"),
... ])
>>> len(tensor)
2
- Parameters
samples (np.ndarray, Sequence, Sequence[Sample]) – The data to add to the tensor. The length should be equal to the number of samples to add.
progressbar (bool) – Specifies whether a progressbar should be displayed while extending.
ignore_errors (bool) – Skip samples that cause errors while extending, if set to True.
- Raises
TensorDtypeMismatchError – Dtype for array must be equal to or castable to this tensor’s dtype.
- property hidden
Whether this tensor is a hidden tensor.
- property htype
Htype of the tensor.
- property info: Info
Returns information about the tensor. The info can also be set by the user.
- Returns
Information about the tensor.
- Return type
Info
Example
>>> # update info
>>> ds.images.info.update(large=True, gray=False)
>>> # get info
>>> ds.images.info
{'large': True, 'gray': False}
>>> ds.images.info = {"complete": True}
>>> ds.images.info
{'complete': True}
- invalidate_libdeeplake_dataset()
Invalidates the libdeeplake dataset object.
- property is_dynamic: bool
Returns True if samples in this tensor have shapes that are unequal.
- property is_link
Whether this tensor is a link tensor.
- property is_sequence
Whether this tensor is a sequence tensor.
- list(fetch_chunks: bool = False)
Return list data. Only applicable for tensors with ‘list’ or ‘tag’ base htype.
- property meta
Metadata of the tensor.
- modified_samples(target_id: Optional[str] = None, return_indexes: Optional[bool] = False)
Returns a slice of the tensor with only those elements that were modified/added. By default the modifications are calculated relative to the previous commit made, but this can be changed by providing a target id.
- Parameters
target_id (str, optional) – The commit id or branch name to calculate the modifications relative to. Defaults to None.
return_indexes (bool, optional) – If True, returns the indexes of the modified elements. Defaults to False.
- Returns
A new tensor with only the modified elements if return_indexes is False. Tuple[Tensor, List[int]]: A new tensor with only the modified elements and the indexes of the modified elements if return_indexes is True.
- Return type
- Raises
TensorModifiedError – If a target id is passed which is not an ancestor of the current commit.
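To make the two return forms concrete, the toy model below treats a tensor as a plain list and compares the current state against the state at the target commit. This is only an illustration of the documented result shape. How Deep Lake tracks modifications internally is not described here, and detecting changes by value comparison is an assumption of this sketch, as is the name `modified_samples_model`.

```python
def modified_samples_model(previous, current, return_indexes=False):
    """Toy model of modified_samples on plain lists (not the Deep Lake API).

    previous: samples as of the target commit.
    current:  samples in the working state.
    """
    indexes = [
        i
        for i, sample in enumerate(current)
        # A sample counts as modified if it was added after the commit
        # or if its value differs from the committed one (assumption).
        if i >= len(previous) or previous[i] != sample
    ]
    modified = [current[i] for i in indexes]
    return (modified, indexes) if return_indexes else modified


committed = ["cat", "dog", "bird"]
working = ["cat", "wolf", "bird", "fish"]

only_samples = modified_samples_model(committed, working)
samples_and_indexes = modified_samples_model(committed, working, return_indexes=True)
print(only_samples)         # ['wolf', 'fish']
print(samples_and_indexes)  # (['wolf', 'fish'], [1, 3])
```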
- property ndim: int
Number of dimensions of the tensor.
- property num_samples: int
Returns the length of the primary axis of the tensor. Ignores any applied indexing and returns the total length.
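The difference between num_samples and `__len__` only shows up on an indexed view: `len()` reflects the applied index, while num_samples reports the total. The sketch below models that with a minimal stand-in class. `TensorView` is hypothetical, is not the Deep Lake `Tensor`, and supports only a single level of slicing.

```python
class TensorView:
    """Minimal stand-in for an indexed tensor (hypothetical, illustration only)."""

    def __init__(self, total, index=slice(None)):
        self.total = total  # number of samples stored along the primary axis
        self.index = index  # slice applied to this view

    def __getitem__(self, index):
        # Simplification: a new slice replaces the old one instead of composing.
        return TensorView(self.total, index)

    def __len__(self):
        # Accounts for the applied index, like len(tensor[5:10]).
        return len(range(self.total)[self.index])

    @property
    def num_samples(self):
        # Ignores the applied index and reports the total length.
        return self.total


view = TensorView(100)
print(len(view))               # 100
print(len(view[5:10]))         # 5
print(view[5:10].num_samples)  # 100
```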
- numpy(aslist=False, fetch_chunks=False) → Union[ndarray, List[ndarray]]
Computes the contents of the tensor in numpy format.
- Parameters
aslist (bool) – If True, a list of np.ndarrays will be returned. Helpful for dynamic tensors. If False, a single np.ndarray will be returned unless the samples are dynamically shaped, in which case an error is raised.
fetch_chunks (bool) – If True, full chunks will be retrieved from the storage, otherwise only required bytes will be retrieved. This will always be True even if specified as False in the following cases:
- The tensor is ChunkCompressed.
- The chunk which is being accessed has more than 128 samples.
- Raises
DynamicTensorNumpyError – If reading a dynamically-shaped array slice without aslist=True.
ValueError – If the tensor is a link and the credentials are not populated.
- Returns
A numpy array containing the data represented by this tensor.
Note
For tensors of htype polygon, aslist is always True.
- path(aslist: bool = True, fetch_chunks: bool = False)
Return path data. Only applicable for linked tensors.
- Parameters
aslist (bool) – Returns links in a list if True.
fetch_chunks (bool) – If True, full chunks will be retrieved from the storage, otherwise only required bytes will be retrieved.
- Returns
A list or numpy array of links.
- Return type
Union[np.ndarray, List]
- Raises
Exception – If the tensor is not a linked tensor.
- play()
Play video sample. Plays video in Jupyter notebook or plays in web browser. Video is streamed directly from storage. This method will fail for incompatible htypes.
Example
>>> ds = deeplake.load("./test/my_video_ds")
>>> # play the third sample (index 2)
>>> ds.videos[2].play()
Note
Video streaming is not yet supported on Colab.
- pop(index: Optional[Union[int, List[int]]] = None)
Removes element(s) at the given index / indices.
- property sample_indices
Returns all the indices pointed to by this tensor in the dataset view.
- property sample_info: Union[Dict, List[Dict]]
Returns info about particular samples in a tensor. Returns a dict for a single sample, otherwise a list of dicts. The data in the returned dict depends on the tensor’s htype and the sample itself.
Example
>>> ds.videos[0].sample_info
{'duration': 400400, 'fps': 29.97002997002997, 'timebase': 3.3333333333333335e-05, 'shape': [400, 360, 640, 3], 'format': 'mp4', 'filename': '../deeplake/tests/dummy_data/video/samplemp4.mp4', 'modified': False}
>>> ds.images[:2].sample_info
[{'exif': {'Software': 'Google'}, 'shape': [900, 900, 3], 'format': 'jpeg', 'filename': '../deeplake/tests/dummy_data/images/cat.jpeg', 'modified': False}, {'exif': {}, 'shape': [495, 750, 3], 'format': 'jpeg', 'filename': '../deeplake/tests/dummy_data/images/car.jpg', 'modified': False}]
- property shape: Tuple[Optional[int], ...]
Get the shape of this tensor. Length is included.
Example
>>> tensor.append(np.zeros((10, 10)))
>>> tensor.append(np.zeros((10, 15)))
>>> tensor.shape
(2, 10, None)
- Returns
Tuple where each value is either None (if that axis is dynamic) or an int (if that axis is fixed).
- Return type
tuple
Note
If you don’t want None in the output shape or want the lower/upper bound shapes, use shape_interval instead.
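The rule behind the None entries can be sketched in a few lines: the first element is the number of samples, and each remaining axis keeps its size only if every sample agrees on it. `merged_shape` below is a hypothetical helper operating on plain shape tuples, not Deep Lake code. It assumes at least one sample and that all samples have the same number of dimensions.

```python
def merged_shape(sample_shapes):
    """Combine per-sample shapes into one shape, using None for dynamic axes.

    Hypothetical helper for illustration. Assumes a non-empty list of
    shapes that all have the same number of dimensions.
    """
    if not sample_shapes:
        raise ValueError("at least one sample shape is required")
    if len({len(shape) for shape in sample_shapes}) != 1:
        raise ValueError("all samples must have the same number of dimensions")

    dims = [len(sample_shapes)]  # the length along the primary axis comes first
    for axis_sizes in zip(*sample_shapes):
        # An axis is fixed only when every sample has the same size along it.
        dims.append(axis_sizes[0] if len(set(axis_sizes)) == 1 else None)
    return tuple(dims)


fixed = merged_shape([(28, 28, 1), (28, 28, 1)])
dynamic = merged_shape([(10, 10), (10, 15)])
print(fixed)    # (2, 28, 28, 1)
print(dynamic)  # (2, 10, None)
```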
- property shape_interval: ShapeInterval
Returns a ShapeInterval object that describes this tensor’s shape more accurately. Length is included.
Example
>>> tensor.append(np.zeros((10, 10)))
>>> tensor.append(np.zeros((10, 15)))
>>> tensor.shape_interval
ShapeInterval(lower=(2, 10, 10), upper=(2, 10, 15))
>>> str(tensor.shape_interval)
(2, 10, 10:15)
- Returns
Object containing lower and upper properties.
- Return type
ShapeInterval
Note
If you are expecting a tuple, use shape instead.
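The lower and upper bounds can be understood as the per-axis minimum and maximum over all sample shapes, each prefixed with the number of samples. The sketch below reproduces the values from the example on plain tuples. `interval_bounds` and `format_interval` are hypothetical helpers, not the `ShapeInterval` class itself.

```python
def interval_bounds(sample_shapes):
    """Return (lower, upper) bound shapes, with the sample count included.

    Hypothetical helper; assumes a non-empty list of same-rank shapes.
    """
    count = len(sample_shapes)
    lower = (count,) + tuple(min(sizes) for sizes in zip(*sample_shapes))
    upper = (count,) + tuple(max(sizes) for sizes in zip(*sample_shapes))
    return lower, upper


def format_interval(lower, upper):
    """Render fixed axes as a single number and dynamic axes as 'low:high'."""
    parts = [
        str(low) if low == high else f"{low}:{high}"
        for low, high in zip(lower, upper)
    ]
    return "(" + ", ".join(parts) + ")"


lower, upper = interval_bounds([(10, 10), (10, 15)])
text = format_interval(lower, upper)
print(lower, upper)  # (2, 10, 10) (2, 10, 15)
print(text)          # (2, 10, 10:15)
```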
- shapes()
Get the shapes of all the samples in the tensor.
- Returns
List of shapes of all the samples in the tensor.
- Return type
np.ndarray
- summary()
Prints a summary of the tensor.
- text(fetch_chunks: bool = False)
Return text data. Only applicable for tensors with ‘text’ base htype.
- property timestamps: ndarray
Returns timestamps (in seconds) for video sample as numpy array.
Example
>>> # Return timestamps for all frames of first video sample
>>> ds.videos[0].timestamps.shape
(400,)
>>> # Return timestamps for 5th to 10th frame of first video sample
>>> ds.videos[0, 5:10].timestamps
array([0.2002 , 0.23356667, 0.26693332, 0.33366665, 0.4004 ], dtype=float32)
- tobytes() → bytes
Returns the bytes of the tensor.
Only works for a single sample of tensor.
If the tensor is uncompressed, this returns the bytes of the numpy array.
If the tensor is sample compressed, this returns the compressed bytes of the sample.
If the tensor is chunk compressed, this raises an error.
- Returns
The bytes of the tensor.
- Return type
bytes
- Raises
ValueError – If the tensor has multiple samples.
- property verify
Whether linked data will be verified when samples are added. Applicable only to tensors with htype link[htype].