Schema APIs
deeplake.Schema
The schema of a deeplake.Dataset.
deeplake.SchemaView
A read-only view of the deeplake.Schema of a deeplake.Dataset.
deeplake.ColumnDefinition
deeplake.ColumnDefinitionView
A read-only view of a deeplake.ColumnDefinition.
Default Schemas
COCOImages
COCOImages(
embedding_size: int,
quantize: bool = False,
objects: bool = True,
keypoints: bool = False,
stuffs: bool = False,
) -> SchemaTemplate
A schema for storing COCO-based image data.
- id (uint64)
- image (jpg image)
- url (text)
- year (uint8)
- version (text)
- description (text)
- contributor (text)
- date_created (uint64)
- date_captured (uint64)
- embedding (embedding)
- license (text)
- is_crowd (bool)
If objects is true, the following fields are added:
- objects_bbox (bounding box)
- objects_classes (segment mask)
If keypoints is true, the following fields are added:
- keypoints_bbox (bounding box)
- keypoints_classes (segment mask)
- keypoints (2-dimensional array of uint32)
- keypoints_skeleton (2-dimensional array of uint16)
If stuffs is true, the following fields are added:
- stuffs_bbox (bounding box)
- stuffs_classes (segment mask)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
embedding_size | int | Size of the embeddings | required |
quantize | bool | If true, quantize the embeddings to slightly decrease accuracy while greatly increasing query speed | False |
Examples:
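A minimal sketch of how the three flags map to columns, based only on the field lists above. The helper `expected_coco_columns` is hypothetical and not part of Deep Lake. The `deeplake.schemas.COCOImages` module path and the `schema=` argument to `deeplake.create` are assumptions about where the template lives and how it is consumed.

```python
# Columns that are always present, as listed above.
BASE_COLUMNS = [
    "id", "image", "url", "year", "version", "description", "contributor",
    "date_created", "date_captured", "embedding", "license", "is_crowd",
]


def expected_coco_columns(objects=True, keypoints=False, stuffs=False):
    """Hypothetical helper: column names implied by the field lists above."""
    columns = list(BASE_COLUMNS)
    if objects:
        columns += ["objects_bbox", "objects_classes"]
    if keypoints:
        columns += [
            "keypoints_bbox", "keypoints_classes",
            "keypoints", "keypoints_skeleton",
        ]
    if stuffs:
        columns += ["stuffs_bbox", "stuffs_classes"]
    return columns


def create_coco_dataset(path, embedding_size=768):
    """Assumed usage; requires the deeplake package to be installed."""
    import deeplake  # imported lazily so the helper above works without it

    # Assumption: the default schemas are exposed under deeplake.schemas.
    template = deeplake.schemas.COCOImages(embedding_size, keypoints=True)
    # Assumption: deeplake.create accepts the template through `schema=`.
    return deeplake.create(path, schema=template)
```

With the defaults (`objects=True`, the other flags false), the helper returns the twelve base columns plus `objects_bbox` and `objects_classes`.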
SchemaTemplate
A template that can be used for creating a new dataset with deeplake.create.
__init__
Constructs a new SchemaTemplate from the given dict.
add
add(
name: str, dtype: DataType | str | Type
) -> SchemaTemplate
Adds a new column to the template.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The column name | required |
dtype | DataType \| str \| Type | The column data type | required |
remove
remove(name: str) -> SchemaTemplate
Removes a column from the template.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
name | str | The column name | required |
rename
rename(old_name: str, new_name: str) -> SchemaTemplate
Renames a column in the template.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
old_name | str | Existing column name | required |
new_name | str | New column name | required |
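A minimal sketch of building a template from a dict and then adjusting it with `add`, `remove`, and `rename`. The module paths `deeplake.schemas.SchemaTemplate` and `deeplake.types` (with `UInt64`, `Text`, and `Embedding`) are assumptions, as are the column names; only the method signatures come from this page.

```python
def build_custom_template(embedding_size=768):
    """Assumed usage; requires the deeplake package to be installed."""
    import deeplake
    from deeplake import types  # assumed location of the column data types

    # Assumption: SchemaTemplate is exposed under deeplake.schemas and takes
    # a dict mapping column names to data types.
    template = deeplake.schemas.SchemaTemplate({
        "id": types.UInt64(),
        "text": types.Text(),
        "embedding": types.Embedding(embedding_size),
    })

    # Each method is documented as returning a SchemaTemplate, so the result
    # is reassigned rather than relying on in-place mutation.
    template = template.add("author", types.Text())
    template = template.remove("text")
    template = template.rename("embedding", "text_embed")
    return template
```

The returned template would then be passed to deeplake.create when creating the dataset.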
TextEmbeddings
TextEmbeddings(
embedding_size: int, quantize: bool = False
) -> SchemaTemplate
A schema for storing embedded text from documents.
- id (uint64)
- chunk_index (uint16): Position of the text_chunk within the document
- document_id (uint64): Unique identifier for the document the embedding came from
- date_created (uint64): Timestamp the document was read
- text_chunk (text): The text of the chunk
- embedding (dtype=float32, size=embedding_size): The embedding of the text
Parameters:
Name | Type | Description | Default |
---|---|---|---|
embedding_size | int | Size of the embeddings | required |
quantize | bool | If true, quantize the embeddings to slightly decrease accuracy while greatly increasing query speed | False |
Examples:
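A minimal sketch of rows shaped like the fields above, written in plain Python so it runs without Deep Lake. The helper `rows_for_document`, the fixed-size character chunking, the zero-vector placeholder embedding, and the choice of epoch seconds for `date_created` are all assumptions made for illustration.

```python
import time


def rows_for_document(document_id, text, embedding_size,
                      chunk_chars=200, first_id=0):
    """Hypothetical helper: split `text` into chunks, one row per chunk."""
    now = int(time.time())  # assumption: date_created is seconds since epoch
    rows = []
    for chunk_index, start in enumerate(range(0, len(text), chunk_chars)):
        rows.append({
            "id": first_id + chunk_index,
            "chunk_index": chunk_index,      # position within the document
            "document_id": document_id,      # same for every chunk of a document
            "date_created": now,
            "text_chunk": text[start:start + chunk_chars],
            # Placeholder vector; a real embedding model would fill this in.
            "embedding": [0.0] * embedding_size,
        })
    return rows


rows = rows_for_document(document_id=42, text="a" * 450, embedding_size=8)
```

Every chunk of one document shares a `document_id`, and `chunk_index` records the order in which the chunks appeared.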
Storage Formats
deeplake.formats.DataFormat
Base class for all datafile formats.
deeplake.formats.Chunk
Chunk(
sample_compression: str | None = None,
chunk_compression: str | None = None,
) -> DataFormat
Configures a "chunk" datafile format.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
sample_compression | str | How to compress individual values within the datafile | None |
chunk_compression | str | How to compress the datafile as a whole | None |
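The difference between the two parameters can be illustrated with a generic sketch using Python's standard `zlib` module. This shows the general trade-off between compressing each value separately and compressing a whole block at once; it is not Deep Lake's implementation, and the compression names Deep Lake accepts are not listed on this page.

```python
import zlib

# 100 small, highly repetitive values standing in for stored samples.
samples = [b"label=cat;" * 20 for _ in range(100)]

# Per-sample style: each value is compressed independently, so any single
# value can be decompressed without touching the others.
per_sample = [zlib.compress(s) for s in samples]
per_sample_size = sum(len(c) for c in per_sample)

# Whole-chunk style: values are concatenated and compressed once, which lets
# the compressor exploit redundancy across values.
whole_chunk = zlib.compress(b"".join(samples))
whole_chunk_size = len(whole_chunk)

# Both approaches are lossless.
assert [zlib.decompress(c) for c in per_sample] == samples
assert zlib.decompress(whole_chunk) == b"".join(samples)
```

For repetitive data like this, the whole-chunk result is smaller, at the cost of having to decompress the entire block to read one value.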