Getting Started
Install Deep Lake
Deep Lake is a Python package that can be installed using pip:
pip install deeplake
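To confirm that the installation succeeded, a quick check along these lines can help. This is a minimal sketch that uses only the Python standard library; the helper name `installed_version` is hypothetical and is not part of Deep Lake.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "deeplake" is the distribution name used in the pip command above.
    version = installed_version("deeplake")
    if version is None:
        print("deeplake is not installed; run: pip install deeplake")
    else:
        print(f"deeplake {version} is installed")
```

Checking the package metadata avoids importing the library itself, so the check stays fast and does not fail on unrelated import errors.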
Create a Dataset
import deeplake
from deeplake import schemas
import time
import numpy as np
# Create a dataset
ds = deeplake.create(
    "file://quickstart",
    schema=schemas.TextEmbeddings(embedding_size=768),
)
# Add rows of data
ds.append([
    {
        "id": 1,
        "date_created": int(time.time()),
        "document_id": 100,
        "document_url": "http://example.com/doc1.txt",
        "text_chunk": "Hello, World!",
        "license": "CC-BY-SA",
        "embedding": np.random.rand(768),
    },
    {
        "id": 2,
        "date_created": int(time.time()),
        "document_id": 101,
        "document_url": "http://example.com/doc2.txt",
        "text_chunk": "Second document",
        "license": "CC-BY-SA",
        "embedding": np.random.rand(768),
    },
])
# Add columns of data
ds.append({
    "id": [3, 4],
    "document_id": [102] * 2,
    "date_created": [int(time.time())] * 2,
    "document_url": ["http://example.com/doc3.txt"] * 2,
    "text_chunk": ["Third document", "Fourth document"],
    "license": ["CC-BY-SA"] * 2,
    "embedding": [np.random.rand(768), np.random.rand(768)],
})
# Commit the schema and data
ds.commit()
# Print a summary of the dataset
ds.summary()
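The two `append` calls above take the same data in two shapes: a list of row dictionaries, or a single dictionary of column lists. The sketch below shows how the two shapes relate, using plain Python only. The helper `rows_to_columns` is hypothetical (it is not a Deep Lake function), and the example rows carry only two of the columns to keep the output short.

```python
from typing import Any, Dict, List


def rows_to_columns(rows: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Convert a list of row dicts into a dict mapping column name to a list of values."""
    if not rows:
        return {}
    keys = list(rows[0].keys())
    for i, row in enumerate(rows):
        # Every row must provide exactly the same columns as the first row.
        if set(row.keys()) != set(keys):
            raise ValueError(f"row {i} has different keys than row 0")
    return {key: [row[key] for row in rows] for key in keys}


rows = [
    {"id": 3, "text_chunk": "Third document"},
    {"id": 4, "text_chunk": "Fourth document"},
]
columns = rows_to_columns(rows)
print(columns)
# {'id': [3, 4], 'text_chunk': ['Third document', 'Fourth document']}
```

The row form is convenient when records arrive one at a time; the column form tends to be more natural when values are already batched, for example a stack of embeddings.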
Read from the Dataset
Now that your dataset is created and saved, it can be opened and read from:
ds = deeplake.open("file://quickstart")
# Print a single value by offset
print("Single value:", ds[0]["text_chunk"])
# Iterate over a range of rows
for row in ds[1:3]:
    print("Range value:", row["text_chunk"])
# Query using TQL
result = ds.query("select * where id > 2")
for row in result:
    print("Query result:", row["text_chunk"])
# Work with the data using PyTorch
torch_dl = result.pytorch()
# Work with the data using TensorFlow
tf_dl = result.tensorflow()
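To make the expected result of the `id > 2` query concrete, the sketch below applies the same filter to plain Python dictionaries that mirror the four rows appended in the Create a Dataset section. It does not call Deep Lake at all; `filter_by_min_id` is a hypothetical helper used only for illustration, and the rows keep just the `id` and `text_chunk` columns.

```python
from typing import Any, Dict, List

# Stand-ins for the four rows appended earlier (only two columns kept).
rows = [
    {"id": 1, "text_chunk": "Hello, World!"},
    {"id": 2, "text_chunk": "Second document"},
    {"id": 3, "text_chunk": "Third document"},
    {"id": 4, "text_chunk": "Fourth document"},
]


def filter_by_min_id(data: List[Dict[str, Any]], min_id: int) -> List[Dict[str, Any]]:
    """Keep the rows whose id is strictly greater than min_id."""
    return [row for row in data if row["id"] > min_id]


matches = filter_by_min_id(rows, 2)
for row in matches:
    print("Query result:", row["text_chunk"])
# Query result: Third document
# Query result: Fourth document
```

If the dataset holds exactly the four rows from the earlier section, the TQL query should select the same two rows, those with ids 3 and 4.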
Next Steps
Now that you have a local dataset, you can learn more about: