Getting Started

Install Deep Lake

Deep Lake is a Python package that can be installed with pip:

  • pip install deeplake

Create a Dataset

import deeplake
from deeplake import schemas
import time

import numpy as np

# Create a dataset
ds = deeplake.create("file://quickstart",
                     schema=schemas.TextEmbeddings(embedding_size=768))

# Add rows of data
ds.append([
    {"id": 1,
     "date_created": int(time.time()),
     "document_id": 100,
     "document_url": "http://example.com/doc1.txt",
     "text_chunk": "Hello, World!",
     "license": "CC-BY-SA",
     "embedding": np.random.rand(768),
    },
    {"id": 2,
     "date_created": int(time.time()),
     "document_id": 101,
     "document_url": "http://example.com/doc2.txt",
     "text_chunk": "Second document",
     "license": "CC-BY-SA",
     "embedding": np.random.rand(768),
     },
])

# Add columns of data
ds.append({
    "id": [3, 4],
    "document_id": [102] * 2,
    "date_created": [int(time.time())] * 2,
    "document_url": ["http://example.com/doc3.txt"] * 2,
    "text_chunk": ["Third document", "Fourth document"],
    "license": ["CC-BY-SA",] * 2,
    "embedding": [np.random.rand(768), np.random.rand(768)],
})

# Commit the schema and data
ds.commit()

# Print a summary of the dataset
ds.summary()
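
The two append calls above pass the same kind of data in two layouts: a list of row dictionaries, and a single dictionary of column lists. The sketch below shows how the two layouts relate, using only the standard library. The helper `rows_to_columns` is hypothetical and not part of the deeplake package, and the 4-element embeddings are shortened stand-ins for the 768-element vectors used above.

```python
import random


def rows_to_columns(rows):
    """Convert a list of row dicts into a dict of column lists.

    Hypothetical helper for illustration only; not a deeplake function.
    """
    if not rows:
        return {}
    keys = list(rows[0])
    for i, row in enumerate(rows):
        # Every row must supply exactly the same set of columns.
        if set(row) != set(keys):
            raise ValueError(f"row {i} has different keys than row 0")
    return {key: [row[key] for row in rows] for key in keys}


# Row layout: one dict per row (shortened embeddings for readability).
rows = [
    {"id": 3, "text_chunk": "Third document",
     "embedding": [random.random() for _ in range(4)]},
    {"id": 4, "text_chunk": "Fourth document",
     "embedding": [random.random() for _ in range(4)]},
]

# Column layout: one list per column, all lists the same length.
columns = rows_to_columns(rows)
print(columns["id"])
print(columns["text_chunk"])
```

In the column layout every list must have the same length, which is why the example above repeats shared values with `* 2`.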

Read from the Dataset

Now that your dataset is created and saved, it can be opened and read from:

ds = deeplake.open("file://quickstart")

# Print a single value by offset
# (the text column appended above is named "text_chunk")
print("Single value:", ds[0]["text_chunk"])

# Iterate over a range of rows
for row in ds[1:3]:
    print("Range value:", row["text_chunk"])

# Query using TQL
result = ds.query("select * where id > 2")
for row in result:
    print("Query result:", row["text_chunk"])

# Work with the data using PyTorch
torch_dl = result.pytorch()

# Work with the data using TensorFlow
tf_dl = result.tensorflow()

Next Steps

Now that you have a local dataset, you can learn more about: