
Quickstart

Get from zero to a working query in 2 minutes. Pick your interface:

Prerequisites

Python:

pip install deeplake

TypeScript:

npm install deeplake

REST (curl): no installation required. You only need curl and a terminal.

Get your API token from deeplake.ai under your workspace settings.

Finding your Organization ID and Workspace

  • Organization ID: Go to deeplake.ai → Org Settings and look for Organization ID.
  • Workspace: On deeplake.ai, workspaces are listed under your organization name. You can also create a new one with the Add Workspace button.

AI Agent Skills

Deeplake supports the open Agent Skills standard, compatible with 40+ AI coding agents including Claude Code, GitHub Copilot, Cursor, and Codex. Skills give your agent built-in knowledge of the Deeplake API so it can write correct queries, ingestions, and searches without looking up docs.

npx skills add activeloopai/deeplake-skills

The installer detects your agents automatically and installs the skills in the right location.

See the Skills Reference for the full agent-friendly SDK reference.

Setup

Python:

import os
from deeplake import Client

client = Client(
    token=os.environ.get("DEEPLAKE_API_KEY"),
    workspace_id=os.environ.get("DEEPLAKE_WORKSPACE"),
)

TypeScript:

const { ManagedClient } = require("deeplake");
const { initializeWasm } = require("deeplake/wasm");

await initializeWasm();

const client = new ManagedClient({
    token: process.env.DEEPLAKE_API_KEY,
    workspaceId: process.env.DEEPLAKE_WORKSPACE,
});
await client.applyStorageCreds("readwrite");

REST (curl):

# Set these once in your shell profile or .env file
export DEEPLAKE_API_KEY="your-token-here"       # from deeplake.ai
export DEEPLAKE_WORKSPACE="your-workspace"
export DEEPLAKE_ORG_ID="your-org-id"

# Then use them in all API calls
API_URL="https://api.deeplake.ai"

Environment variables (recommended)

Store your credentials in environment variables instead of hardcoding them. This keeps secrets out of your code and version control.

export DEEPLAKE_API_KEY="your-token-here"
export DEEPLAKE_WORKSPACE="your-workspace"
export DEEPLAKE_ORG_ID="your-org-id"

In the Python setup above, os.environ.get(...) reads DEEPLAKE_API_KEY and DEEPLAKE_WORKSPACE from the environment. The TypeScript SDK reads the same values from process.env, and the REST examples below reference $DEEPLAKE_API_KEY and $DEEPLAKE_WORKSPACE directly.

Both token and workspace_id are optional if already set as environment variables.
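The explicit-argument-then-environment fallback can be sketched in plain Python. This is an illustrative helper (resolve_credential is hypothetical, not part of the SDK):

```python
import os

def resolve_credential(explicit, env_var):
    """Return the explicitly passed value if given, otherwise fall back to
    the named environment variable; raise if neither is set."""
    value = explicit if explicit is not None else os.environ.get(env_var)
    if value is None:
        raise RuntimeError(f"{env_var} is not set and no value was passed")
    return value

os.environ["DEEPLAKE_API_KEY"] = "demo-token"
# An explicit argument wins over the environment
print(resolve_credential("cli-token", "DEEPLAKE_API_KEY"))  # cli-token
# No explicit argument: fall back to the environment variable
print(resolve_credential(None, "DEEPLAKE_API_KEY"))         # demo-token
```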

1. Create a table and insert data

Python:

# Ingest structured data: schema is inferred automatically
client.ingest("my_first_table", {
    "title": [
        "Getting started",
        "Vector search",
        "Hybrid search",
    ],
    "content": [
        "Deeplake unifies tables, files, and search.",
        "Use the <#> operator for similarity queries.",
        "Combine BM25 and vector for best results.",
    ],
})
TypeScript:

const { apiRequest } = require("deeplake/api");
const {
    deeplakeOpen, deeplakeAppend, deeplakeCommit, deeplakeRelease,
} = require("deeplake/wasm");

// Create table via REST API
await apiRequest(client.apiUrl, client.token, client.orgId, {
    method: "POST",
    path: `/workspaces/${client.workspaceId}/tables`,
    body: {
        table_name: "my_first_table",
        table_schema: { title: "TEXT", content: "TEXT" },
    },
    timeoutMs: 30000,
});

// Open dataset and write data via WASM
const dsPath = await client.getDatasetPath("my_first_table");
const ds = await deeplakeOpen(dsPath, "", client.token);

await deeplakeAppend(ds, {
    title: [
        "Getting started",
        "Vector search",
        "Hybrid search",
    ],
    content: [
        "Deeplake unifies tables, files, and search.",
        "Use the <#> operator for similarity queries.",
        "Combine BM25 and vector for best results.",
    ],
});
await deeplakeCommit(ds);
deeplakeRelease(ds);
REST (curl):

# Create the table
curl -s -X POST "$API_URL/workspaces/$DEEPLAKE_WORKSPACE/tables/query" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPLAKE_API_KEY" \
  -H "X-Activeloop-Org-Id: $DEEPLAKE_ORG_ID" \
  -d '{
    "query": "CREATE TABLE IF NOT EXISTS \"'$DEEPLAKE_WORKSPACE'\".\"my_first_table\" (id SERIAL PRIMARY KEY, title TEXT, content TEXT) USING deeplake"
  }'

# Insert rows
curl -s -X POST "$API_URL/workspaces/$DEEPLAKE_WORKSPACE/tables/query" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPLAKE_API_KEY" \
  -H "X-Activeloop-Org-Id: $DEEPLAKE_ORG_ID" \
  -d '{
    "query": "INSERT INTO \"'$DEEPLAKE_WORKSPACE'\".\"my_first_table\" (title, content) VALUES ($1, $2), ($3, $4), ($5, $6)",
    "params": ["Getting started", "Deeplake unifies tables, files, and search.", "Vector search", "Use the <#> operator for similarity queries.", "Hybrid search", "Combine BM25 and vector for best results."]
  }'
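Schema inference of the kind the Python ingest example relies on can be sketched as mapping each column's sample values to a SQL type. This is a conceptual illustration (infer_schema is hypothetical), not the SDK's actual implementation:

```python
def infer_schema(columns):
    """Guess a SQL column type from the Python values in each column."""
    type_map = {str: "TEXT", int: "BIGINT", float: "DOUBLE", bool: "BOOL"}
    schema = {}
    for name, values in columns.items():
        # Use the first non-null value as the representative sample
        sample = next((v for v in values if v is not None), None)
        schema[name] = type_map.get(type(sample), "TEXT")
    return schema

print(infer_schema({
    "title": ["Getting started", "Vector search"],
    "views": [10, 42],
}))
# {'title': 'TEXT', 'views': 'BIGINT'}
```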

Eventual consistency

After INSERT, data may take a few seconds to become visible in SELECT queries. This is normal for Deeplake tables.
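A simple way to cope with that visibility delay is to poll until the expected rows appear. The sketch below uses a stand-in run_query callable; with the real SDK you would pass something that calls client.query:

```python
import time

def wait_for_rows(run_query, sql, expected, timeout=10.0, interval=0.5):
    """Poll a query until it returns at least `expected` rows or times out."""
    deadline = time.monotonic() + timeout
    while True:
        rows = run_query(sql)
        if len(rows) >= expected:
            return rows
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {len(rows)}/{expected} rows after {timeout}s")
        time.sleep(interval)

# Fake backend for demonstration: rows become visible on the third poll
calls = {"n": 0}
def fake_query(sql):
    calls["n"] += 1
    return [{"id": i} for i in range(3)] if calls["n"] >= 3 else []

rows = wait_for_rows(fake_query, "SELECT * FROM my_first_table",
                     expected=3, interval=0.01)
print(len(rows))  # 3
```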

2. Query your data

Python:

# Fluent query API
results = (
    client.table("my_first_table")
        .select("title", "content")
        .limit(10)
)()

for row in results:
    print(row["title"], "-", row["content"])

# Or use raw SQL
rows = client.query("SELECT * FROM my_first_table")
for row in rows:
    print(row)
TypeScript:

// Fluent query API
const results = await client.table("my_first_table")
    .select("title", "content")
    .limit(10)
    .execute();

for (const row of results) {
    console.log(row.title, "-", row.content);
}

// Or use raw SQL
const rows = await client.query(
    "SELECT * FROM my_first_table ORDER BY id"
);
for (const row of rows) {
    console.log(row);
}
REST (curl):

curl -s -X POST "$API_URL/workspaces/$DEEPLAKE_WORKSPACE/tables/query" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPLAKE_API_KEY" \
  -H "X-Activeloop-Org-Id: $DEEPLAKE_ORG_ID" \
  -d '{"query": "SELECT * FROM \"'$DEEPLAKE_WORKSPACE'\".\"my_first_table\" ORDER BY id"}'
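Conceptually, the fluent chain above just composes a SQL string before executing it. A toy builder illustrating the idea (not the SDK's real implementation):

```python
class QueryBuilder:
    """Minimal illustration of how a fluent chain can compose SQL."""
    def __init__(self, table):
        self.table = table
        self.columns = ["*"]
        self._limit = None

    def select(self, *cols):
        self.columns = list(cols)
        return self  # returning self is what enables chaining

    def limit(self, n):
        self._limit = n
        return self

    def to_sql(self):
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self._limit is not None:
            sql += f" LIMIT {self._limit}"
        return sql

sql = QueryBuilder("my_first_table").select("title", "content").limit(10).to_sql()
print(sql)  # SELECT title, content FROM my_first_table LIMIT 10
```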

That's it. You have a table with data you can query.

3. Access the underlying dataset

Managed tables are backed by Deeplake datasets. Use client.open_table() to get direct access, which is useful for ML training, batch iteration, or working with data stored in your own cloud.

# Open a managed table as a dataset
ds = client.open_table("my_first_table")

# Direct column access
titles = ds["title"][0:10]
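Slicing like ds["title"][0:10] reads a window of a column; for large datasets you would typically iterate in fixed-size chunks. A plain-Python sketch of that access pattern, with a list standing in for a dataset column:

```python
def iter_chunks(column, chunk_size):
    """Yield successive slices of a column, chunk_size rows at a time."""
    for start in range(0, len(column), chunk_size):
        yield column[start:start + chunk_size]

titles = [f"doc-{i}" for i in range(7)]
chunks = list(iter_chunks(titles, chunk_size=3))
print([len(c) for c in chunks])  # [3, 3, 1]
```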

4. Data versioning

open_table() returns a deeplake.Dataset object with full version control:

ds = client.open_table("my_first_table")

# Commit and tag
ds.commit()
ds.tag("v1.0")

# View history
for version in ds.history:
    print(version.id, version.timestamp)

# Branch and merge
ds.branch("experiment")
# ... add data on the branch ...
main_ds = ds.branches["main"].open()
main_ds.merge("experiment")

See Data Versioning for the full API: branches, tags, rename, delete, and read-only access.
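Conceptually, commits form a history and a tag pins a name to the commit that was current when the tag was created. A toy model of those semantics (ToyHistory is hypothetical, not the real Dataset API):

```python
class ToyHistory:
    """Toy model: a linear commit history plus named tags."""
    def __init__(self):
        self.commits = []   # commit ids, oldest first
        self.tags = {}      # tag name -> commit id
        self._counter = 0

    def commit(self):
        self._counter += 1
        cid = f"c{self._counter}"
        self.commits.append(cid)
        return cid

    def tag(self, name):
        # A tag pins the most recent commit; later commits don't move it
        self.tags[name] = self.commits[-1]

h = ToyHistory()
h.commit()
h.tag("v1.0")
h.commit()
print(h.tags["v1.0"], h.commits)  # c1 ['c1', 'c2']
```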

5. Stream data for training

You can stream data directly from a managed table into a PyTorch or TensorFlow training loop — no local download needed:

from torch.utils.data import DataLoader
from deeplake import Client

client = Client()
ds = client.open_table("my_first_table")

loader = DataLoader(ds.pytorch(), batch_size=32, shuffle=True, num_workers=4)

for batch in loader:
    # your training step here
    print(batch["title"])

See Dataloaders for the full guide including TensorFlow, async iteration, and custom transforms.
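What batch_size=32 does conceptually: group rows into fixed-size batches and collate each column's values together, which is why the training loop above reads batch["title"] as a list. A stand-alone sketch of that collation, no PyTorch required:

```python
def collate(rows):
    """Turn a list of row dicts into one dict of column lists."""
    return {key: [row[key] for row in rows] for key in rows[0]}

def batches(rows, batch_size):
    """Yield collated batches of at most batch_size rows each."""
    for start in range(0, len(rows), batch_size):
        yield collate(rows[start:start + batch_size])

rows = [{"title": f"doc-{i}"} for i in range(5)]
out = list(batches(rows, batch_size=2))
print(out[0]["title"])  # ['doc-0', 'doc-1']
```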

What's next

  • Tables: all CRUD operations, batch inserts, column types
  • Search: vector, BM25, hybrid, and multi-vector search
  • Dataloaders: PyTorch/TensorFlow DataLoaders and training pipelines
  • Hybrid RAG: end-to-end RAG with vector + BM25 search
  • Examples: all end-to-end projects