Quickstart¶
Get from zero to a working query in 2 minutes. Pick your interface:
Prerequisites¶
Get your API token from deeplake.ai under your workspace settings.
Finding your Organization ID and Workspace
- Organization ID: Go to deeplake.ai → Org Settings → look for Organization ID.
- Workspace: On deeplake.ai, workspaces are listed under your organization name. You can also create a new one with the Add Workspace button.
AI Agent Skills¶
Deeplake supports the open Agent Skills standard, compatible with 40+ AI coding agents including Claude Code, GitHub Copilot, Cursor, and Codex. Skills give your agent built-in knowledge of the Deeplake API so it can write correct queries, ingestions, and searches without looking up docs.
The installer detects your agents automatically and installs the skills in the right location.
See the Skills Reference for the full agent-friendly SDK reference.
Setup¶
Environment variables (recommended)
Store your credentials in environment variables instead of hardcoding them. This keeps secrets out of your code and version control.
export DEEPLAKE_API_KEY="your-token-here"
export DEEPLAKE_WORKSPACE="your-workspace"
export DEEPLAKE_ORG_ID="your-org-id"
The Python SDK picks up DEEPLAKE_API_KEY and DEEPLAKE_WORKSPACE automatically; you can also pass them explicitly with os.environ.get(...). The TypeScript SDK reads from process.env. The REST examples below use $DEEPLAKE_API_KEY and $DEEPLAKE_WORKSPACE directly.
Both token and workspace_id are optional if they are already set as environment variables.
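To illustrate that fallback behavior, here is a minimal sketch. The `resolve_credentials` helper is hypothetical, not part of the SDK; it just shows explicit arguments taking precedence over the environment:

```python
import os

# Hypothetical helper illustrating the credential fallback described above:
# explicit arguments win; otherwise the environment variables are used.
def resolve_credentials(token=None, workspace_id=None):
    token = token or os.environ.get("DEEPLAKE_API_KEY")
    workspace_id = workspace_id or os.environ.get("DEEPLAKE_WORKSPACE")
    return token, workspace_id
```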
1. Create a table and insert data¶
# Ingest structured data: the schema is inferred automatically
from deeplake import Client

client = Client()  # picks up credentials from the environment (see Setup)

client.ingest("my_first_table", {
    "title": [
        "Getting started",
        "Vector search",
        "Hybrid search",
    ],
    "content": [
        "Deeplake unifies tables, files, and search.",
        "Use the <#> operator for similarity queries.",
        "Combine BM25 and vector for best results.",
    ],
})
const { apiRequest } = require("deeplake/api");
const {
  deeplakeOpen, deeplakeAppend, deeplakeCommit, deeplakeRelease,
} = require("deeplake/wasm");

// Create table via REST API
await apiRequest(client.apiUrl, client.token, client.orgId, {
  method: "POST",
  path: `/workspaces/${client.workspaceId}/tables`,
  body: {
    table_name: "my_first_table",
    table_schema: { title: "TEXT", content: "TEXT" },
  },
  timeoutMs: 30000,
});

// Open dataset and write data via WASM
const dsPath = await client.getDatasetPath("my_first_table");
const ds = await deeplakeOpen(dsPath, "", client.token);
await deeplakeAppend(ds, {
  title: [
    "Getting started",
    "Vector search",
    "Hybrid search",
  ],
  content: [
    "Deeplake unifies tables, files, and search.",
    "Use the <#> operator for similarity queries.",
    "Combine BM25 and vector for best results.",
  ],
});
await deeplakeCommit(ds);
deeplakeRelease(ds);
# Create the table
curl -s -X POST "$API_URL/workspaces/$DEEPLAKE_WORKSPACE/tables/query" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPLAKE_API_KEY" \
  -H "X-Activeloop-Org-Id: $DEEPLAKE_ORG_ID" \
  -d '{
    "query": "CREATE TABLE IF NOT EXISTS \"'$DEEPLAKE_WORKSPACE'\".\"my_first_table\" (id SERIAL PRIMARY KEY, title TEXT, content TEXT) USING deeplake"
  }'

# Insert rows
curl -s -X POST "$API_URL/workspaces/$DEEPLAKE_WORKSPACE/tables/query" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $DEEPLAKE_API_KEY" \
  -H "X-Activeloop-Org-Id: $DEEPLAKE_ORG_ID" \
  -d '{
    "query": "INSERT INTO \"'$DEEPLAKE_WORKSPACE'\".\"my_first_table\" (title, content) VALUES ($1, $2), ($3, $4), ($5, $6)",
    "params": ["Getting started", "Deeplake unifies tables, files, and search.", "Vector search", "Use the <#> operator for similarity queries.", "Hybrid search", "Combine BM25 and vector for best results."]
  }'
Eventual consistency
After INSERT, data may take a few seconds to become visible in SELECT queries. This is normal for Deeplake tables.
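Because of that delay, scripts that insert and then immediately read can poll until the rows become visible. A minimal sketch, assuming you supply the visibility check yourself (the `wait_until` helper is illustrative, not part of any SDK):

```python
import time

def wait_until(predicate, timeout=10.0, interval=0.5):
    """Poll predicate() until it returns a truthy value or timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval)
    return False
```

You would pass a predicate that runs a COUNT query against the table and compares the result to the number of rows you just inserted.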
2. Query your data¶
// Fluent query API
const results = await client.table("my_first_table")
  .select("title", "content")
  .limit(10)
  .execute();

for (const row of results) {
  console.log(row.title, "-", row.content);
}

// Or use raw SQL
const rows = await client.query(
  "SELECT * FROM my_first_table ORDER BY id"
);
for (const row of rows) {
  console.log(row);
}
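The fluent calls above ultimately compose a SQL statement. As a conceptual sketch (this `QueryBuilder` is illustrative, not the SDK's implementation), a minimal chainable builder looks like this:

```python
class QueryBuilder:
    """Toy fluent builder, showing how select()/limit() compose into SQL."""

    def __init__(self, table):
        self.table = table
        self._columns = ["*"]
        self._limit = None

    def select(self, *columns):
        self._columns = list(columns)
        return self  # returning self is what enables chaining

    def limit(self, n):
        self._limit = n
        return self

    def to_sql(self):
        sql = f'SELECT {", ".join(self._columns)} FROM "{self.table}"'
        if self._limit is not None:
            sql += f" LIMIT {self._limit}"
        return sql
```

Each method mutates the builder and returns it, so calls chain left to right and the final statement is rendered only once.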
That's it. You have a table with data you can query.
3. Access the underlying dataset¶
Managed tables are backed by Deeplake datasets. Use client.open_table() to get direct access to the underlying dataset, which is useful for ML training, batch iteration, or working with data stored in your own cloud.
# Open a managed table as a dataset
ds = client.open_table("my_first_table")
# Direct column access
titles = ds["title"][0:10]
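Conceptually, a dataset behaves like a columnar store: indexing by column name returns the column, which can then be sliced. A toy stdlib model of that access pattern (illustrative only, not the Dataset class):

```python
class ColumnarStore:
    """Toy dict-of-lists model of column access and slicing."""

    def __init__(self, columns):
        self._columns = columns

    def __getitem__(self, name):
        # Returning the whole column lets callers slice it like a list.
        return self._columns[name]

ds = ColumnarStore({"title": ["Getting started", "Vector search", "Hybrid search"]})
titles = ds["title"][0:2]  # slice the column to get the first two values
```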
4. Data versioning¶
open_table() returns a deeplake.Dataset object with full version control:
ds = client.open_table("my_first_table")
# Commit and tag
ds.commit()
ds.tag("v1.0")
# View history
for version in ds.history:
    print(version.id, version.timestamp)
# Branch and merge
ds.branch("experiment")
# ... add data on the branch ...
main_ds = ds.branches["main"].open()
main_ds.merge("experiment")
See Data Versioning for the full API: branches, tags, rename, delete, and read-only access.
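To make the commit/tag mental model concrete, here is a toy in-memory sketch (not the deeplake.Dataset API): commits are immutable snapshots of the data, and tags are named pointers to commits.

```python
import copy

class MiniVersioned:
    """Toy linear history: commit() snapshots data, tag() names a commit."""

    def __init__(self):
        self.data = []
        self.history = []  # list of (commit_id, snapshot)
        self.tags = {}     # tag name -> commit_id

    def commit(self):
        commit_id = len(self.history)
        # Deep-copy so later edits to self.data cannot alter the snapshot.
        self.history.append((commit_id, copy.deepcopy(self.data)))
        return commit_id

    def tag(self, name):
        # A tag is just a name attached to the most recent commit.
        self.tags[name] = self.history[-1][0]
```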
5. Stream data for training¶
You can stream data directly from a managed table into a PyTorch or TensorFlow training loop — no local download needed:
from torch.utils.data import DataLoader
from deeplake import Client
client = Client()
ds = client.open_table("my_first_table")
loader = DataLoader(ds.pytorch(), batch_size=32, shuffle=True, num_workers=4)
for batch in loader:
    # your training step here
    print(batch["title"])
See Dataloaders for the full guide including TensorFlow, async iteration, and custom transforms.
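Under the hood, a dataloader groups rows into fixed-size batches before handing them to the training loop. The grouping itself can be sketched with plain Python (illustrative; the real loader also handles shuffling, tensor conversion, and parallel workers):

```python
def batched(rows, batch_size):
    """Yield consecutive slices of rows, each at most batch_size long."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]
```

The last batch may be smaller than batch_size when the row count is not an exact multiple.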
What's next¶
- Tables: all CRUD operations, batch inserts, column types
- Search: vector, BM25, hybrid, and multi-vector search
- Dataloaders: PyTorch/TensorFlow DataLoaders and training pipelines
- Hybrid RAG: end-to-end RAG with vector + BM25 search
- Examples: all end-to-end projects