GPU-Streaming Pipeline

One of Deeplake's core innovations is its ability to stream data directly from managed tables to your training loop. This eliminates the need to download large datasets locally, saving both disk space and time. Data is streamed in a GPU-friendly format, maximizing throughput.

Objective

Connect a Deeplake managed table to a PyTorch training loop and stream multimodal data directly into the model for high-performance training.

Prerequisites

  • Deeplake SDK: pip install deeplake
  • A deep learning framework: pip install torch torchvision (or TensorFlow).
  • A Deeplake API token.

Set credentials first

export DEEPLAKE_API_KEY="your-token-here"
export DEEPLAKE_WORKSPACE="your-workspace"  # optional, defaults to "default"
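If you want to validate the environment before constructing the client, a small helper can read the same variables and apply the documented fallback. This is an optional sketch (load_deeplake_config is a hypothetical helper, not part of the SDK):

```python
import os

def load_deeplake_config(env=os.environ):
    """Fetch Deeplake credentials, falling back to the documented defaults."""
    return {
        "api_key": env.get("DEEPLAKE_API_KEY"),                 # required
        "workspace": env.get("DEEPLAKE_WORKSPACE", "default"),  # optional
    }

cfg = load_deeplake_config({"DEEPLAKE_API_KEY": "your-token-here"})
print(cfg["workspace"])  # "default" when DEEPLAKE_WORKSPACE is unset
```

Checking the variables up front gives a clear error at startup rather than a failed connection deep inside the training loop.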

Complete Code

import torch
from deeplake import Client

# 1. Setup GPU-Native Dataset
client = Client()

# 2. Open the table as a native Deeplake Dataset
# This bypasses the SQL layer and connects directly to the storage engine.
ds = client.open_table("robot_telemetry")

# 3. Create a PyTorch DataLoader
# Deeplake handles streaming and decompression in parallel.
# Wrap ds.pytorch() in a standard DataLoader for batching and shuffling.
from torch.utils.data import DataLoader

train_loader = DataLoader(
    ds.pytorch(),
    batch_size=64,
    shuffle=True,
    num_workers=8,  # Parallel CPU workers for prefetching
)

# 4. Standard PyTorch Training Loop
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Streaming data to {device}...")

for i, batch in enumerate(train_loader):
    # Tensors are streamed directly from S3/GCS into RAM/GPU buffers
    images = batch["camera_rgb"].to(device)
    angles = batch["joint_angles"].to(device)

    # training_step(images, angles)
    if i % 10 == 0:
        print(f"Batch {i}: {images.shape}")
    if i > 100:  # stop early for the demo
        break

Step-by-Step Breakdown

1. Bypassing SQL for Performance

While PostgreSQL is excellent for searching and filtering metadata, it is not optimized for high-throughput tensor streaming. Deeplake's client.open_table() returns a native deeplake.Dataset object that connects directly to the underlying object storage, bypassing the database layer for maximum speed.
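To see why bypassing the database layer matters, consider the network round trips alone. This back-of-the-envelope sketch uses hypothetical numbers (a 1 ms round trip, 1,000 rows per chunk), not measured SDK behavior:

```python
# Fetching tensors row-by-row through a database pays one round trip per row;
# a chunked storage engine amortizes one round trip over many rows.
ROWS = 1_000_000
ROWS_PER_CHUNK = 1_000   # assumed chunk layout
RTT_MS = 1.0             # assumed network round-trip time

row_by_row_s = ROWS * RTT_MS / 1000
chunked_s = (ROWS / ROWS_PER_CHUNK) * RTT_MS / 1000

print(f"row-by-row: {row_by_row_s:.0f} s of latency alone")  # 1000 s
print(f"chunked:    {chunked_s:.0f} s of latency alone")     # 1 s
```

The three-orders-of-magnitude gap comes purely from request count, before any decompression or serialization savings.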

2. Native PyTorch/TensorFlow Support

The ds.pytorch() and ds.tensorflow() methods wrap the data stream in a standard iterator compatible with common ML loaders. This means you don't have to write custom data-loading logic or handle complex multi-threading; Deeplake manages the prefetching and shuffling for you.
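The contract these methods satisfy is simple: an iterable of {column_name: value} samples that a loader can group into batches. The pure-Python sketch below mimics that shape with stand-in data (fake_stream and batched are illustrative helpers, not SDK code), which is handy for prototyping the loop without a live connection:

```python
def fake_stream(n_samples):
    """Yield {column: value} samples, mimicking the shape ds.pytorch() yields."""
    for i in range(n_samples):
        yield {"camera_rgb": [[0.0] * 4] * 4,   # stand-in 4x4 "image"
               "joint_angles": [0.1 * i] * 6}   # stand-in 6-DoF pose

def batched(stream, batch_size):
    """Group samples into fixed-size batches, as a DataLoader would (no shuffle)."""
    batch = []
    for sample in stream:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []

batches = list(batched(fake_stream(8), 4))
print(len(batches))  # 2 batches of 4 samples each
```

Because the real stream presents the same dict-of-columns interface, swapping fake_stream for ds.pytorch() requires no changes to the loop body.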

3. Infinite Scalability

Because data is streamed on-demand, you can train on datasets that are far larger than your local disk. Whether your dataset is 10GB or 10PB, the memory footprint on your training machine remains constant.
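The constant-footprint property follows from streaming itself: memory scales with the in-flight window, not the dataset. A generator-based sketch (illustrative, not the SDK's internals) makes this concrete:

```python
def stream(n_samples, sample_bytes=1_000):
    """Lazily yield samples one at a time, like a remote data stream."""
    for _ in range(n_samples):
        yield bytearray(sample_bytes)

def peak_resident(n_samples, window=64):
    """Max samples held at once: bounded by the prefetch window, not n_samples."""
    held, peak = [], 0
    for sample in stream(n_samples):
        held.append(sample)
        peak = max(peak, len(held))
        if len(held) == window:
            held.clear()  # "consumed" by the training step
    return peak

print(peak_resident(1_000), peak_resident(10_000))  # 64 64
```

Growing the dataset ten-fold leaves the peak untouched; only the window size (batch size times prefetch depth, in practice) moves the memory needle.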

Why no REST API?

Streaming high-throughput tensor data over standard REST endpoints introduces significant latency and CPU overhead from HTTP headers and JSON serialization. For training workloads, the Python SDK is therefore the only supported access method: it uses optimized C++ streaming kernels that avoid this overhead entirely.
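The serialization overhead alone is easy to quantify with the standard library. This sketch compares one batch of float joint angles encoded as JSON (the typical REST payload) against packed binary (what a native protocol can ship); the numbers are illustrative:

```python
import json
import struct

# One batch of 64 samples x 6 joint angles.
angles = [0.123456 * i for i in range(64 * 6)]

json_payload = json.dumps(angles).encode()                 # text digits + separators
binary_payload = struct.pack(f"{len(angles)}f", *angles)   # 4 bytes per float32

print(len(json_payload), len(binary_payload))
```

Beyond the size inflation, every JSON value must be parsed back into a float on the receiving end, which burns CPU cycles a binary stream simply does not spend.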

What to try next