GPU-Streaming Pipeline

One of Deeplake's core innovations is its ability to stream data directly from managed tables to your training loop. This eliminates the need to download large datasets locally, saving both disk space and time. Data is streamed in a GPU-friendly format, maximizing throughput.

Objective

Connect a Deeplake managed table to a PyTorch training loop and stream multimodal data directly into the model for high-performance training.

Prerequisites

  • Deeplake SDK: pip install deeplake
  • A deep learning framework: pip install torch torchvision (or TensorFlow).
  • A Deeplake API token.

Set credentials first

export DEEPLAKE_API_KEY="your-token-here"
export DEEPLAKE_WORKSPACE="your-workspace"  # optional, defaults to "default"
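If you want to validate the environment before constructing the client, a small helper can read the same variables and apply the documented fallback. This is an optional sketch (load_deeplake_config is a hypothetical helper, not part of the SDK):

```python
import os

def load_deeplake_config(env=os.environ):
    """Fetch Deeplake credentials, falling back to the documented defaults."""
    return {
        "api_key": env.get("DEEPLAKE_API_KEY"),                 # required
        "workspace": env.get("DEEPLAKE_WORKSPACE", "default"),  # optional
    }

cfg = load_deeplake_config({"DEEPLAKE_API_KEY": "your-token-here"})
print(cfg["workspace"])  # "default" when DEEPLAKE_WORKSPACE is unset
```

Checking the variables up front gives a clear error at startup rather than a failed connection deep inside the training loop.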

Complete Code

import torch
from deeplake import Client

# 1. Setup GPU-Native Dataset
client = Client()

# 2. Open the table as a native Deeplake Dataset
# This bypasses the SQL layer and connects directly to the storage engine.
ds = client.open_table("robot_telemetry")

# 3. Create a PyTorch DataLoader
# Deeplake handles streaming and decompression in parallel.
# Wrap ds.pytorch() in a standard DataLoader for batching and shuffling.
from torch.utils.data import DataLoader

train_loader = DataLoader(
    ds.pytorch(),
    batch_size=64,
    shuffle=True,
    num_workers=8,  # Parallel CPU workers for prefetching
)

# 4. Standard PyTorch Training Loop
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Streaming data to {device}...")

for i, batch in enumerate(train_loader):
    # Tensors are streamed directly from S3/GCS into RAM/GPU buffers
    images = batch["camera_rgb"].to(device)
    angles = batch["joint_angles"].to(device)

    # training_step(images, angles)
    if i % 10 == 0:
        print(f"Batch {i}: {images.shape}")
    if i > 100:  # stop early for the demo
        break

Step-by-Step Breakdown

1. Bypassing SQL for Performance

While PostgreSQL is excellent for searching and filtering metadata, it is not optimized for high-throughput tensor streaming. Deeplake's client.open_table() returns a native deeplake.Dataset object that connects directly to the underlying object storage, bypassing the database layer for maximum speed.
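To see why bypassing the database layer matters, consider the network round trips alone. This back-of-the-envelope sketch uses hypothetical numbers (a 1 ms round trip, 1,000 rows per chunk), not measured SDK behavior:

```python
# Fetching tensors row-by-row through a database pays one round trip per row;
# a chunked storage engine amortizes one round trip over many rows.
ROWS = 1_000_000
ROWS_PER_CHUNK = 1_000   # assumed chunk layout
RTT_MS = 1.0             # assumed network round-trip time

row_by_row_s = ROWS * RTT_MS / 1000
chunked_s = (ROWS / ROWS_PER_CHUNK) * RTT_MS / 1000

print(f"row-by-row: {row_by_row_s:.0f} s of latency alone")  # 1000 s
print(f"chunked:    {chunked_s:.0f} s of latency alone")     # 1 s
```

The three-orders-of-magnitude gap comes purely from request count, before any decompression or serialization savings.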

2. Native PyTorch/TensorFlow Support

The ds.pytorch() and ds.tensorflow() methods wrap the data stream in a standard iterator compatible with common ML loaders. This means you don't have to write custom data-loading logic or handle complex multi-threading; Deeplake manages the prefetching and shuffling for you.
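The contract these methods satisfy is simple: an iterable of {column_name: value} samples that a loader can group into batches. The pure-Python sketch below mimics that shape with stand-in data (fake_stream and batched are illustrative helpers, not SDK code), which is handy for prototyping the loop without a live connection:

```python
def fake_stream(n_samples):
    """Yield {column: value} samples, mimicking the shape ds.pytorch() yields."""
    for i in range(n_samples):
        yield {"camera_rgb": [[0.0] * 4] * 4,   # stand-in 4x4 "image"
               "joint_angles": [0.1 * i] * 6}   # stand-in 6-DoF pose

def batched(stream, batch_size):
    """Group samples into fixed-size batches, as a DataLoader would (no shuffle)."""
    batch = []
    for sample in stream:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []

batches = list(batched(fake_stream(8), 4))
print(len(batches))  # 2 batches of 4 samples each
```

Because the real stream presents the same dict-of-columns interface, swapping fake_stream for ds.pytorch() requires no changes to the loop body.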

3. Infinite Scalability

Because data is streamed on-demand, you can train on datasets that are far larger than your local disk. Whether your dataset is 10GB or 10PB, the memory footprint on your training machine remains constant.
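The constant-footprint property follows from streaming itself: memory scales with the in-flight window, not the dataset. A generator-based sketch (illustrative, not the SDK's internals) makes this concrete:

```python
def stream(n_samples, sample_bytes=1_000):
    """Lazily yield samples one at a time, like a remote data stream."""
    for _ in range(n_samples):
        yield bytearray(sample_bytes)

def peak_resident(n_samples, window=64):
    """Max samples held at once: bounded by the prefetch window, not n_samples."""
    held, peak = [], 0
    for sample in stream(n_samples):
        held.append(sample)
        peak = max(peak, len(held))
        if len(held) == window:
            held.clear()  # "consumed" by the training step
    return peak

print(peak_resident(1_000), peak_resident(10_000))  # 64 64
```

Growing the dataset ten-fold leaves the peak untouched; only the window size (batch size times prefetch depth, in practice) moves the memory needle.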

Why no REST API?

Streaming high-throughput tensor data over standard REST endpoints introduces significant latency and CPU overhead from HTTP headers and JSON serialization. For training workloads, the Python SDK is therefore the only supported access method: it uses optimized C++ streaming kernels that avoid this overhead entirely.
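The serialization overhead alone is easy to quantify with the standard library. This sketch compares one batch of float joint angles encoded as JSON (the typical REST payload) against packed binary (what a native protocol can ship); the numbers are illustrative:

```python
import json
import struct

# One batch of 64 samples x 6 joint angles.
angles = [0.123456 * i for i in range(64 * 6)]

json_payload = json.dumps(angles).encode()                 # text digits + separators
binary_payload = struct.pack(f"{len(angles)}f", *angles)   # 4 bytes per float32

print(len(json_payload), len(binary_payload))
```

Beyond the size inflation, every JSON value must be parsed back into a float on the receiving end, which burns CPU cycles a binary stream simply does not spend.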

What to try next