
Evaluating Performance

A model is only as good as its training data. After training, computing per-sample loss reveals which examples the model struggles with most. By storing those losses back in Deeplake, you can query for the worst-performing samples, inspect them, fix bad annotations, and retrain, closing the data-centric improvement loop.

Objective

Train a Faster R-CNN object detector on the SVHN (Street View House Numbers) dataset, evaluate per-sample loss, store the results in a managed table, and query for the hardest examples.

Prerequisites

  • Deeplake SDK: pip install deeplake
  • PyTorch and torchvision: pip install torch torchvision
  • A Deeplake API token.

Set credentials first

export DEEPLAKE_API_KEY="your-token-here"
export DEEPLAKE_WORKSPACE="your-workspace"  # optional, defaults to "default"

Complete Code

import torch
from torch.utils.data import DataLoader
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from deeplake import Client

# --- Configuration ---
TRAIN_TABLE = "svhn_train"
EVAL_TABLE = "svhn_eval"
NUM_EPOCHS = 3
BATCH_SIZE = 8
LEARNING_RATE = 0.005

# --- 1. Ingest the Dataset from HuggingFace ---
client = Client()
client.ingest(TRAIN_TABLE, {"_huggingface": "svhn"})

# --- 2. Create the DataLoader ---
ds = client.open_table(TRAIN_TABLE)

train_loader = DataLoader(
    ds.pytorch(),
    batch_size=BATCH_SIZE,
    shuffle=True,
    num_workers=4,
    collate_fn=lambda batch: tuple(zip(*batch)),
)

# --- 3. Define the Model ---
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
model = model.to(device)

# --- 4. Train ---
optimizer = torch.optim.SGD(
    model.parameters(), lr=LEARNING_RATE, momentum=0.9, weight_decay=0.0005
)

for epoch in range(NUM_EPOCHS):
    model.train()
    running_loss = 0.0

    # The collate_fn yields parallel tuples: (images, targets)
    for i, (images, targets) in enumerate(train_loader):
        images = [img.to(device) for img in images]
        targets = [
            {"boxes": t["boxes"].to(device), "labels": t["labels"].to(device)}
            for t in targets
        ]

        loss_dict = model(images, targets)
        loss = sum(loss_dict.values())

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if i % 50 == 49:
            print(
                f"Epoch {epoch+1}, Batch {i+1}: Loss={running_loss/50:.4f}"
            )
            running_loss = 0.0

    print(f"Epoch {epoch+1} complete.")

# --- 5. Evaluate Per-Sample Loss ---
model.train()  # Faster R-CNN returns losses only in train mode
sample_ids = []
sample_losses = []

# Use a non-shuffled loader so sample indices line up with table rows.
eval_loader = DataLoader(
    ds.pytorch(),
    batch_size=BATCH_SIZE,
    shuffle=False,
    num_workers=4,
    collate_fn=lambda batch: tuple(zip(*batch)),
)

with torch.no_grad():
    for i, (images, targets) in enumerate(eval_loader):
        images = [img.to(device) for img in images]
        targets = [
            {"boxes": t["boxes"].to(device), "labels": t["labels"].to(device)}
            for t in targets
        ]

        loss_dict = model(images, targets)
        total_loss = sum(loss_dict.values()).item()

        # Store one loss value per batch for simplicity;
        # for per-image granularity, iterate with batch_size=1.
        for j in range(len(images)):
            sample_ids.append(i * BATCH_SIZE + j)
            sample_losses.append(total_loss)

print(f"Evaluated {len(sample_ids)} samples.")

# --- 6. Store Evaluation Results ---
client.ingest(EVAL_TABLE, {"image_id": sample_ids, "loss": sample_losses})
print(f"Stored losses in '{EVAL_TABLE}' table.")

# --- 7. Find Worst-Performing Samples ---
worst = (
    client.table(EVAL_TABLE)
    .select("image_id", "loss")
    .order_by("loss DESC")
    .limit(20)()
)
print("Top 20 hardest samples:")
print(worst)

Step-by-Step Breakdown

1. Ingest the Dataset

Deeplake ingests SVHN directly from HuggingFace. The _huggingface key tells the platform to pull the dataset by name, automatically mapping its columns into a managed table.

client = Client()
client.ingest(TRAIN_TABLE, {"_huggingface": "svhn"})

If the table already exists, skip this step and go straight to open_table.

2. Create the DataLoader

Open the managed table and wrap it in a standard PyTorch DataLoader. For object detection, a custom collate_fn is needed because images and target dictionaries have variable sizes.

ds = client.open_table(TRAIN_TABLE)

train_loader = DataLoader(
    ds.pytorch(),
    batch_size=8,
    shuffle=True,
    num_workers=4,
    collate_fn=lambda batch: tuple(zip(*batch)),
)
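
To see exactly what the lambda collate does, here is a small sketch on a toy batch of (image, target) pairs, with plain strings and dicts standing in for tensors:

```python
# Same collate as above: transposes a list of (image, target) pairs
# into one tuple of images and one tuple of targets.
collate = lambda batch: tuple(zip(*batch))

# A toy batch as the DataLoader would assemble it.
batch = [("img0", {"boxes": [0]}), ("img1", {"boxes": [1]})]
images, targets = collate(batch)
print(images)   # ('img0', 'img1')
print(targets)  # ({'boxes': [0]}, {'boxes': [1]})
```

Because detection images and box lists vary in size, this keeps them as per-sample sequences instead of stacking them into a single tensor.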

3. Define the Model

We use fasterrcnn_resnet50_fpn from torchvision, a Faster R-CNN detector with a ResNet-50 backbone pretrained on COCO. No architecture changes are needed for SVHN since the default head already supports multi-class detection.

model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
model = model.to(device)

4. Train the Model

A standard training loop with SGD. Faster R-CNN accepts a list of images and a list of target dictionaries, and returns a dictionary of losses (classification loss, box regression loss, objectness loss, RPN box loss).

optimizer = torch.optim.SGD(
    model.parameters(), lr=0.005, momentum=0.9, weight_decay=0.0005
)

for epoch in range(NUM_EPOCHS):
    model.train()
    for images, targets in train_loader:
        images = [img.to(device) for img in images]
        targets = [
            {"boxes": t["boxes"].to(device), "labels": t["labels"].to(device)}
            for t in targets
        ]
        loss_dict = model(images, targets)
        loss = sum(loss_dict.values())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
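
The returned dictionary typically has four entries, and summing them gives the scalar optimized above. A minimal illustration with plain floats (the key names follow torchvision's Faster R-CNN; the real values are 0-dim tensors):

```python
# Toy loss dictionary using torchvision's Faster R-CNN loss keys.
loss_dict = {
    "loss_classifier": 0.42,   # classification loss
    "loss_box_reg": 0.18,      # box regression loss
    "loss_objectness": 0.07,   # RPN objectness loss
    "loss_rpn_box_reg": 0.05,  # RPN box regression loss
}
total = sum(loss_dict.values())
print(f"total loss: {total:.2f}")  # total loss: 0.72
```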

5. Evaluate Per-Sample Loss

Faster R-CNN computes losses only in model.train() mode. We run a forward pass with torch.no_grad() to avoid updating weights while still collecting loss values. Each sample's loss indicates how difficult it is for the model.

model.train()
with torch.no_grad():
    for images, targets in eval_loader:  # a non-shuffled copy of train_loader
        images = [img.to(device) for img in images]
        targets = [
            {"boxes": t["boxes"].to(device), "labels": t["labels"].to(device)}
            for t in targets
        ]
        loss_dict = model(images, targets)
        total_loss = sum(loss_dict.values()).item()

High-loss samples are typically images with occluded digits, noisy backgrounds, or incorrect annotations.
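
Before ingesting, you can also sanity-check the hardest samples locally. A small sketch using heapq over toy (id, loss) pairs; in practice the two lists come from the evaluation loop above:

```python
import heapq

# Toy stand-ins for the sample_ids / sample_losses lists built during evaluation.
sample_ids = [0, 1, 2, 3]
sample_losses = [0.9, 2.4, 0.3, 1.7]

# Top-2 hardest samples by loss, without touching the database.
hardest = heapq.nlargest(2, zip(sample_ids, sample_losses), key=lambda p: p[1])
print(hardest)  # [(1, 2.4), (3, 1.7)]
```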

6. Store Evaluation Results

Ingest the per-sample losses into a new managed table. This makes the evaluation results queryable alongside the original dataset.

client.ingest(EVAL_TABLE, {"image_id": sample_ids, "loss": sample_losses})

7. Find Worst-Performing Samples

Use the fluent query API to sort by loss in descending order and retrieve the 20 hardest examples. These are the samples most likely to benefit from annotation review or augmentation.

worst = (
    client.table(EVAL_TABLE)
    .select("image_id", "loss")
    .order_by("loss DESC")
    .limit(20)()
)

You can join these IDs back to the original training table to visualize the problematic images and inspect their annotations.
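
A sketch of that join-back, assuming the query result iterates as rows with image_id and loss fields (toy rows stand in for the real result; the row-access pattern at the end is an assumption to adapt to your SDK version):

```python
# Toy stand-in for the query result; real rows come from the `worst` query.
worst_rows = [{"image_id": 102, "loss": 3.1}, {"image_id": 57, "loss": 2.8}]

# Collect the IDs of the hardest samples.
hard_ids = [row["image_id"] for row in worst_rows]
print(hard_ids)  # [102, 57]

# With the training table open (ds = client.open_table(TRAIN_TABLE)), you
# could then fetch each row for inspection, e.g. ds[i] for i in hard_ids --
# adapt the access pattern to your SDK version.
```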

What to try next