
Training Object Detection Models with Deeplake and MMDetection

This tutorial shows how to train an object detection model using MMDetection with data stored in Deeplake. We'll fine-tune a YOLOv3 model pretrained on COCO to demonstrate the workflow.

Prerequisites

First, let's install the required packages:

python -m pip install torch==1.12.0+cu116 torchvision==0.13.0+cu116 -f https://download.pytorch.org/whl/torch_stable.html
python -m pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu116/torch1.12.0/index.html

git clone -b dev-2.x https://github.com/open-mmlab/mmdetection.git
cd mmdetection
python3 -m pip install -e .

Note: We use MMDetection 2.x versions as they're currently supported by the Deeplake integration.

Setup

Let's set up our imports and authentication:

import deeplake
from mmcv import Config
from mmdet.models import build_detector
import os
import mmcv

# Read your Deeplake token from the environment so the datasets
# referenced in the config can be accessed
token = os.environ["DEEPLAKE_API_KEY"]

Configuration

MMDetection uses config files to define models and training parameters. Here's our YOLOv3 config with Deeplake integration:

_base_ = "<mmdetection_path>/configs/yolo/yolov3_d53_mstrain-416_273e_coco.py"

# use caffe img_norm
img_norm_cfg = dict(mean=[0, 0, 0], std=[255., 255., 255.], to_rgb=True)

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='Expand',
        mean=img_norm_cfg['mean'],
        to_rgb=img_norm_cfg['to_rgb'],
        ratio_range=(1, 2)),
    dict(type='Resize', img_scale=[(320, 320), (416, 416)], keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.0),
    dict(type='PhotoMetricDistortion'),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(416, 416),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip', flip_ratio=0.0),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]


data = dict(
    train=dict(
        pipeline=train_pipeline,
        deeplake_path="hub://activeloop/coco-train",
        # If omitted, Deeplake auto-infers the mapping, but it can guess wrong
        # when a dataset has many tensors, so it's safer to be explicit
        deeplake_tensors = {"img": "images", "gt_bboxes": "boxes", "gt_labels": "categories"},

        # These settings override the corresponding dataloader parameters
        # elsewhere in the cfg, such as samples_per_gpu
        deeplake_dataloader = {"shuffle": True, "batch_size": 4, 'num_workers': 8}
    ),

    # Parameters are the same as for train
    val=dict(
        pipeline=test_pipeline,
        deeplake_path="hub://activeloop/coco-val",
        deeplake_tensors = {"img": "images", "gt_bboxes": "boxes", "gt_labels": "categories"},
        deeplake_dataloader = {"shuffle": False, "batch_size": 1, 'num_workers': 8}
    ),
)
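
To see what the deeplake_tensors mapping is doing, here's a pure-Python sketch. It is illustrative only — the real integration performs this renaming internally — with placeholder strings standing in for the actual image, box, and label arrays:

```python
# Illustrative sketch only: how a deeplake_tensors-style mapping renames a
# dataset sample's tensors to the keys MMDetection's pipeline expects.
deeplake_tensors = {"img": "images", "gt_bboxes": "boxes", "gt_labels": "categories"}

def remap_sample(sample, mapping):
    """Return the sample re-keyed with MMDetection's expected names."""
    return {mmdet_key: sample[ds_key] for mmdet_key, ds_key in mapping.items()}

# Placeholder values stand in for real arrays
sample = {"images": "HxWx3 image", "boxes": "Nx4 bboxes", "categories": "N labels"}
print(remap_sample(sample, deeplake_tensors))
# {'img': 'HxWx3 image', 'gt_bboxes': 'Nx4 bboxes', 'gt_labels': 'N labels'}
```

This is why the mapping's keys are MMDetection names and its values are your dataset's tensor names, not the other way around.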


deeplake_metrics_format = "COCO"

evaluation = dict(metric=["bbox"], interval=1)

load_from = "checkpoints/yolov3_d53_mstrain-416_273e_coco-2b60fcd9.pth"

work_dir = "./mmdet_outputs"

log_config = dict(interval=10)

checkpoint_config = dict(interval=1)  # interval is in epochs for EpochBasedRunner; 1 saves after every epoch

seed = None

device = "cuda"

runner = dict(type='EpochBasedRunner', max_epochs=10)
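
Config.fromfile expects the config to live in a standalone .py file, so save everything above to one before training. The filename deeplake_yolo_config.py below is just this tutorial's choice:

```python
# Write the config above to a standalone .py file so that mmcv's
# Config.fromfile can load it in the training step below.
# "deeplake_yolo_config.py" is an arbitrary filename chosen for this tutorial.
from pathlib import Path

config_path = "deeplake_yolo_config.py"
config_text = '''
_base_ = "<mmdetection_path>/configs/yolo/yolov3_d53_mstrain-416_273e_coco.py"
# ... paste the remaining settings from the section above here ...
'''
Path(config_path).write_text(config_text)
```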

Training

Now we can start the training:

# Load the config; config_path points at the file the config above was
# saved to (here assumed to be "deeplake_yolo_config.py")
config_path = "./deeplake_yolo_config.py"
cfg = Config.fromfile(config_path)

# Build the detector
model = build_detector(cfg.model)

# Create work directory
mmcv.mkdir_or_exist(os.path.abspath(cfg.work_dir))

# Start training
from deeplake.integrations import mmdet as mmdet_deeplake
mmdet_deeplake.train_detector(
    model, 
    cfg,
    distributed=False,  # Set to True for multi-GPU training
    validate=False      # Set to True to evaluate on the val split defined in the config
)

Key Benefits of Using Deeplake

  1. Simple Data Loading: Deeplake automatically handles data streaming and batching, so you don't need to write custom data loaders.

  2. Efficient Storage: Data is stored in an optimized format and loaded on-demand, saving disk space and memory.

  3. Easy Tensor Mapping: The deeplake_tensors config maps your dataset's tensor names to what MMDetection expects, making it easy to use any dataset.

  4. Built-in Authentication: Deeplake handles authentication and access control for your datasets securely.

  5. Distributed Training Support: The integration works seamlessly with MMDetection's distributed training capabilities.

Monitoring Training

You can monitor the training progress in the work directory:

# MMDetection writes a timestamped .log file into the work directory;
# read the most recent one
import glob
log_files = sorted(glob.glob(os.path.join(cfg.work_dir, '*.log')))
if log_files:
    with open(log_files[-1], 'r') as f:
        print(f.read())

Inference

After training, you can use the model for inference:

from mmdet.apis import inference_detector, init_detector

# Load trained model
checkpoint = os.path.join(cfg.work_dir, 'latest.pth')
model = init_detector(config_path, checkpoint)

# Load an image
img = 'path/to/test/image.jpg'

# Run inference
result = inference_detector(model, img)
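
In MMDetection 2.x, inference_detector returns a list with one entry per class, each entry holding rows of [x1, y1, x2, y2, score]. This self-contained sketch shows how to filter such a result by confidence; the class names and detection values are invented for illustration, and plain lists stand in for numpy arrays:

```python
# Self-contained sketch: mmdet 2.x results are a list with one entry per
# class; each entry holds rows of [x1, y1, x2, y2, score]. All values here
# are invented for illustration.
CLASSES = ["person", "car"]

result = [
    [[10.0, 20.0, 50.0, 80.0, 0.92]],                            # "person" detections
    [[5.0, 5.0, 40.0, 40.0, 0.30], [0.0, 0.0, 8.0, 8.0, 0.88]],  # "car" detections
]

def filter_detections(result, classes, score_thr=0.5):
    """Keep detections above the score threshold, tagged with class names."""
    kept = []
    for cls_name, dets in zip(classes, result):
        for x1, y1, x2, y2, score in dets:
            if score >= score_thr:
                kept.append((cls_name, (x1, y1, x2, y2), score))
    return kept

print(filter_detections(result, CLASSES))
# [('person', (10.0, 20.0, 50.0, 80.0), 0.92), ('car', (0.0, 0.0, 8.0, 8.0), 0.88)]
```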

Common Issues and Solutions

  1. If you get CUDA out of memory errors:

    • Reduce batch_size in deeplake_dataloader (it overrides samples_per_gpu)
    • Use smaller image sizes in the pipeline
  2. If training is slow:

    • Increase num_workers in deeplake_dataloader
    • Use distributed training with multiple GPUs
  3. If you see authentication errors:

    • Make sure your Deeplake token is correct
    • Check if you have access to the dataset
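
The batch-size fix in point 1 amounts to editing the deeplake_dataloader dict from the config; a minimal sketch, with the starting values copied from the config above:

```python
# Illustrative sketch: lowering the dataloader batch size to cut GPU memory
# use. In this integration, batch_size overrides MMDetection's samples_per_gpu.
deeplake_dataloader = {"shuffle": True, "batch_size": 4, "num_workers": 8}

# Halve the batch size when hitting CUDA out-of-memory errors
deeplake_dataloader["batch_size"] = max(1, deeplake_dataloader["batch_size"] // 2)
print(deeplake_dataloader["batch_size"])  # 2
```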

Next Steps

  • Try different MMDetection models by changing the base config
  • Add validation data to monitor model performance
  • Experiment with different data augmentations in the pipeline
  • Enable distributed training for faster processing