
Training Object Detection Models with Deeplake and MMDetection

This tutorial shows how to train an object detection model using MMDetection with data stored in Deeplake. We'll fine-tune a YOLOv3 model pretrained on COCO to demonstrate the workflow.

Prerequisites

First, let's install the required packages:

python -m pip install torch==1.12.0+cu116 torchvision==0.13.0+cu116 -f https://download.pytorch.org/whl/torch_stable.html
python -m pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu116/torch1.12.0/index.html

git clone -b dev-2.x https://github.com/open-mmlab/mmdetection.git
cd mmdetection
python3 -m pip install -e .

Note: We use MMDetection 2.x versions as they're currently supported by the Deeplake integration.

Setup

Let's set up our imports and authentication:

import deeplake
from mmcv import Config
from mmdet.models import build_detector
import os
import mmcv

# Read your Deeplake token from the environment so the datasets
# referenced in the config can be accessed
token = os.environ["DEEPLAKE_API_KEY"]

Configuration

MMDetection uses config files to define models and training parameters. Here's our YOLOv3 config with Deeplake integration:

_base_ = "<mmdetection_path>/configs/yolo/yolov3_d53_mstrain-416_273e_coco.py"

# use caffe img_norm
img_norm_cfg = dict(mean=[0, 0, 0], std=[255., 255., 255.], to_rgb=True)

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(
        type='Expand',
        mean=img_norm_cfg['mean'],
        to_rgb=img_norm_cfg['to_rgb'],
        ratio_range=(1, 2)),
    dict(type='Resize', img_scale=[(320, 320), (416, 416)], keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.0),
    dict(type='PhotoMetricDistortion'),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels'])
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(416, 416),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip', flip_ratio=0.0),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img'])
        ])
]


data = dict(
    train=dict(
        pipeline=train_pipeline,
        deeplake_path="hub://activeloop/coco-train",
        # If omitted, Deeplake auto-infers the mapping, but it can guess wrong
        # when a dataset has many tensors, so it's safer to be explicit
        deeplake_tensors = {"img": "images", "gt_bboxes": "boxes", "gt_labels": "categories"},

        # These settings override the corresponding dataloader parameters
        # elsewhere in the cfg, such as samples_per_gpu
        deeplake_dataloader = {"shuffle": True, "batch_size": 4, 'num_workers': 8}
    ),

    # Parameters are the same as for train
    val=dict(
        pipeline=test_pipeline,
        deeplake_path="hub://activeloop/coco-val",
        deeplake_tensors = {"img": "images", "gt_bboxes": "boxes", "gt_labels": "categories"},
        deeplake_dataloader = {"shuffle": False, "batch_size": 1, 'num_workers': 8}
    ),
)
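
To see what the deeplake_tensors mapping is doing, here's a pure-Python sketch. It is illustrative only — the real integration performs this renaming internally — with placeholder strings standing in for the actual image, box, and label arrays:

```python
# Illustrative sketch only: how a deeplake_tensors-style mapping renames a
# dataset sample's tensors to the keys MMDetection's pipeline expects.
deeplake_tensors = {"img": "images", "gt_bboxes": "boxes", "gt_labels": "categories"}

def remap_sample(sample, mapping):
    """Return the sample re-keyed with MMDetection's expected names."""
    return {mmdet_key: sample[ds_key] for mmdet_key, ds_key in mapping.items()}

# Placeholder values stand in for real arrays
sample = {"images": "HxWx3 image", "boxes": "Nx4 bboxes", "categories": "N labels"}
print(remap_sample(sample, deeplake_tensors))
# {'img': 'HxWx3 image', 'gt_bboxes': 'Nx4 bboxes', 'gt_labels': 'N labels'}
```

This is why the mapping's keys are MMDetection names and its values are your dataset's tensor names, not the other way around.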


deeplake_metrics_format = "COCO"

evaluation = dict(metric=["bbox"], interval=1)

load_from = "checkpoints/yolov3_d53_mstrain-416_273e_coco-2b60fcd9.pth"

work_dir = "./mmdet_outputs"

log_config = dict(interval=10)

checkpoint_config = dict(interval=1)  # interval is in epochs for EpochBasedRunner; 1 saves after every epoch

seed = None

device = "cuda"

runner = dict(type='EpochBasedRunner', max_epochs=10)
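
Config.fromfile expects the config to live in a standalone .py file, so save everything above to one before training. The filename deeplake_yolo_config.py below is just this tutorial's choice:

```python
# Write the config above to a standalone .py file so that mmcv's
# Config.fromfile can load it in the training step below.
# "deeplake_yolo_config.py" is an arbitrary filename chosen for this tutorial.
from pathlib import Path

config_path = "deeplake_yolo_config.py"
config_text = '''
_base_ = "<mmdetection_path>/configs/yolo/yolov3_d53_mstrain-416_273e_coco.py"
# ... paste the remaining settings from the section above here ...
'''
Path(config_path).write_text(config_text)
```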

Training

Now we can start the training:

# Load the config; config_path points at the file the config above was
# saved to (here assumed to be "deeplake_yolo_config.py")
config_path = "./deeplake_yolo_config.py"
cfg = Config.fromfile(config_path)

# Build the detector
model = build_detector(cfg.model)

# Create work directory
mmcv.mkdir_or_exist(os.path.abspath(cfg.work_dir))

# Start training
from deeplake.integrations import mmdet as mmdet_deeplake
mmdet_deeplake.train_detector(
    model, 
    cfg,
    distributed=False,  # Set to True for multi-GPU training
    validate=False      # Set to True to evaluate on the val split defined in the config
)

Key Benefits of Using Deeplake

  1. Simple Data Loading: Deeplake automatically handles data streaming and batching, so you don't need to write custom data loaders.

  2. Efficient Storage: Data is stored in an optimized format and loaded on-demand, saving disk space and memory.

  3. Easy Tensor Mapping: The deeplake_tensors config maps your dataset's tensor names to what MMDetection expects, making it easy to use any dataset.

  4. Built-in Authentication: Deeplake handles authentication and access control for your datasets securely.

  5. Distributed Training Support: The integration works seamlessly with MMDetection's distributed training capabilities.

Monitoring Training

You can monitor the training progress in the work directory:

# MMDetection writes a timestamped .log file into the work directory;
# read the most recent one
import glob
log_files = sorted(glob.glob(os.path.join(cfg.work_dir, '*.log')))
if log_files:
    with open(log_files[-1], 'r') as f:
        print(f.read())

Inference

After training, you can use the model for inference:

from mmdet.apis import inference_detector, init_detector

# Load trained model
checkpoint = os.path.join(cfg.work_dir, 'latest.pth')
model = init_detector(config_path, checkpoint)

# Load an image
img = 'path/to/test/image.jpg'

# Run inference
result = inference_detector(model, img)
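
In MMDetection 2.x, inference_detector returns a list with one entry per class, each entry holding rows of [x1, y1, x2, y2, score]. This self-contained sketch shows how to filter such a result by confidence; the class names and detection values are invented for illustration, and plain lists stand in for numpy arrays:

```python
# Self-contained sketch: mmdet 2.x results are a list with one entry per
# class; each entry holds rows of [x1, y1, x2, y2, score]. All values here
# are invented for illustration.
CLASSES = ["person", "car"]

result = [
    [[10.0, 20.0, 50.0, 80.0, 0.92]],                            # "person" detections
    [[5.0, 5.0, 40.0, 40.0, 0.30], [0.0, 0.0, 8.0, 8.0, 0.88]],  # "car" detections
]

def filter_detections(result, classes, score_thr=0.5):
    """Keep detections above the score threshold, tagged with class names."""
    kept = []
    for cls_name, dets in zip(classes, result):
        for x1, y1, x2, y2, score in dets:
            if score >= score_thr:
                kept.append((cls_name, (x1, y1, x2, y2), score))
    return kept

print(filter_detections(result, CLASSES))
# [('person', (10.0, 20.0, 50.0, 80.0), 0.92), ('car', (0.0, 0.0, 8.0, 8.0), 0.88)]
```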

Common Issues and Solutions

  1. If you get CUDA out of memory errors:

    • Reduce batch_size in deeplake_dataloader (it overrides samples_per_gpu)
    • Use smaller image sizes in the pipeline
  2. If training is slow:

    • Increase num_workers in deeplake_dataloader
    • Use distributed training with multiple GPUs
  3. If you see authentication errors:

    • Make sure your Deeplake token is correct
    • Check if you have access to the dataset
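
The batch-size fix in point 1 amounts to editing the deeplake_dataloader dict from the config; a minimal sketch, with the starting values copied from the config above:

```python
# Illustrative sketch: lowering the dataloader batch size to cut GPU memory
# use. In this integration, batch_size overrides MMDetection's samples_per_gpu.
deeplake_dataloader = {"shuffle": True, "batch_size": 4, "num_workers": 8}

# Halve the batch size when hitting CUDA out-of-memory errors
deeplake_dataloader["batch_size"] = max(1, deeplake_dataloader["batch_size"] // 2)
print(deeplake_dataloader["batch_size"])  # 2
```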

Next Steps

  • Try different MMDetection models by changing the base config
  • Add validation data to monitor model performance
  • Experiment with different data augmentations in the pipeline
  • Enable distributed training for faster processing