Scalable image classification training using PyTorch Distributed Data Parallel (DDP). Works on Windows (CPU/GPU) and Linux/macOS. Includes TensorBoard logs and a simple results CSV for speed/accuracy tracking.
- DDP launcher for multi-GPU (and CPU fallback with gloo backend)
- Minimal CNN model + CIFAR-10 loader
- Mixed precision (if CUDA available)
- TensorBoard + CSV logging under benchmarks/
- Windows launch_local.bat and Linux/Mac launch_local.sh
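
The "mixed precision (if CUDA available)" feature can be sketched as below. This is a minimal illustration, not the code in `scripts/train_ddp.py`: the helper names `make_amp_tools` and `train_step` are hypothetical, and it assumes a classification loss. Autocast and gradient scaling are switched on only for CUDA devices, so the same step also runs unchanged on CPU.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_amp_tools(device: torch.device):
    """Return (use_amp, scaler). AMP is enabled only on CUDA devices."""
    use_amp = device.type == "cuda"
    # With enabled=False the scaler is a pass-through, so CPU runs are unaffected.
    # Newer PyTorch releases prefer torch.amp.GradScaler("cuda", ...); the call
    # below is the long-standing spelling and may emit a deprecation warning.
    scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
    return use_amp, scaler


def train_step(model, images, labels, optimizer, scaler, use_amp):
    """Run one optimizer step and return the loss as a Python float."""
    optimizer.zero_grad(set_to_none=True)
    # autocast runs eligible ops in reduced precision when enabled.
    with torch.autocast(device_type=images.device.type, enabled=use_amp):
        loss = F.cross_entropy(model(images), labels)
    scaler.scale(loss).backward()  # scale the loss to limit fp16 underflow
    scaler.step(optimizer)         # unscales grads; skips the step on inf/nan
    scaler.update()                # adjust the scale factor for the next step
    return loss.item()
```

A training loop would call `make_amp_tools(device)` once and then `train_step(...)` per batch, with `images` and `labels` already moved to `device`.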
Windows (CMD):
python -m venv .venv
.venv\Scripts\activate
pip install -r requirements.txt

Linux/Mac:
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

python scripts/train_ddp.py --epochs 2 --batch-size 128

Windows (CMD):
scripts\launch_local.bat

Linux/Mac:
bash scripts/launch_local.sh

By default, the launch scripts use 2 processes. Change NUM_PROCS in the script to match your GPUs.
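
When workers are started through `torch.distributed.run`, each process receives its identity in the `RANK`, `LOCAL_RANK`, and `WORLD_SIZE` environment variables. The sketch below shows one way a training script could read them. It assumes the launch scripts pass NUM_PROCS on to `torch.distributed.run`, and the names `DistEnv` and `read_dist_env` are hypothetical. The defaults let the same script run as a plain single process.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DistEnv:
    rank: int        # global index of this process
    local_rank: int  # index of this process on the current machine
    world_size: int  # total number of processes

    @property
    def is_main(self) -> bool:
        """True for the process that should write logs and checkpoints."""
        return self.rank == 0


def read_dist_env(env: Mapping[str, str] = os.environ) -> DistEnv:
    """Read launcher-provided variables, defaulting to a single process."""
    return DistEnv(
        rank=int(env.get("RANK", 0)),
        local_rank=int(env.get("LOCAL_RANK", 0)),
        world_size=int(env.get("WORLD_SIZE", 1)),
    )
```

With 2 processes, one worker sees `rank=0` and the other `rank=1`. Restricting logging and checkpoint writes to `is_main` keeps the workers from writing the same files.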
tensorboard --logdir benchmarks/logs

- Final model: benchmarks/model_final.pt
- CSV metrics: benchmarks/results.csv
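
As an illustration of how per-epoch rows might be appended to the results CSV, here is a small sketch using only the standard library. The column names in `FIELDS` and the helper `append_result` are assumptions made for this example; the actual columns written by the training script may differ.

```python
import csv
from pathlib import Path

# Hypothetical columns for speed/accuracy tracking.
FIELDS = ["epoch", "train_loss", "val_acc", "images_per_sec"]


def append_result(row: dict, path="benchmarks/results.csv") -> None:
    """Append one metrics row, writing the header if the file is new or empty."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow(row)
```

In a multi-process run, calling this from rank 0 only avoids duplicate rows.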
- Create a repo, set up venv, and install deps:
  python -m venv .venv && source .venv/bin/activate
  pip install torch torchvision tensorboard numpy
- Add a model (src/model.py), dataset (src/dataset.py), and DDP script (scripts/train_ddp.py).
- Launch DDP with:
  python -m torch.distributed.run --nproc_per_node=2 scripts/train_ddp.py --epochs 2
- Log metrics (TensorBoard SummaryWriter + CSV).
- Save artifacts under benchmarks/ and push.
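
The model and DDP steps above could look roughly like the sketch below. It is an assumption-laden outline, not the repository's code: `TinyCNN`, `pick_backend`, and `setup_ddp` are hypothetical names, and the layer sizes are chosen only to fit 32x32 CIFAR-10 images with 10 classes. The backend falls back to gloo when CUDA or NCCL is unavailable (NCCL is not available on Windows).

```python
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


class TinyCNN(nn.Module):
    """Hypothetical minimal CNN for 3x32x32 inputs and 10 classes."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 32x32 -> 16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 16x16 -> 8x8
        )
        self.classifier = nn.Linear(32 * 8 * 8, num_classes)

    def forward(self, x):
        return self.classifier(torch.flatten(self.features(x), 1))


def pick_backend() -> str:
    """Use NCCL when CUDA and NCCL are both available, otherwise gloo."""
    if torch.cuda.is_available() and dist.is_nccl_available():
        return "nccl"
    return "gloo"


def setup_ddp():
    """Join the process group and return (ddp_model, device).

    Assumes the launcher (torch.distributed.run) has set RANK, WORLD_SIZE,
    LOCAL_RANK, MASTER_ADDR, and MASTER_PORT in the environment.
    """
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    use_cuda = torch.cuda.is_available()
    dist.init_process_group(backend=pick_backend())
    device = torch.device(f"cuda:{local_rank}" if use_cuda else "cpu")
    if use_cuda:
        torch.cuda.set_device(device)
    model = TinyCNN().to(device)
    # device_ids is only passed for single-device CUDA modules.
    ddp_model = DDP(model, device_ids=[local_rank] if use_cuda else None)
    return ddp_model, device
```

A training script would call `setup_ddp()` once at startup and `dist.destroy_process_group()` before exiting, and would be started with the `torch.distributed.run` command shown in the steps.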