Collaboratively train YOLOv8 models on distributed datasets while preserving data privacy.
- 🔒 Why Federated Learning?
- 🤖 Why YOLOv8?
- ⚙️ Tech Stack
- 📥 Installation
- 📂 Dataset Preparation
- 🚀 Quick Start
- 🎛️ Simulation Setup
- 🏗️ Project Architecture
- 🛠️ Troubleshooting
- 📚 References
- 🗺️ Future Roadmap
- Data Privacy: Sensitive data (e.g., surveillance footage, vehicle sensors) stays on-device.
- Bandwidth Efficiency: Only model updates (weights or gradients), not raw images, are transmitted.
- Regulatory Compliance: Ideal for GDPR, HIPAA, or industry-specific data policies.
- Edge Optimization: Train models directly on edge devices (cameras, drones, IoT sensors).
- State-of-the-Art Performance: Ultralytics reports higher detection accuracy than YOLOv5 at comparable model sizes.
- Multi-Task Support: Object detection, segmentation, and classification.
- Scalability: Pre-trained models (yolov8n, yolov8s, etc.) for diverse hardware.
- Ease of Use: Simplified training API and extensive documentation.
- Frameworks: Flower (FL), Ultralytics YOLOv8
- Dataset: BDD100K (preprocessed and hosted on Google Drive)
- GPU Support: CUDA 11.8+, NVIDIA drivers
- Tools: Conda, Git, WSL2 (optional)
- Python 3.10+
- NVIDIA GPU with CUDA 11.8+
- Git and Conda
conda create -n fl_yolov8 python=3.10 -y
conda activate fl_yolov8
# PyTorch with CUDA
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
# YOLOv8 and Flower
pip install ultralytics "flwr[simulation]"
# Additional utilities
pip install opencv-python numpy tqdm pyyaml
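To confirm the environment before training, a short check along these lines may help. It is only a sketch: the package list mirrors the pip commands above, and the helper names `installed_versions` and `cuda_available` are made up for illustration, not part of this repository.

```python
import importlib.util
from importlib import metadata

# Distribution names as installed by the pip commands above.
REQUIRED = ["torch", "torchvision", "ultralytics", "flwr",
            "opencv-python", "numpy", "tqdm", "pyyaml"]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def cuda_available():
    """Return torch.cuda.is_available(), or None if torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch
    return torch.cuda.is_available()


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name:15s} {version or 'MISSING'}")
    print("CUDA available:", cuda_available())
```

If `CUDA available` prints `False`, reinstalling PyTorch from the CUDA wheel index shown above is the usual first step.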
The BDD100K dataset (already in YOLOv8 format) is hosted on Google Drive:
🔗 Download Dataset
mkdir -p federated_yolov8/data
mv ~/Downloads/bdd100k_yolov8.zip federated_yolov8/data/
cd federated_yolov8/data
unzip bdd100k_yolov8.zip
Split the dataset into client-specific subsets using split_clients.py:
python split_clients.py \
--source="data" \
--output="data_clients" \
--num_clients=10
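The internals of split_clients.py are not shown here. As a rough illustration of what an IID split can look like, the sketch below shuffles a list of image names and deals them round-robin into shards. The function name `partition_iid` and the fixed seed are assumptions for this example; the actual script may partition differently (for instance, non-IID by scene or weather).

```python
import random


def partition_iid(items, num_clients, seed=0):
    """Shuffle items deterministically and deal them round-robin into shards.

    Hypothetical helper: shard sizes differ by at most one item.
    """
    if num_clients < 1:
        raise ValueError("num_clients must be at least 1")
    shuffled = sorted(items)  # sort first so the result does not depend on input order
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::num_clients] for i in range(num_clients)]


if __name__ == "__main__":
    images = [f"img_{i:04d}.jpg" for i in range(25)]
    for client_id, shard in enumerate(partition_iid(images, num_clients=10)):
        print(f"client_{client_id}: {len(shard)} images")
```

Each image's label file would need to move together with the image so every client subset stays self-consistent.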
Each client directory requires a data.yaml file. Example for client_0:
train: ../client_0/train/images
val: ../client_0/val/images
nc: 13 # Number of classes
names: ["car", "person", "bus", "traffic light", ...]
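Writing these files by hand for every client gets tedious, so a small generator can help. The sketch below follows the layout of the example above; `write_client_yaml` is a hypothetical helper, and the class list must come from your own copy of the dataset (the placeholder names in the usage are not the full list). Deriving `nc` from the length of the list keeps the two fields consistent.

```python
import json
from pathlib import Path


def write_client_yaml(client_dir, names):
    """Write a data.yaml for one client, mirroring the example layout above.

    Hypothetical helper: `names` is the full class list supplied by the caller.
    """
    client_dir = Path(client_dir)
    client_dir.mkdir(parents=True, exist_ok=True)
    lines = [
        f"train: ../{client_dir.name}/train/images",
        f"val: ../{client_dir.name}/val/images",
        f"nc: {len(names)}  # Number of classes",
        # A JSON list is also a valid YAML flow sequence.
        f"names: {json.dumps(names)}",
    ]
    out_path = client_dir / "data.yaml"
    out_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return out_path


if __name__ == "__main__":
    placeholder_names = ["car", "person"]  # placeholder only; use the dataset's real class list
    print(write_client_yaml("data_clients/client_0", placeholder_names).read_text())
```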
python server.py --num_rounds=10 --batch_size=32
Open separate terminals for each client:
# Client 0
python client.py --id=0 --data_path="data_clients/client_0/data.yaml"
# Client 1
python client.py --id=1 --data_path="data_clients/client_1/data.yaml"
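With ten clients, opening ten terminals by hand is error-prone. One option is to generate the commands programmatically. The sketch below only builds and prints them, using the `--id` and `--data_path` flags shown above; the helper name `build_client_commands` and the default directory layout are assumptions.

```python
import shlex
import sys


def build_client_commands(num_clients, data_root="data_clients", script="client.py"):
    """Return one argv list per client, matching the manual commands above."""
    commands = []
    for client_id in range(num_clients):
        data_path = f"{data_root}/client_{client_id}/data.yaml"
        commands.append(
            [sys.executable, script, f"--id={client_id}", f"--data_path={data_path}"]
        )
    return commands


if __name__ == "__main__":
    for argv in build_client_commands(num_clients=2):
        print(shlex.join(argv))
```

Each argv list can be passed to `subprocess.Popen` to launch the client in the background. Start the server first so the clients have something to connect to.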
- Server Logs: Global model accuracy, round duration, client participation.
- Client Logs: Local training loss, validation mAP, GPU utilization.
Flower provides a simulation engine to test federated learning on a single machine.
flwr new my-project --framework PyTorch --username flower
cd my-project
pip install -e .
flwr run .
Use Ctrl+C in each terminal to stop the processes.
File | Purpose |
---|---|
server.py | Flower server for aggregating client updates. |
client.py | Flower client to train YOLOv8 locally. |
simulation.py | Runs multiple simulated FL clients on a single machine. |
train_config.py | Configures training hyperparameters. |
split_clients.py | Partitions dataset into client-specific subsets. |
label_utils.py | Converts BDD100K annotations to YOLO format. |
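To make "aggregating client updates" concrete, here is a minimal pure-Python sketch of federated averaging (FedAvg), where each client's parameters are weighted by its number of training examples. Flower ships a FedAvg strategy, but whether server.py uses it is not stated here, so treat this as an illustration of the idea rather than this project's implementation. The function name `fedavg` and the list-of-lists parameter layout are assumptions.

```python
def fedavg(client_params, num_examples):
    """Weighted average of client parameters.

    client_params: one entry per client; each entry is a list of layers,
                   and each layer is a flat list of floats.
    num_examples:  number of local training examples per client (the weights).
    """
    if not client_params or len(client_params) != len(num_examples):
        raise ValueError("need one example count per client")
    total = sum(num_examples)
    if total <= 0:
        raise ValueError("total number of examples must be positive")

    averaged = []
    for layer_idx in range(len(client_params[0])):
        layer_size = len(client_params[0][layer_idx])
        averaged.append([
            sum(params[layer_idx][i] * n
                for params, n in zip(client_params, num_examples)) / total
            for i in range(layer_size)
        ])
    return averaged


if __name__ == "__main__":
    client_a = [[1.0, 2.0]]  # one layer, trained on 1 example
    client_b = [[3.0, 6.0]]  # one layer, trained on 3 examples
    print(fedavg([client_a, client_b], [1, 3]))  # [[2.5, 5.0]]
```

Clients with more data pull the global model further toward their local solution, which is why heavily skewed client sizes can matter on non-IID splits.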
Issue | Solution |
---|---|
CUDA Out of Memory | Reduce BATCH_SIZE or use yolov8n. |
No GPU Detected | Verify torch.cuda.is_available() and reinstall PyTorch with CUDA. |
Dataset Path Errors | Ensure data.yaml paths match the client directory structure. |
Dependency Conflicts | Use a fresh Conda environment. |
- YOLOv8: Ultralytics Documentation
- Flower: Official Documentation
- BDD100K: Dataset Paper
- Advanced FL Strategies: Implement FedProx/FedNova for non-IID data.
- Edge Deployment: Optimize for NVIDIA Jetson/Raspberry Pi.
- Real-Time Inference: On-device inference with periodic FL updates.
- Multi-Task Learning: Add segmentation support with YOLOv8.
Thanks! 😊