A comprehensive YOLO (You Only Look Once) object detection implementation using dlib with support for YOLOv5 and YOLOv7 architectures. This repository provides tools for training, testing, inference, data conversion, and visualization of object detection models.
- Multiple YOLO Architectures: Support for YOLOv5, YOLOv7, and YOLOv7-tiny models
- Complete Training Pipeline: From data preparation to model evaluation
- Real-time Detection: Webcam and video processing capabilities
- Data Format Conversion: Tools for converting between XML, COCO, and Darknet formats
- Visualization Tools: Draw bounding boxes and create training plots
- GPU Acceleration: Multi-GPU training support with optimized inference
- Model Optimization: Layer fusion and model compression utilities
- `train`: Train YOLO models with extensive configuration options
- `test`: Evaluate trained models and compute metrics (mAP, precision, recall)
- `detect`: Run inference on images, videos, or webcam feeds
- `fuse`: Optimize trained models by fusing layers for faster inference
- `xml2coco`: Convert XML datasets to COCO JSON format
- `coco2xml`: Convert COCO datasets to XML format
- `xml2darknet`: Convert XML datasets to Darknet format
- `darknet2xml`: Convert Darknet datasets to XML format
- `convert_images`: Batch convert and resize images to JPEG XL (supports BMP, GIF, JPEG, PNG, and WebP)
- `compute_anchors`: Calculate optimal anchor boxes for your dataset
- `draw_boxes`: Visualize bounding boxes from XML datasets
- `evalcoco`: Evaluate models on the COCO test-dev dataset
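The exact flags for `fuse`, `compute_anchors`, and `draw_boxes` are not documented here, so the invocations below are illustrative assumptions; run each tool with no arguments or with `--help` to see its actual options:

# Fuse convolution and batch-norm layers in a trained model (flags assumed)
./fuse --dnn model.dnn
# Estimate anchor boxes from an XML dataset (flags assumed)
./compute_anchors training.xml
# Render the annotations in an XML dataset (flags assumed)
./draw_boxes training.xml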
- CMake 3.14 or higher
- C++17 compatible compiler
- dlib library
- OpenCV (for video/webcam support)
- nlohmann/json library
The project includes a convenient build script:
# Build release version (default)
./build.sh
# Build debug version
./build.sh debug
# The executables will be in build/Release/ or build/Debug/
Manual build with CMake:
mkdir build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release \
-DUSE_AVX_INSTRUCTIONS=ON \
-DUSE_SSE2_INSTRUCTIONS=ON \
-DUSE_SSE4_INSTRUCTIONS=ON
make -j$(nproc)
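If this project compiles dlib from source, GPU acceleration typically comes from dlib's own CMake switch; assuming CUDA and cuDNN are installed and the option is forwarded to dlib, a configuration like this should work:

# Enable CUDA-accelerated dlib (assumes dlib is configured by this build)
cmake .. -DCMAKE_BUILD_TYPE=Release \
         -DDLIB_USE_CUDA=ON \
         -DUSE_AVX_INSTRUCTIONS=ON
make -j$(nproc)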
Create XML dataset files using dlib's format or convert from other formats:
# Convert COCO to XML
./coco2xml annotations/instances_train2017.json
# Convert Darknet to XML
./darknet2xml --names classes.names --listing train.txt
Your dataset directory should contain:
- `training.xml`: Training dataset metadata
- `testing.xml`: Testing dataset metadata
- Image files referenced in the XML files
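For reference, these files use dlib's image dataset metadata format (the same one produced by dlib's imglab tool); a minimal `training.xml` with one annotated image looks like this:

<?xml version='1.0' encoding='ISO-8859-1'?>
<dataset>
  <name>Training dataset</name>
  <images>
    <image file='images/photo.jpg'>
      <box top='10' left='20' width='100' height='80'>
        <label>dog</label>
      </box>
    </image>
  </images>
</dataset>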
Basic training command:
./train /path/to/dataset/directory | tee training.log
Advanced training with custom parameters:
./train /path/to/dataset \
--batch-gpu 16 \
--gpus 2 \
--size 640 \
--epochs 100 \
--learning-rate 0.001 \
--mosaic 0.5 \
--mixup 0.15 | tee training.log
Evaluate the trained model:
./test /path/to/dataset/testing.xml --dnn model.dnn --conf 0.25
Detect objects in images:
# Single image
./detect --dnn model.dnn --image photo.jpg
# Directory of images
./detect --dnn model.dnn --images /path/to/images/
# Webcam (real-time)
./detect --dnn model.dnn --webcam 0
# Video file
./detect --dnn model.dnn --input video.mp4 --output processed_video.mp4
- `--mosaic 0.5`: Mosaic augmentation probability
- `--mixup 0.15`: MixUp augmentation probability
- `--mirror 0.5`: Horizontal flip probability
- `--angle 3`: Maximum rotation in degrees
- `--scale 0.5`: Scale variation factor
- `--color 0.5 0.2`: Color augmentation (gamma, magnitude)
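These flags combine freely with the basic training command; the values below are illustrative, not recommended defaults:

./train /path/to/dataset \
    --mosaic 0.5 \
    --mixup 0.15 \
    --mirror 0.5 \
    --angle 3 \
    --scale 0.5 \
    --color 0.5 0.2 | tee training.log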
- `--size 512`: Input image size for training
- `--batch-gpu 8`: Batch size per GPU
- `--gpus 1`: Number of GPUs to use
- `--backbone path.dnn`: Use a pre-trained backbone
- `--epochs 100`: Total training epochs
- `--learning-rate 0.001`: Initial learning rate
- `--warmup 3`: Warm-up epochs
- `--cosine`: Use a cosine learning rate schedule
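Putting the model and training options together, a single-GPU run from a pre-trained backbone might look like this (values are illustrative):

./train /path/to/dataset \
    --size 512 \
    --batch-gpu 8 \
    --gpus 1 \
    --backbone path.dnn \
    --epochs 100 \
    --learning-rate 0.001 \
    --warmup 3 \
    --cosine | tee training.log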
- `--dnn model.dnn`: Load a trained model
- `--sync model_sync`: Load from a training checkpoint
- `--conf 0.25`: Confidence threshold
- `--nms 0.45 1.0`: NMS IoU threshold and coverage ratio
- `--size 512`: Input image size for inference
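For example, combining the flags above to run detection on a single image with these thresholds made explicit:

./detect --dnn model.dnn --image photo.jpg --conf 0.25 --nms 0.45 1.0 --size 512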
- `--thickness 5`: Bounding box line thickness
- `--fill 128`: Fill boxes with transparency (0-255)
- `--font custom.bdf`: Use a custom font for labels
- `--no-labels`: Hide class labels
- `--no-conf`: Hide confidence scores
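For example, to draw thick, semi-transparent boxes without confidence scores:

./detect --dnn model.dnn --image photo.jpg --thickness 5 --fill 128 --no-conf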
Standard dlib image dataset metadata format with bounding box annotations.
Convert to/from COCO JSON annotation format for compatibility with other tools.
Convert to/from Darknet format (text files with normalized coordinates).
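Following the usual Darknet convention, each image has a companion `.txt` file with one line per object: a zero-based class index followed by the box center and size, all normalized to [0, 1]. For example, a `photo.txt` alongside `photo.jpg`:

0 0.500000 0.450000 0.250000 0.300000
2 0.125000 0.800000 0.100000 0.150000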
The implementation supports multiple YOLO variants:
- YOLOv5: Efficient and accurate general-purpose detector
- YOLOv7: State-of-the-art accuracy with optimized architecture
- YOLOv7-tiny: Lightweight version for resource-constrained environments
Models are automatically configured based on your dataset and training parameters.
Training generates detailed logs and visualizations:
- `training.log`: Detailed training metrics (append `| tee training.log` to your train command)
- `loss.png`: Loss curves and learning rate plots (via gnuplot)
- Model checkpoints saved periodically
- Best model saved based on validation mAP

Use the included `plot.gp` script to generate training visualizations:
gnuplot plot.gp
This project is built on top of dlib's DNN module and follows modern C++ practices. Contributions are welcome for:
- Additional YOLO architectures
- Performance optimizations
- New data augmentation techniques
- Improved visualization tools
See LICENSE.txt for licensing information.