A deep learning project to perform semantic segmentation on aerial drone imagery using a custom U-Net architecture. This project successfully segments urban scenes into categories like roads, buildings, vegetation, trees, and cars.
The model achieves strong performance on real-world aerial data and is specifically optimized to detect rare classes such as trees and cars.
| Metric | Score | Description |
|---|---|---|
| Mean IoU | 47.9% | Intersection over Union (averaged) |
| F1 Score | 62.2% | Harmonic mean of precision and recall |
| Pixel Accuracy | 76.5% | Global pixel-wise accuracy |

| Class | IoU Score | Status |
|---|---|---|
| 🛣️ Roads | 76.9% | ✅ Excellent |
| 🌿 Vegetation | 67.1% | ✅ Excellent |
| 🏢 Buildings | 56.0% | ✅ Good |
| 🚗 Cars | 28.2% | 🚀 Great (up from 0%) |
| 🌳 Trees | 22.5% | 🚀 Great (up from 0.3%) |
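
For reference, per-class IoU is the overlap between predicted and ground-truth pixels for a class divided by their union, and the Mean IoU above averages it over the 6 classes. A minimal sketch of that computation (the function below is illustrative, not the project's actual evaluation code in evaluate_real.py):

```python
import torch

def iou_per_class(preds: torch.Tensor, targets: torch.Tensor, num_classes: int = 6) -> torch.Tensor:
    """Per-class IoU from integer label maps of shape (N, H, W)."""
    ious = []
    for c in range(num_classes):
        pred_c = preds == c
        target_c = targets == c
        inter = (pred_c & target_c).sum()
        union = (pred_c | target_c).sum()
        # Classes absent from both prediction and ground truth come out as NaN.
        ious.append(inter.float() / union.float() if union > 0 else torch.tensor(float("nan")))
    return torch.stack(ious)

# Mean IoU averages the per-class scores, ignoring classes that never appear:
# mean_iou = torch.nanmean(iou_per_class(preds, targets))
```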
This project uses the Semantic Drone Dataset from Kaggle.
- Source: Aerial drone photography of urban environments.
- Content: 400 high-resolution RGB images.
- Labels: Pixel-precise semantic masks (24 classes originally, simplified to 6 for this project).
I simplified the 24 original classes into 6 core categories for better training stability (a remapping sketch follows the list):
- Roads (paved areas, dirt, gravel)
- Buildings (roofs, walls, fences)
- Vegetation (grass, low vegetation)
- Trees (trees, bushes)
- Cars (cars, bicycles)
- Background (water, people, obstacles, unlabeled)
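
The remapping itself boils down to a lookup table applied to every mask pixel. A minimal sketch, assuming integer class IDs (the `KAGGLE_TO_6` entries below are illustrative placeholders, not the exact table used in prepare_kaggle_data.py):

```python
import numpy as np

# Illustrative grouping of a handful of the 24 original IDs into the 6 training
# classes (0=Background, 1=Roads, 2=Buildings, 3=Vegetation, 4=Trees, 5=Cars).
# The real, complete mapping lives in prepare_kaggle_data.py.
KAGGLE_TO_6 = {
    1: 1,   # paved-area -> Roads
    2: 1,   # dirt       -> Roads
    9: 2,   # roof       -> Buildings
    10: 2,  # wall       -> Buildings
    3: 3,   # grass      -> Vegetation
    19: 4,  # tree       -> Trees
    17: 5,  # car        -> Cars
}

def remap_mask(mask: np.ndarray) -> np.ndarray:
    """Map an (H, W) mask of original class IDs to the 6-class scheme."""
    lut = np.zeros(256, dtype=np.uint8)   # unmapped IDs fall back to Background (0)
    for src, dst in KAGGLE_TO_6.items():
        lut[src] = dst
    return lut[mask]
```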
I implemented a U-Net architecture from scratch in PyTorch; a condensed sketch follows the list below.
- Encoder: 4 downsampling blocks (Conv2d + BatchNorm + ReLU + MaxPool).
- Decoder: 4 upsampling blocks with skip connections to preserve spatial detail.
- Output: 1x1 Convolution to map features to 6 class probabilities.
- Input Size: 128x128 (optimized for CPU training).
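
A condensed sketch of that architecture (the channel widths are assumptions for illustration; the full definition lives in drone_segmentation/model.py):

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    """Two 3x3 convolutions, each followed by BatchNorm and ReLU."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class UNet(nn.Module):
    def __init__(self, in_channels=3, num_classes=6, base=32):
        super().__init__()
        chs = [base, base * 2, base * 4, base * 8]   # 32, 64, 128, 256 (assumed widths)
        self.encoders = nn.ModuleList()
        prev = in_channels
        for ch in chs:                               # 4 downsampling blocks
            self.encoders.append(conv_block(prev, ch))
            prev = ch
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = conv_block(chs[-1], chs[-1] * 2)
        self.upconvs = nn.ModuleList()
        self.decoders = nn.ModuleList()
        prev = chs[-1] * 2
        for ch in reversed(chs):                     # 4 upsampling blocks with skips
            self.upconvs.append(nn.ConvTranspose2d(prev, ch, 2, stride=2))
            self.decoders.append(conv_block(ch * 2, ch))
            prev = ch
        self.head = nn.Conv2d(chs[0], num_classes, kernel_size=1)  # 1x1 conv to 6 classes

    def forward(self, x):
        skips = []
        for enc in self.encoders:
            x = enc(x)
            skips.append(x)
            x = self.pool(x)
        x = self.bottleneck(x)
        for up, dec, skip in zip(self.upconvs, self.decoders, reversed(skips)):
            x = up(x)
            x = dec(torch.cat([x, skip], dim=1))     # skip connection preserves spatial detail
        return self.head(x)

# logits = UNet()(torch.randn(1, 3, 128, 128))  # -> torch.Size([1, 6, 128, 128])
```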
To handle class imbalance (e.g., roads cover roughly 67% of pixels while cars cover only 0.2%), I implemented the following (a loss sketch follows the list):
- Weighted Cross Entropy Loss: Heavily penalized missing rare classes (Cars: 50x weight, Trees: 15x weight).
- Data Augmentation: Random horizontal/vertical flips, rotations (±15°), and brightness/contrast adjustments.
- Early Stopping: Monitored validation loss with patience of 7 epochs.
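
A sketch of the weighted loss setup (only the 50x Cars and 15x Trees factors come from the notes above; the class order and remaining weights are illustrative placeholders):

```python
import torch
import torch.nn as nn

# Class order assumed: Background, Roads, Buildings, Vegetation, Trees, Cars.
# Only the Trees (15x) and Cars (50x) factors come from the project notes;
# the other weights are placeholders.
class_weights = torch.tensor([1.0, 1.0, 1.0, 1.0, 15.0, 50.0])
criterion = nn.CrossEntropyLoss(weight=class_weights)

# logits: (N, 6, H, W) raw U-Net outputs; targets: (N, H, W) integer labels
# loss = criterion(logits, targets)
```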
1. Clone the repository
   ```bash
   git clone https://github.com/yourusername/drone-segmentation.git
   cd drone-segmentation
   ```
2. Install dependencies
   ```bash
   pip install -r requirements.txt
   ```
3. Download the dataset (requires a Kaggle API key, `kaggle.json`)
   ```bash
   python download_kaggle_dataset.py
   ```
   Alternatively, download it manually from Kaggle and place it in `data/kaggle_raw`.
4. Prepare the data (converts the raw Kaggle masks to the 6-class format)
   ```bash
   python prepare_kaggle_data.py
   ```

Train the model on your local machine (CPU or GPU):

```bash
# Train for 50 epochs with batch size 8
python train_fast.py --data data/real --epochs 50 --batch_size 8
```

Outputs are saved to `outputs/best_model_fast.pth`.

Evaluate the trained model on the test set and generate visualization overlays:

```bash
python evaluate_real.py --data data/real
```

Results are saved to `outputs/real_drone_report.md` and `outputs/overlays_real/`.
```
.
├── drone_segmentation/          # 📦 Core Python package
│   ├── data.py                  # Dataset loading & augmentation
│   ├── model.py                 # U-Net architecture definition
│   └── utils.py                 # Visualization helpers
├── data/                        # 💾 Data storage
│   ├── kaggle_raw/              # Original downloaded data
│   └── real/                    # Processed ready-to-train data
├── outputs/                     # 📤 Results
│   ├── best_model_fast.pth      # Trained model weights
│   └── overlays_real/           # Segmentation visualization images
├── train_fast.py                # 🚂 Training script
├── evaluate_real.py             # 📊 Evaluation script
├── prepare_kaggle_data.py       # 🔄 Data preprocessing script
└── requirements.txt             # 📋 Dependencies
```
- GPU Training: Scale up to 512x512 image resolution.
- Advanced Models: Implement DeepLabV3+ or SegFormer.
- More Classes: Separate 'Water' and 'Structures' into their own classes.
Created by Muhammad Faheem Arshad