This project implements a U-Net-based image segmentation model that extracts roads from satellite images. The model is built from scratch in TensorFlow and trained on the DeepGlobe Road Extraction Dataset, a real-world dataset created for the DeepGlobe Challenge.
U-Net is a popular convolutional neural network architecture for semantic segmentation. It was originally developed for biomedical image segmentation and has since been widely adopted for tasks such as road extraction and building segmentation.
The architecture is composed of:
- Encoder (Contracting Path): Downsamples the image using convolution + pooling to capture context.
- Decoder (Expanding Path): Upsamples the feature maps to recover spatial details and output pixel-wise predictions.
- Skip Connections: Features from the encoder are passed to the decoder to preserve fine-grained information, enabling precise segmentation.
This implementation is a custom, lightweight version of U-Net designed to run on limited hardware (a CPU or a single-GPU laptop):
- Fewer filters in the encoder and decoder blocks (starting from 32 filters instead of 64).
- Total parameters: 7.77 million, allowing for efficient local training and inference.
- Uses `BatchNormalization` and `LeakyReLU` to improve convergence and robustness.
- Encoder: 4 downsampling blocks with:
  - 2 × Conv2D + BatchNorm + Activation
  - MaxPooling2D
- Bottleneck: A deeper conv block for high-level feature extraction.
- Decoder: 4 upsampling blocks with:
  - Conv2DTranspose for upsampling
  - Skip connections from encoder
  - Followed by conv blocks
- Final Layer: A 1×1 convolution with sigmoid activation for binary segmentation.
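The block structure listed above can be sketched in Keras roughly as follows. This is a minimal illustration, not the project's actual source: the function names `conv_block` and `build_unet`, the 3×3 convolution kernels, the 2×2 transposed convolutions, `"same"` padding, the default `LeakyReLU` slope, and the 3-channel 256×256 input are all assumptions chosen to match the description.

```python
import tensorflow as tf
from tensorflow.keras import layers


def conv_block(x, filters):
    """Two 3x3 convolutions, each followed by BatchNorm and LeakyReLU (assumed layout)."""
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU()(x)
    return x


def build_unet(input_shape=(256, 256, 3), base_filters=32):
    """Hypothetical lightweight U-Net: 4 encoder blocks, bottleneck, 4 decoder blocks."""
    inputs = layers.Input(shape=input_shape)
    x = inputs
    skips = []

    # Encoder: conv block, remember the features for the skip connection, then downsample.
    for i in range(4):
        x = conv_block(x, base_filters * 2 ** i)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)

    # Bottleneck: the deepest conv block (16x the base filter count).
    x = conv_block(x, base_filters * 16)

    # Decoder: upsample, concatenate the matching encoder features, then a conv block.
    for i in reversed(range(4)):
        filters = base_filters * 2 ** i
        x = layers.Conv2DTranspose(filters, 2, strides=2, padding="same")(x)
        x = layers.Concatenate()([x, skips[i]])
        x = conv_block(x, filters)

    # Final layer: 1x1 convolution with sigmoid for a per-pixel road probability.
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs, name="light_unet")
```

Under these assumptions, `build_unet().count_params()` should come out close to the 7.77 million parameters quoted above, and passing `base_filters=64` gives the full-sized variant.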
The model is trained with a combination of Binary Crossentropy and Dice Loss. This hybrid loss balances pixel-wise accuracy with spatial overlap, which helps with the class imbalance common in road extraction, where road pixels make up only a small fraction of each image.
The Dice Coefficient is used to evaluate how well the predicted masks overlap with the ground truth masks. It ranges from 0 (no overlap) to 1 (perfect match), making it a highly interpretable metric for binary segmentation.
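The loss and metric described above could be written along these lines. This is a sketch under stated assumptions rather than the project's exact code: the function names, the smoothing term `smooth=1.0` (added to avoid division by zero on empty masks), and the equal weighting of the two loss terms are illustrative choices not specified here.

```python
import tensorflow as tf


def dice_coefficient(y_true, y_pred, smooth=1.0):
    """Soft Dice: 2*|A∩B| / (|A| + |B|), computed over all pixels in the batch."""
    y_true = tf.cast(tf.reshape(y_true, [-1]), tf.float32)
    y_pred = tf.cast(tf.reshape(y_pred, [-1]), tf.float32)
    intersection = tf.reduce_sum(y_true * y_pred)
    denominator = tf.reduce_sum(y_true) + tf.reduce_sum(y_pred)
    # smooth is an assumed stabiliser, not a value taken from the project.
    return (2.0 * intersection + smooth) / (denominator + smooth)


def dice_loss(y_true, y_pred):
    """Dice loss is one minus the Dice coefficient, so perfect overlap gives 0."""
    return 1.0 - dice_coefficient(y_true, y_pred)


def bce_dice_loss(y_true, y_pred):
    """Binary cross-entropy plus Dice loss, weighted equally (an assumption)."""
    y_true = tf.cast(y_true, tf.float32)
    bce = tf.reduce_mean(tf.keras.losses.binary_crossentropy(y_true, y_pred))
    return bce + dice_loss(y_true, y_pred)
```

With these definitions, a model could be compiled with `loss=bce_dice_loss` and `metrics=[dice_coefficient]`. Note that this Dice metric works on the raw sigmoid probabilities; thresholding the predictions first is a common alternative.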
- Name: DeepGlobe Road Extraction Dataset
- Source: DeepGlobe 2018 Challenge
- Description: High-resolution satellite images with corresponding binary masks highlighting road networks.
- Epochs: 50
- Batch Size: 4
- Image Size: 256×256
- Optimizer: Adam
- Callbacks: EarlyStopping + ModelCheckpoint
- Validation Split: 20% of training data
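The settings above map onto a Keras training call roughly as shown below. Treat it as a hedged sketch: `make_callbacks`, `train`, the checkpoint filename `best_unet.keras`, the monitored quantity `val_loss`, and the patience of 10 epochs are hypothetical, and `model`, `images`, and `masks` stand for a compiled-ready U-Net and NumPy arrays of 256×256 inputs and binary masks that the caller provides.

```python
import tensorflow as tf

# Values taken from the configuration list above.
EPOCHS = 50
BATCH_SIZE = 4
IMAGE_SIZE = (256, 256)
VALIDATION_SPLIT = 0.2


def make_callbacks(checkpoint_path="best_unet.keras", patience=10):
    """EarlyStopping + ModelCheckpoint; path, monitor, and patience are assumptions."""
    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=patience, restore_best_weights=True
    )
    checkpoint = tf.keras.callbacks.ModelCheckpoint(
        checkpoint_path, monitor="val_loss", save_best_only=True
    )
    return [early_stop, checkpoint]


def train(model, images, masks, loss="binary_crossentropy"):
    """Compile with Adam and fit, holding out the last 20% of the arrays for validation.

    The default loss is only a stand-in; pass the project's combined
    BCE + Dice loss function here instead.
    """
    model.compile(optimizer=tf.keras.optimizers.Adam(), loss=loss)
    return model.fit(
        images,
        masks,
        epochs=EPOCHS,
        batch_size=BATCH_SIZE,
        validation_split=VALIDATION_SPLIT,
        callbacks=make_callbacks(),
    )
```

Because `validation_split` takes the final fraction of the arrays without shuffling, it is worth shuffling the image and mask arrays together before calling `train`.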
If you have access to more powerful GPUs or cloud platforms (e.g., Google Colab, Kaggle Notebooks, or AWS), you can easily scale up the architecture by:
- Increasing the number of filters
- Adding dropout for regularization
- Switching to full-sized U-Net (starting filters at 64)
This project is licensed under the MIT License.