Semantic segmentation is the task of assigning every pixel in an image to an object class, so that pixels belonging to the same class are grouped together. It is a form of pixel-level prediction because each pixel is classified into a category. In this project, I have performed semantic segmentation on Dubai's Satellite Imagery Dataset by using transfer learning on an InceptionResNetV2 encoder-based UNet CNN model. To artificially increase the amount of data and reduce overfitting, I applied data augmentation to the training set. The model achieved a Dice coefficient of ~81% and an accuracy of ~86% on the validation set.
The Jupyter Notebook can be accessed from here.
The pre-trained model weights can be accessed from here.
Humans in the Loop has published an open access dataset annotated for a joint project with the Mohammed Bin Rashid Space Center in Dubai, the UAE. The dataset consists of aerial imagery of Dubai obtained by MBRSC satellites and annotated with pixel-wise semantic segmentation in 6 classes. The images were segmented by the trainees of the Roia Foundation in Syria.
The images are densely labeled and contain the following 6 classes:
Name | R | G | B | Hex |
---|---|---|---|---|
Building | 60 | 16 | 152 | #3C1098 |
Land | 132 | 41 | 246 | #8429F6 |
Road | 110 | 193 | 228 | #6EC1E4 |
Vegetation | 254 | 221 | 58 | #FEDD3A |
Water | 226 | 169 | 41 | #E2A929 |
Unlabeled | 155 | 155 | 155 | #9B9B9B |
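Before training, colour-coded masks like these usually have to be turned into per-pixel class indices. The following is a minimal sketch of one way to do that with NumPy, not necessarily what the notebook does. The RGB values come from the table above; the function names, the index order of the classes, and the choice to map unknown colours to `Unlabeled` are assumptions made for illustration.

```python
import numpy as np

# RGB values copied from the class table; the index order is an assumption.
CLASS_COLORS = {
    "Building": (60, 16, 152),
    "Land": (132, 41, 246),
    "Road": (110, 193, 228),
    "Vegetation": (254, 221, 58),
    "Water": (226, 169, 41),
    "Unlabeled": (155, 155, 155),
}


def rgb_mask_to_class_ids(mask_rgb, fallback="Unlabeled"):
    """Convert an (H, W, 3) RGB mask into an (H, W) array of class indices.

    Pixels whose colour matches no class are given the fallback class
    (an assumption; the dataset may not contain such pixels at all).
    """
    names = list(CLASS_COLORS)
    class_ids = np.full(mask_rgb.shape[:2], names.index(fallback), dtype=np.uint8)
    for index, name in enumerate(names):
        color = np.array(CLASS_COLORS[name], dtype=mask_rgb.dtype)
        # A pixel matches only if all three channels are equal.
        class_ids[np.all(mask_rgb == color, axis=-1)] = index
    return class_ids


def one_hot(class_ids, num_classes=len(CLASS_COLORS)):
    """Expand (H, W) class indices into an (H, W, num_classes) one-hot array."""
    return np.eye(num_classes, dtype=np.float32)[class_ids]
```

A one-hot mask of shape `(H, W, 6)` is the usual target format for a 6-channel softmax output.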
Albumentations is a Python library for fast and flexible image augmentations. Albumentations efficiently implements a rich variety of image transform operations that are optimized for performance, and does so while providing a concise, yet powerful image augmentation interface for different computer vision tasks, including object classification, segmentation, and detection.
The dataset contains only 72 images (of varying resolutions), of which I have used 56 (~78%) for the training set and the remaining 16 (~22%) for the validation set. Because this is a very small amount of data, I used data augmentation to artificially enlarge the training set and reduce overfitting. This increased the training data to 9 times its original size: after augmentation, the training set contains 504 images (56 original + 448 augmented), while the validation set keeps its 16 original images.
Data augmentation is done by the following techniques:
- Random Cropping
- Horizontal Flipping
- Vertical Flipping
- Rotation
- Random Brightness & Contrast
- Contrast Limited Adaptive Histogram Equalization (CLAHE)
- Grid Distortion
- Optical Distortion
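For segmentation, any geometric transform has to be applied identically to an image and its mask, otherwise the labels no longer line up with the pixels. The sketch below illustrates that constraint with plain NumPy for cropping, flipping, and rotation only. It is not the Albumentations API and does not cover the colour or distortion transforms. The function names, the crop size, the restriction to 90-degree rotations, and the idea of 8 random copies per image (one way to arrive at 448 = 8 × 56 augmented images) are all assumptions for illustration.

```python
import numpy as np


def augment_pair(image, mask, rng, crop_size=256):
    """Apply one random crop, flip, and 90-degree rotation to an image and its mask."""
    height, width = mask.shape[:2]
    # Random crop: the same window is cut from the image and the mask.
    top = int(rng.integers(0, height - crop_size + 1))
    left = int(rng.integers(0, width - crop_size + 1))
    image = image[top:top + crop_size, left:left + crop_size]
    mask = mask[top:top + crop_size, left:left + crop_size]
    if rng.random() < 0.5:  # horizontal flip
        image, mask = image[:, ::-1], mask[:, ::-1]
    if rng.random() < 0.5:  # vertical flip
        image, mask = image[::-1], mask[::-1]
    # Rotate both by the same multiple of 90 degrees (square crops keep their shape).
    quarter_turns = int(rng.integers(0, 4))
    image = np.rot90(image, quarter_turns)
    mask = np.rot90(mask, quarter_turns)
    return np.ascontiguousarray(image), np.ascontiguousarray(mask)


def expand_training_set(pairs, copies_per_image=8, seed=0, crop_size=256):
    """Return the original (image, mask) pairs plus augmented copies of each."""
    rng = np.random.default_rng(seed)
    augmented = [
        augment_pair(image, mask, rng, crop_size)
        for image, mask in pairs
        for _ in range(copies_per_image)
    ]
    return list(pairs) + augmented
```

With 56 training pairs and 8 copies per image, `expand_training_set` returns 504 pairs, matching the training-set size stated above.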
Here are some sample augmented images and masks from the dataset:
Figure: InceptionResNetV2 architecture. Source: https://arxiv.org/pdf/1602.07261v2.pdf
Figure: UNet architecture. Source: https://arxiv.org/pdf/1505.04597.pdf
- The InceptionResNetV2 model, pre-trained on the ImageNet dataset, is used as the encoder network.
- A decoder network is extended from the last layer of the pre-trained model, and its layers are concatenated with the outputs of the corresponding encoder layers, following the UNet skip-connection pattern.
A detailed layout of the model is available here.
- Batch Size = 16
- Steps per Epoch = 32
- Validation Steps = 4
- Input Shape = (512, 512, 3)
- Initial Learning Rate = 0.0001 (with Exponential Decay LearningRateScheduler callback)
- Number of Epochs = 45 (with ModelCheckpoint & EarlyStopping callback)
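Two of these settings can be illustrated with a little arithmetic. The sketch below shows how 32 steps per epoch follows from 504 training images and a batch size of 16, and what an exponentially decaying learning rate looks like. The initial learning rate, batch size, and image count come from this README; the decay constant `DECAY_RATE` and the exact decay formula are hypothetical, since the schedule's parameters are not stated here.

```python
import math

INITIAL_LR = 1e-4    # from the hyperparameter list
BATCH_SIZE = 16      # from the hyperparameter list
TRAIN_IMAGES = 504   # training-set size after augmentation
DECAY_RATE = 0.1     # hypothetical value, not taken from the project


def exponential_decay(epoch, _current_lr=None):
    """Learning rate for a given epoch: lr0 * exp(-k * epoch).

    The second argument is ignored; it is only there so the function has the
    (epoch, lr) shape that a Keras LearningRateScheduler schedule receives.
    """
    return INITIAL_LR * math.exp(-DECAY_RATE * epoch)


# One pass over the training set, rounding the last partial batch up.
steps_per_epoch = math.ceil(TRAIN_IMAGES / BATCH_SIZE)

if __name__ == "__main__":
    print("steps per epoch:", steps_per_epoch)
    for epoch in (0, 10, 20, 30, 44):
        print(f"epoch {epoch:2d}: lr = {exponential_decay(epoch):.2e}")
```

A function of this shape could be wrapped in a `LearningRateScheduler` callback, with the decay constant tuned to the actual training run.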
Model | Epochs | Train Dice Coefficient | Train Accuracy | Train Loss | Val Dice Coefficient | Val Accuracy | Val Loss |
---|---|---|---|---|---|---|---|
InceptionResNetV2-UNet | 45 (best at 34th epoch) | 0.8525 | 0.9152 | 0.2561 | 0.8112 | 0.8573 | 0.4268 |
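For reference, the Dice coefficient reported in the table measures the overlap between predicted and ground-truth masks: twice the intersection divided by the sum of the two mask sizes. Below is a minimal NumPy sketch of that definition. The smoothing term and the flattening over all pixels and classes are common conventions assumed here, not details confirmed by this README.

```python
import numpy as np


def dice_coefficient(y_true, y_pred, smooth=1.0):
    """Dice = (2 * |A ∩ B| + smooth) / (|A| + |B| + smooth).

    y_true and y_pred are arrays of the same shape holding 0/1 labels or
    probabilities. `smooth` avoids division by zero on empty masks
    (an assumed convention, not taken from the project).
    """
    y_true = np.asarray(y_true, dtype=np.float64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.float64).ravel()
    intersection = np.sum(y_true * y_pred)
    return (2.0 * intersection + smooth) / (np.sum(y_true) + np.sum(y_pred) + smooth)
```

A value of 1 means the prediction matches the ground truth exactly, and the value falls towards 0 as the overlap shrinks.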
The `model_training.csv` file contains epoch-wise training details of the model.
Predictions on Validation Set Images:
All predictions on the validation set are available in the `predictions` directory.
Activations/Outputs of some layers of the model:
Some more activation maps are available in the `activations` directory.
Code for visualizing activations is in the `get_activations.py` file.
- Dataset: https://humansintheloop.org/resources/datasets/semantic-segmentation-dataset/
- C. Szegedy, S. Ioffe, V. Vanhoucke, and A. Alemi, “Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning,” arXiv.org, 23-Aug-2016. [Online]. Available: https://arxiv.org/abs/1602.07261.
- O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” arXiv.org, 18-May-2015. [Online]. Available: https://arxiv.org/abs/1505.04597.