In this project, we label the road pixels in images using a Fully Convolutional Network (FCN). FCNs can efficiently learn to make dense predictions for pixel-wise tasks such as semantic segmentation.
The model architecture is based on [1], a proven architecture for semantic segmentation (see figure below).
A pre-trained VGG16 network was converted to an FCN by replacing the final fully connected layer with a 1x1 convolution whose depth equals the number of desired classes, in our case two: road and not-road. Performance was improved through skip connections: 1x1 convolutions are applied to earlier VGG layers (layer 3 and layer 4), and the results are added element-wise to the upsampled (via transposed convolution) output of the deeper layers. This architecture was good enough to find free space on the road.
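To make the two building blocks concrete, here is a minimal NumPy sketch of a 1x1 convolution and an element-wise skip connection. It is not the project's code: the feature-map shapes and random weights are made up for illustration, and nearest-neighbour upsampling stands in for the learned transposed convolution.

```python
import numpy as np

def conv1x1(features, weights):
    """1x1 convolution: the same linear map applied at every pixel.

    features: (H, W, C_in), weights: (C_in, C_out) -> (H, W, C_out)
    """
    return features @ weights

def upsample2x(features):
    """Nearest-neighbour 2x upsampling (stand-in for a transposed convolution)."""
    return features.repeat(2, axis=0).repeat(2, axis=1)

num_classes = 2  # road, not-road
rng = np.random.default_rng(0)

# Hypothetical feature maps: a deeper, coarser layer and an earlier, finer one.
deep = rng.normal(size=(5, 18, 512))
skip = rng.normal(size=(10, 36, 512))

# Hypothetical 1x1 convolution weights that reduce the depth to num_classes.
w_deep = rng.normal(size=(512, num_classes))
w_skip = rng.normal(size=(512, num_classes))

deep_scores = conv1x1(deep, w_deep)
skip_scores = conv1x1(skip, w_skip)

# Skip connection: upsample the coarse scores, then add element-wise.
fused = upsample2x(deep_scores) + skip_scores
print(fused.shape)  # (10, 36, 2)
```

Because both inputs are first reduced to `num_classes` channels, the addition only requires the spatial sizes to match, which the upsampling step provides.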
As a first approach, the model was trained with the Adam optimizer at a fixed learning rate of 1e-5. The batch size was set to 16 images, the weights were initialized randomly, and the network was trained for 20 epochs. The following graph shows how the average loss decreases over the epochs.
The final approach uses a batch size of 5 images and trains the model for 40 epochs. The average loss falls below 0.19 after 20 epochs and below 0.05 after 40 epochs, which is a good result.
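The effect of the two settings on the number of weight updates can be sketched as follows. This is an illustration only: `NUM_IMAGES` is an assumed training-set size, and `iter_batches`, `steps_per_epoch`, and `average_loss` are hypothetical helpers, not functions from the project.

```python
def iter_batches(items, batch_size):
    """Yield consecutive mini-batches; the last one may be smaller."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def steps_per_epoch(num_items, batch_size):
    """Count the mini-batches (optimizer steps) in one pass over the data."""
    return sum(1 for _ in iter_batches(range(num_items), batch_size))

def average_loss(batch_losses):
    """Mean of the per-batch losses recorded during one epoch."""
    return sum(batch_losses) / len(batch_losses)

NUM_IMAGES = 289  # assumed training-set size, for illustration only

for batch_size, epochs in [(16, 20), (5, 40)]:
    steps = steps_per_epoch(NUM_IMAGES, batch_size)
    print(f"batch_size={batch_size}: {steps} steps/epoch, "
          f"{steps * epochs} updates over {epochs} epochs")
```

Under this assumed dataset size, the final configuration performs roughly six times as many optimizer updates as the first one, which may partly explain the lower final loss.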
Below are a few sample images from the output of the FCN, with the segmentation class overlaid upon the original image in green. Further images can be found under runs/.
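An overlay of this kind can be produced with simple alpha blending. The following is a minimal NumPy sketch rather than the project's implementation; the function name `overlay_mask`, the 0.5 threshold, and the blending factor are assumptions chosen for illustration.

```python
import numpy as np

def overlay_mask(image, road_prob, threshold=0.5, alpha=0.5):
    """Blend green into the pixels where road_prob exceeds threshold.

    image: (H, W, 3) uint8 RGB, road_prob: (H, W) floats in [0, 1].
    threshold and alpha are illustrative defaults, not project values.
    """
    mask = road_prob > threshold
    green = np.array([0, 255, 0], dtype=np.float32)
    out = image.astype(np.float32)
    # Only the masked pixels are blended; all others keep their colour.
    out[mask] = (1 - alpha) * out[mask] + alpha * green
    return out.round().astype(np.uint8)

# Tiny example: a uniform grey image with two "road" pixels.
image = np.full((2, 2, 3), 100, dtype=np.uint8)
road_prob = np.array([[0.9, 0.1], [0.2, 0.8]])
result = overlay_mask(image, road_prob)
print(result[0, 0], result[0, 1])
```

Pixels classified as road are tinted green, while the rest of the image is left unchanged.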
Make sure you have the following installed:
Download the Kitti Road dataset from here. Extract the dataset in the data folder. This will create the folder data_road with all the training and test images.
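Before starting a training run, it can help to verify that the dataset was extracted to the expected location. A minimal sketch, where `find_dataset` is a hypothetical helper and not part of the project:

```python
from pathlib import Path

def find_dataset(data_dir="data"):
    """Return the data_road folder inside data_dir, or raise if it is missing."""
    road_dir = Path(data_dir) / "data_road"
    if not road_dir.is_dir():
        raise FileNotFoundError(f"Expected the extracted dataset at {road_dir}")
    return road_dir
```

Calling `find_dataset()` from the project root raises a clear error early if the extraction step was skipped.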
Run the following command to run the project:
python main.py
Note: If running this in Jupyter Notebook, system messages, such as those regarding test status, may appear in the terminal rather than the notebook.