In this project we are going to train a neural network to recognize traffic signs.
The training images come from the German Traffic Sign Dataset.
This dataset is too large to be kept on GitHub, so if you want to run any training job, please download the dataset from this link and unpack it into the `data` directory.
To track the results of all training experiments I use the Neptune.ml tool.
First we need to load the dataset. It is kept in the `data` folder and contains the following files:
total 311760
-rw-r--r--@ 1 grzegorz.tyminski staff 38888118 Nov 7 2016 test.p
-rw-r--r--@ 1 grzegorz.tyminski staff 107146452 Feb 2 2017 train.p
-rw-r--r--@ 1 grzegorz.tyminski staff 13578712 Feb 2 2017 valid.p
The class that loads this dataset is implemented in the file `traffic_sign_dataset.py`.
To load the data, run the following code:
from traffic_sign_dataset import TrafficData
dataset = TrafficData() # dataset is loaded from files while creating instance of the `TrafficData` object.
dataset.normalize_data() # dataset is normalized --> values are in range 0..1 instead of 0..255
dataset.shuffle_dataset() # dataset is shuffled
X_train, y_train = dataset.get_training_dataset() # get training part of dataset
X_valid, y_valid = dataset.get_validation_dataset() # get validation part of dataset
X_test, y_test = dataset.get_testing_dataset() # get testing part of dataset
Once the dataset is loaded, we can preview some random images from the training dataset by calling the `dataset.preview_random()` method.
We will obtain something like:
Let's have a look at the histogram of classes in the dataset. We can see that some classes have about 10x fewer images than the others. These are:
Speed limit (20km/h)
Speed limit (100km/h)
Vehicles over 3.5 metric tons prohibited
Dangerous curve to the left
Road narrows on the right
Pedestrians
Bicycles crossing
End of all speed and passing limits
Go straight or left
Keep left
End of no passing
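The imbalance can also be checked numerically from the label array. The following is a minimal sketch, assuming the labels are integer class ids as returned in `y_train`; the helper name `rare_classes` and the toy labels are illustrative and not part of `TrafficData`:

```python
from collections import Counter


def rare_classes(labels, ratio=10):
    """Return class ids whose image count is at least `ratio` times
    smaller than the count of the most frequent class."""
    counts = Counter(int(label) for label in labels)
    largest = max(counts.values())
    return sorted(c for c, n in counts.items() if n * ratio <= largest)


# Toy labels standing in for y_train: class 0 is rare, classes 1 and 2 are common.
toy_labels = [0] * 2 + [1] * 20 + [2] * 15
rare = rare_classes(toy_labels)
```

With the real data, `rare_classes(y_train)` would list the class ids to compare against the names above.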
In this example we can clearly see that some of the images are very dark, probably too dark for the human eye to recognize the sign.
The class `TrafficData` contains the method `normalize_data()` to normalize the dataset (i.e. change pixel values from 0..255 to 0..1) as well as the method `shuffle_dataset()` to randomize the order of images in the dataset.
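To make these two steps concrete, here is a rough NumPy sketch of what normalization and shuffling amount to. It assumes the images are a `uint8` array of shape `(N, H, W, C)`; the function names `normalize` and `shuffle_together` are hypothetical and this is not the actual code of `TrafficData`:

```python
import numpy as np


def normalize(images):
    """Scale uint8 pixel values from 0..255 to floats in 0..1."""
    return images.astype(np.float32) / 255.0


def shuffle_together(images, labels, seed=0):
    """Apply the same random permutation to images and labels."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(images))
    return images[order], labels[order]


# Toy data standing in for X_train / y_train: 4 images of 32x32x3.
X = np.full((4, 32, 32, 3), 255, dtype=np.uint8)
X[0] = 0                      # make image 0 distinguishable: all black
y = np.array([7, 1, 2, 3])    # label 7 belongs to the black image

X_norm = normalize(X)
X_shuf, y_shuf = shuffle_together(X_norm, y)
```

The important detail is that images and labels are permuted with the same index order, so every image keeps its label.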
As a very first step we are going to train the same LeNet neural network as in the MNIST example.
The LeNet-5 implementation shown in the classroom at the end of the CNN lesson is a solid starting point.
Let's verify how good it is.
The class `LeNet` with the model architecture is defined in the file `LeNet.py`.
Its input and output shapes are adjusted for the Traffic Sign Dataset (3-channel color input, 43-class output).
I ran a grid search over the hyperparameters with values for `epoch` (10 or 15), `batch_size` (32, 64, 128, 256, 512, 1024) and `learn_rate` (0.0005, 0.001, 0.002).
Over several trials we got a validation accuracy of up to 0.953968 for `epochs` equal to 10, `batch_size` equal to 64 and `learn_rate` equal to 0.002.
The results did not differ much between hyperparameter settings. However, we can clearly see that a smaller `batch_size` gave better results. Values of `learn_rate` below 0.001 did not give satisfying results, and more than 10 `epochs` gave no further improvement (see graph below).
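The grid search itself is just a loop over all combinations of these values. Below is a minimal sketch under stated assumptions: `train_fn` is a hypothetical callable that trains the network and returns the validation accuracy, and `dummy_train` is only a placeholder so the example runs; it does not reproduce the results reported here.

```python
from itertools import product

# Hyperparameter values taken from the text above.
EPOCHS = (10, 15)
BATCH_SIZES = (32, 64, 128, 256, 512, 1024)
LEARN_RATES = (0.0005, 0.001, 0.002)


def grid_search(train_fn):
    """Call train_fn for every combination and return (best_score, best_params)."""
    best_score, best_params = float("-inf"), None
    for epochs, batch_size, learn_rate in product(EPOCHS, BATCH_SIZES, LEARN_RATES):
        score = train_fn(epochs=epochs, batch_size=batch_size, learn_rate=learn_rate)
        if score > best_score:
            best_score = score
            best_params = {"epochs": epochs, "batch_size": batch_size,
                           "learn_rate": learn_rate}
    return best_score, best_params


def dummy_train(epochs, batch_size, learn_rate):
    """Placeholder scoring function; NOT a real training run."""
    return learn_rate - batch_size * 1e-6


n_runs = len(EPOCHS) * len(BATCH_SIZES) * len(LEARN_RATES)
best_score, best_params = grid_search(dummy_train)
```

With these value lists a full grid means 36 training runs, which is why tracking every run in an experiment tracker is useful.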
This `LeNet` model has the following architecture:
| name | type | shape | size |
|---|---|---|---|
| Variable:0 | float32_ref | 5x5x3x6 | [450, bytes: 1800] |
| Variable_1:0 | float32_ref | 6 | [6, bytes: 24] |
| Variable_2:0 | float32_ref | 5x5x6x16 | [2400, bytes: 9600] |
| Variable_3:0 | float32_ref | 16 | [16, bytes: 64] |
| Variable_4:0 | float32_ref | 400x120 | [48000, bytes: 192000] |
| Variable_5:0 | float32_ref | 120 | [120, bytes: 480] |
| Variable_6:0 | float32_ref | 120x84 | [10080, bytes: 40320] |
| Variable_7:0 | float32_ref | 84 | [84, bytes: 336] |
| Variable_8:0 | float32_ref | 84x43 | [3612, bytes: 14448] |
| Variable_9:0 | float32_ref | 43 | [43, bytes: 172] |

Total size of variables: 64811

Total bytes of variables: 259244
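The totals can be re-derived from the variable shapes alone: each variable holds the product of its dimensions, and every `float32` value takes 4 bytes. A short check, with the shapes copied from the listing above (the per-layer comments are my reading of the shapes):

```python
from math import prod

# Variable shapes copied from the listing above (weights and biases per layer).
SHAPES = [
    (5, 5, 3, 6), (6,),        # convolution 1: 5x5 kernel, 3 -> 6 channels
    (5, 5, 6, 16), (16,),      # convolution 2: 5x5 kernel, 6 -> 16 channels
    (400, 120), (120,),        # fully connected: 400 -> 120
    (120, 84), (84,),          # fully connected: 120 -> 84
    (84, 43), (43,),           # output: 84 -> 43 classes
]

sizes = [prod(shape) for shape in SHAPES]
total_params = sum(sizes)
total_bytes = total_params * 4  # float32 takes 4 bytes per value
```

Most of the parameters (48000 of 64811) sit in the first fully connected layer.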
We could modify the `LeNet` network a bit and try it on this dataset. The first modification is to add `dropout` to the network. The modified `LeNet` is implemented in the file `LeNet2.py`.
I added first one and later two dropout layers to the network and ran a grid search over the hyperparameters in both variants, including the dropout value.
The first dropout was added between the 2nd and 3rd layers of the network:
# layer 2
self.flat = flatten(self.layer2)
# dropout
self.flat = tf.nn.dropout(self.flat, self.dropout_val)
# layer 3
self.lay3_W = tf.Variable(tf.truncated_normal(shape=(400, 120), mean=self.mu, stddev=self.sigma))
self.lay3_b = tf.Variable(tf.zeros([120]))
self.layer3 = tf.matmul(self.flat, self.lay3_W) + self.lay3_b
... and the second was added between the 4th and 5th layers of the network:
# layer 4
self.layer4 = tf.nn.relu(self.layer4)
# dropout
self.layer4 = tf.nn.dropout(self.layer4, self.dropout_val)
# layer 5
self.lay5_W = tf.Variable(tf.truncated_normal(shape=(84, output_classes), mean=self.mu, stddev=self.sigma))
self.lay5_b = tf.Variable(tf.zeros([output_classes]))
self.layer5 = tf.matmul(self.layer4, self.lay5_W) + self.lay5_b
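For readers unfamiliar with the operation, here is a rough NumPy sketch of what a dropout layer does. It assumes the code above runs on TensorFlow 1.x, where the second positional argument of `tf.nn.dropout` is the keep probability; whether `dropout_val` is meant as a keep probability is my assumption about `LeNet2.py`. The function below is illustrative only and is not taken from the project:

```python
import numpy as np


def dropout(x, keep_prob, rng):
    """Inverted dropout: zero each unit with probability 1 - keep_prob and
    scale the survivors by 1 / keep_prob so the expected value is unchanged."""
    if keep_prob >= 1.0:
        return x  # evaluation mode: nothing is dropped
    mask = rng.random(x.shape) < keep_prob
    return x * mask / keep_prob


rng = np.random.default_rng(0)
activations = np.ones((1000, 400), dtype=np.float32)

train_out = dropout(activations, keep_prob=0.5, rng=rng)
eval_out = dropout(activations, keep_prob=1.0, rng=rng)
```

When validating or testing, the keep probability should be 1.0; leaving dropout active during evaluation would lower the measured accuracy.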
Here are the results from the training jobs:
... and when we plot the results of trainings:
We actually got worse results, regardless of whether one or two dropout layers were used and regardless of the dropout value.
Let's modify the dataset. There are 2 potential improvements:
- improve brightness
- change color scale from 3-channel to 1-channel (grayscale)
Brightness correction as well as the conversion to grayscale are implemented in the `TrafficData` class. To include these 2 steps, just add 2 parameters, `brightness` and `grayscale`, to the dataset normalization call:
# initiate and load dataset
dataset = TrafficData()
# normalize dataset --> change values of pixels from 0..255 to 0..1 & include brightness correction as well as change to grayscale color map
dataset.normalize_data(brightness=True, grayscale=True)
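The text does not say how `TrafficData` implements these two steps, so the following is only one plausible sketch: grayscale conversion with the common luma weights (0.299, 0.587, 0.114) and brightness correction as a per-image min-max stretch. Both choices and the function names are my assumptions, not the project's code.

```python
import numpy as np


def to_grayscale(images):
    """Collapse RGB to one channel with ITU-R BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = images.astype(np.float32) @ weights
    return gray[..., np.newaxis]          # keep a channel axis: (N, H, W, 1)


def stretch_brightness(images, eps=1e-6):
    """Rescale each image so its darkest pixel is 0 and its brightest is about 1."""
    lo = images.min(axis=(1, 2, 3), keepdims=True)
    hi = images.max(axis=(1, 2, 3), keepdims=True)
    return (images - lo) / (hi - lo + eps)


# A dark toy batch standing in for the real images: values only in 0..39.
rng = np.random.default_rng(0)
dark = rng.integers(0, 40, size=(2, 32, 32, 3), dtype=np.uint8)

gray = to_grayscale(dark)
bright = stretch_brightness(gray)
```

Note that the output has a single channel, so a network that expected a 3-channel input needs its first layer adjusted.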
The normalized dataset now looks as follows:
As the dataset input shape has changed, we had to change the network architecture to accept this input shape. The new version of the network, including dropout, is implemented in `LeNet3.py`.
Training the network with dropout gave no improvement (0.904535):
... and when we plot the results of trainings:
Training the network without dropout (`LeNet4.py`) gave a similar best result (0.944898) on the validation dataset as on the dataset without brightness correction and grayscale conversion, but the worst result was significantly better (0.903855) than previously:
... and when we plot the results of trainings:
If we look at the class distribution of the dataset (part Step 0: Load The Data), we clearly see that the classes are far from evenly distributed. The dataset requires improvement --> augmentation.
A very simple augmentation is implemented in the `TrafficData` class. To double the training dataset we just need to call the method `augment_dataset()` after loading it, and before normalization and shuffling.
from traffic_sign_dataset import TrafficData
# initiate and load dataset
dataset = TrafficData()
# augment dataset to get 2x bigger dataset
dataset.augment_dataset()
# augment dataset again to get 4x bigger dataset
dataset.augment_dataset()
# normalize dataset --> change values of pixels from 0..255 to 0..1
dataset.normalize_data(brightness=True, grayscale=True)
# randomize the order of images in dataset
dataset.shuffle_dataset()
The Albumentations library is used for data augmentation. A single augmentation step, `dataset.augment_dataset()`, adds 1 new image for each image already in the training dataset; the new image is slightly rotated, slightly shifted and has slightly modified brightness. As a result, the dataset becomes 2x bigger. As we call it twice, we get a 4x bigger dataset with modified images.
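The doubling logic can be illustrated without Albumentations. The sketch below is a NumPy-only stand-in that applies a small random shift and a brightness change (rotation is left out to keep it dependency-free); the function name `augment_once` and its parameters are hypothetical and do not mirror the real `augment_dataset()`:

```python
import numpy as np


def augment_once(images, labels, rng, max_shift=2, max_brightness=0.2):
    """Return the dataset with one perturbed copy of every image appended."""
    copies = np.empty_like(images)
    for i, img in enumerate(images):
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        shifted = np.roll(img, (int(dy), int(dx)), axis=(0, 1))  # wrap-around shift
        factor = 1.0 + rng.uniform(-max_brightness, max_brightness)
        scaled = np.clip(shifted.astype(np.float32) * factor, 0, 255)
        copies[i] = scaled.astype(np.uint8)
    return np.concatenate([images, copies]), np.concatenate([labels, labels])


rng = np.random.default_rng(0)
X = rng.integers(0, 256, size=(10, 32, 32, 3), dtype=np.uint8)
y = np.arange(10)

X2, y2 = augment_once(X, y, rng)      # 2x the original size
X4, y4 = augment_once(X2, y2, rng)    # 4x the original size
```

Because the second call also perturbs the copies created by the first call, the 4x dataset contains each original image plus three differently modified variants.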
When we ran the training job (`LeNet4` network with grayscale images and without dropout), we got a much better accuracy of 0.971429
... and when we plot the results of trainings:
The result on the `test` dataset for this training was: 0.950119
The last step is to verify the model on random images of German traffic signs downloaded from the internet. These pictures are:
Some explanation of the results:
- This is actually a traffic sign forbidding parking and stopping. It was not included in the training dataset, so the model could not recognize it correctly. It is an unknown sign for the model.
- Perfect classification. No comment needed.
- Perfect classification. No comment needed.
- Again, this traffic sign is not included in the training dataset, so the model could not classify it correctly. It is merely similar to a speed limit sign.
- Again, there is no such traffic sign in the training dataset. However, the model classified it very close to the 70 km/h speed limit.
- Perfect classification. No comment needed.
- This traffic sign should be classified as Vehicles over 3.5 metric tons prohibited. That class came in 2nd position with a probability of 19%. This mistake is probably caused by the very small number of pictures for this class in the training dataset.
- Perfect classification. No comment needed.
- Perfect classification. No comment needed.
- Very good classification with quite high probability (89%).
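The positions and probabilities quoted above come from ranking the softmax output of the network. A minimal sketch of that ranking, using made-up logits rather than real model output (the function name `top_k_probabilities` is illustrative):

```python
import math


def top_k_probabilities(logits, k=5):
    """Softmax over the logits, then the k most probable (class_id, prob) pairs."""
    peak = max(logits)                           # subtract max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up logits for a 4-class toy problem, NOT output of the trained model.
toy_logits = [2.0, 0.5, 3.0, -1.0]
top2 = top_k_probabilities(toy_logits, k=2)
```

A correct class that appears in 2nd position, as in the 3.5 metric tons example, means the model considered it plausible but ranked another class higher.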
The following further steps might improve the model:
- Add batch normalization to the neural network. This step usually improves model performance significantly.
- Use another activation function that gives better results. According to this arXiv paper, Mish: A Self Regularized Non-Monotonic Neural Activation Function provides better results.
- Improve the dataset. The training dataset clearly contains many blurry images (e.g. speed limit signs) where the numbers are difficult even for a human to read, as well as images that are too dark or overexposed, where the simple "lightness correction" normalization does not bring good results.
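As an illustration of the activation function mentioned in the list above, Mish is defined as `x * tanh(softplus(x))`. The sketch below implements it for scalars in plain Python next to ReLU for comparison; it has not been tried in this project, so whether it helps on this dataset is an open question:

```python
import math


def softplus(x):
    """Numerically stable ln(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x):
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def relu(x):
    """ReLU activation, as used in the current network."""
    return max(x, 0.0)


samples = [-2.0, 0.0, 2.0]
mish_values = [mish(v) for v in samples]
relu_values = [relu(v) for v in samples]
```

Unlike ReLU, Mish is smooth and lets small negative values through, while behaving almost like the identity for large positive inputs.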