
gregtyminski/Traffic-Sign-Classifier


Project: Build a Traffic Sign Recognition Program

This project is a part of:
Udacity - Self-Driving Car NanoDegree

Overview

In this project we are going to train a network to recognize traffic signs.
The traffic sign images come from the German Traffic Sign Dataset, which is used to train the neural network.
This dataset is too large to be kept on GitHub. If you want to run any training job, please download the dataset from this link and unpack it into the data directory.

To track the results of all training experiments I use the Neptune.ml tool.

Step 0: Load The Data

First we need to load the dataset. The dataset is kept in the data folder, which contains the following files:

total 311760
-rw-r--r--@ 1 grzegorz.tyminski  staff   38888118 Nov  7  2016 test.p
-rw-r--r--@ 1 grzegorz.tyminski  staff  107146452 Feb  2  2017 train.p
-rw-r--r--@ 1 grzegorz.tyminski  staff   13578712 Feb  2  2017 valid.p

The class that loads this dataset is implemented in the file traffic_sign_dataset.py.
To load the data, run the following code:

from traffic_sign_dataset import TrafficData

dataset = TrafficData() # dataset is loaded from files while creating instance of the `TrafficData` object.

dataset.normalize_data() # dataset is normalized --> values are in range 0..1 instead of 0..255
dataset.shuffle_dataset() # dataset is shuffled

X_train, y_train = dataset.get_training_dataset() # get training part of dataset
X_valid, y_valid = dataset.get_validation_dataset() # get validation part of dataset
X_test, y_test = dataset.get_testing_dataset() # get testing part of dataset
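The three .p files are Python pickles. A minimal sketch of loading one split without TrafficData, assuming (as in the Udacity-provided traffic-sign files) that each pickle holds a dict with 'features' and 'labels' keys. load_split is a hypothetical helper, not part of this repository:

```python
import pickle

import numpy as np


def load_split(path):
    """Load one pickled split and return (features, labels).

    Hypothetical helper, not the TrafficData API. Assumes the pickle holds a
    dict with a 'features' array (N x 32 x 32 x 3, uint8) and a 'labels'
    array (N,), as in the Udacity-provided traffic-sign files.
    """
    with open(path, mode="rb") as f:
        data = pickle.load(f)
    return np.asarray(data["features"]), np.asarray(data["labels"])


# Usage (assuming the files sit in the data directory):
# X_train, y_train = load_split("data/train.p")
```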

Dataset preview

Once the dataset is loaded, we can preview some random images from the training dataset just by calling the dataset.preview_random() method.
We will obtain something like:
[Figure: random images from the training dataset]

Dataset histogram

Let's have a look at the histogram of classes in the dataset. [Figure: histogram of classes] We can see that some classes in the dataset have about 10x fewer images than others. These are:

Speed limit (20km/h)
Speed limit (100km/h)
Vehicles over 3.5 metric tons prohibited
Dangerous curve to the left
Road narrows on the right
Pedestrians
Bicycles crossing
End of all speed and passing limits
Go straight or left
Keep left
End of no passing
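A minimal sketch of how the histogram values and the under-represented classes could be computed with NumPy, assuming y_train holds integer class ids 0..42. The helper names and the 10x threshold are illustrative assumptions, not taken from TrafficData:

```python
import numpy as np


def class_counts(y, n_classes=43):
    """Number of images per class id (the values plotted in the histogram)."""
    return np.bincount(y, minlength=n_classes)


def rare_classes(y, n_classes=43, ratio=10.0):
    """Class ids with at least `ratio` times fewer images than the largest class."""
    counts = class_counts(y, n_classes)
    return np.flatnonzero(counts * ratio <= counts.max())


# Usage: rare_classes(y_train) returns the ids of the minority classes.
```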

Dataset normalization

In the preview we can clearly see that some images are very dark, probably too dark for the human eye to recognize the sign.
The class TrafficData contains the method normalize_data() to normalize the dataset (i.e. rescale pixel values from 0..255 to 0..1), as well as the method shuffle_dataset() to randomize the order of images in the dataset.
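The two operations can be sketched in plain NumPy. This is an illustrative assumption of what normalize_data() and shuffle_dataset() do, not their actual code; the key point of the shuffle is that images and labels must be permuted with the same index array so every image keeps its label:

```python
import numpy as np


def normalize(X):
    """Rescale uint8 pixel values from 0..255 to float32 values in 0..1."""
    return X.astype(np.float32) / 255.0


def shuffle(X, y, seed=None):
    """Shuffle images and labels with one shared permutation so pairs stay aligned."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    return X[idx], y[idx]
```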

Step 1: Train LeNet network

In the very first step we train the same LeNet neural network as in the MNIST example.
The LeNet-5 implementation shown in the classroom at the end of the CNN lesson is a solid starting point. Let's verify how well it performs.
The LeNet class with the model architecture is defined in the file LeNet.py.
Its input and output shapes are adjusted for the Traffic Sign Dataset (3-channel color input, 43-class output).
I ran a grid search over the hyperparameters epochs (10 or 15), batch_size (32, 64, 128, 256, 512, 1024) and learn_rate (0.0005, 0.001, 0.002). Across several trials, the validation accuracy reached up to 0.953968, for epochs = 10, batch_size = 64 and learn_rate = 0.002.
[Figure: grid search results]
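This grid has 2 × 6 × 3 = 36 combinations. A minimal sketch of enumerating them with itertools.product and picking the best run; train_and_evaluate in the usage comment is a hypothetical placeholder for one training job, not a function from this repository:

```python
import itertools

# Hyperparameter values from the grid search described above.
EPOCHS = [10, 15]
BATCH_SIZES = [32, 64, 128, 256, 512, 1024]
LEARN_RATES = [0.0005, 0.001, 0.002]


def grid(epochs=EPOCHS, batch_sizes=BATCH_SIZES, learn_rates=LEARN_RATES):
    """Yield every hyperparameter combination as a dict."""
    for e, b, lr in itertools.product(epochs, batch_sizes, learn_rates):
        yield {"epochs": e, "batch_size": b, "learn_rate": lr}


def best_run(results):
    """Return the (params, validation_accuracy) pair with the highest accuracy."""
    return max(results, key=lambda r: r[1])


# Usage (train_and_evaluate is hypothetical):
# results = [(p, train_and_evaluate(**p)) for p in grid()]
# print(best_run(results))
```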

[Figure] The results did not differ much across hyperparameter settings. However, we can clearly see that a smaller batch_size gave better results. Values of learn_rate below 0.001 did not give satisfying results, and more than 10 epochs gave no improvement (see graph below). [Figure: graph of results]

[Figure]
This LeNet model has the following architecture:

Variable      Type         Shape       Size    Bytes
Variable:0    float32_ref  5x5x3x6      450     1800
Variable_1:0  float32_ref  6              6       24
Variable_2:0  float32_ref  5x5x6x16    2400     9600
Variable_3:0  float32_ref  16            16       64
Variable_4:0  float32_ref  400x120    48000   192000
Variable_5:0  float32_ref  120          120      480
Variable_6:0  float32_ref  120x84     10080    40320
Variable_7:0  float32_ref  84            84      336
Variable_8:0  float32_ref  84x43       3612    14448
Variable_9:0  float32_ref  43            43      172
Total                                 64811   259244
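The sizes in the table follow directly from the shapes (a 5x5x3x6 kernel holds 450 weights, and each float32 takes 4 bytes). A small sketch reproducing the totals; it also shows where the 400 inputs of the first fully connected layer come from, assuming a 32x32 input, valid 5x5 convolutions and 2x2 pooling (32 → 28 → 14 → 10 → 5, and 5·5·16 = 400). The descriptive variable names are illustrative labels, not the names used in LeNet.py:

```python
import math

# Trainable variables of the LeNet model above (Variable:0 ... Variable_9:0).
# The descriptive names are illustrative labels only.
LENET_SHAPES = [
    ("conv1_W", (5, 5, 3, 6)), ("conv1_b", (6,)),
    ("conv2_W", (5, 5, 6, 16)), ("conv2_b", (16,)),
    ("fc1_W", (400, 120)), ("fc1_b", (120,)),
    ("fc2_W", (120, 84)), ("fc2_b", (84,)),
    ("fc3_W", (84, 43)), ("fc3_b", (43,)),
]


def total_params(shapes):
    """Sum of element counts over all variable shapes."""
    return sum(math.prod(shape) for _, shape in shapes)


def flat_size(input_hw=32, kernel=5, pool=2, channels=16, blocks=2):
    """Flattened length after `blocks` x (valid conv + non-overlapping pool)."""
    hw = input_hw
    for _ in range(blocks):
        hw = (hw - kernel + 1) // pool
    return hw * hw * channels


print(total_params(LENET_SHAPES), 4 * total_params(LENET_SHAPES), flat_size())
# 64811 259244 400
```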

Step 2: Modify LeNet network

Next, we modify the LeNet network slightly and try it on this dataset. The first modification is to add dropout to the network. The modified LeNet is implemented in the file LeNet2.py.
I first added one dropout layer, later two, and ran a grid search over the hyperparameters (including the dropout value) for both variants. The first dropout was added between the 2nd and 3rd layers of the network:

# layer 2: output flattened to a 400-element vector
self.flat = flatten(self.layer2)
# dropout: in TensorFlow 1.x the 2nd argument of tf.nn.dropout is keep_prob
self.flat = tf.nn.dropout(self.flat, self.dropout_val)
# layer 3: fully connected 400 -> 120
self.lay3_W = tf.Variable(tf.truncated_normal(shape=(400, 120), mean=self.mu, stddev=self.sigma))
self.lay3_b = tf.Variable(tf.zeros([120]))
self.layer3 = tf.matmul(self.flat, self.lay3_W) + self.lay3_b

... and the second between the 4th and 5th layers:

# layer 4: ReLU activation
self.layer4 = tf.nn.relu(self.layer4)
# dropout: keep each unit with probability dropout_val (TensorFlow 1.x keep_prob)
self.layer4 = tf.nn.dropout(self.layer4, self.dropout_val)
# layer 5: fully connected 84 -> output_classes (logits)
self.lay5_W = tf.Variable(tf.truncated_normal(shape=(84, output_classes), mean=self.mu, stddev=self.sigma))
self.lay5_b = tf.Variable(tf.zeros([output_classes]))
self.layer5 = tf.matmul(self.layer4, self.lay5_W) + self.lay5_b
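To make the role of dropout_val concrete, here is a NumPy sketch of inverted dropout, the scheme TensorFlow's tf.nn.dropout implements: each unit is kept with probability keep_prob and survivors are scaled by 1 / keep_prob, so the expected activation is unchanged. This is an illustration, not code from LeNet2.py; presumably dropout_val is fed as 1.0 during evaluation so that no units are dropped:

```python
import numpy as np


def dropout(x, keep_prob, rng=None):
    """Inverted dropout: zero each unit with probability 1 - keep_prob and
    scale the kept units by 1 / keep_prob so the expected value is unchanged."""
    if keep_prob >= 1.0:
        return x  # evaluation mode: nothing is dropped
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.random(x.shape) < keep_prob
    return np.where(mask, x / keep_prob, 0.0)
```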

Here are the results of the training jobs: [Figure: training results]

... and a plot of the training results: [Figure: plot of training results]

We actually got worse results, regardless of whether one or two dropout layers were used and regardless of the dropout value.

Step 3: Better normalization

Let's modify the dataset. There are 2 potential improvements:

  • improve brightness
  • change color scale from 3-channel to 1-channel (grayscale)

Brightness correction as well as the conversion to grayscale is implemented in the TrafficData class. To include these 2 steps, just pass 2 parameters, brightness and grayscale, to the dataset normalization:

# initiate and load dataset
dataset = TrafficData()

# normalize dataset --> change values of pixels from 0..255 to 0..1 & include brightness correction as well as change to grayscale color map
dataset.normalize_data(brightness=True, grayscale=True)
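The README does not show how TrafficData implements these two options. A minimal sketch under stated assumptions: grayscale as a weighted sum of the RGB channels using the ITU-R BT.601 luma weights, and brightness correction swapped in as a simple per-image min-max contrast stretch (histogram equalization, e.g. CLAHE, is a common alternative). Neither is necessarily what the repository does:

```python
import numpy as np

# ITU-R BT.601 luma weights for the R, G and B channels.
LUMA = np.array([0.299, 0.587, 0.114], dtype=np.float32)


def to_grayscale(X):
    """(N, H, W, 3) color images -> (N, H, W, 1) grayscale images."""
    return (X.astype(np.float32) @ LUMA)[..., np.newaxis]


def stretch_brightness(X, eps=1e-6):
    """Per-image min-max stretch so each image spans the full 0..1 range.

    Dark images get brightened; constant images map to 0.
    """
    X = X.astype(np.float32)
    lo = X.min(axis=(1, 2, 3), keepdims=True)
    hi = X.max(axis=(1, 2, 3), keepdims=True)
    return (X - lo) / np.maximum(hi - lo, eps)


# Usage: X_gray = stretch_brightness(to_grayscale(X_train))  # shape (N, 32, 32, 1)
```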

The normalized dataset now looks as follows:
[Figure: normalized images]

Network with dropout

As the input shape has changed (grayscale images have 1 channel instead of 3), the network architecture had to be adapted to it. The new version of the network, including dropout, is implemented in LeNet3.py.
Training the network with dropout gave no improvement (0.904535): [Figure: training results]

... and a plot of the training results: [Figure: plot of training results]

Network without dropout

Training the network without dropout (LeNet4.py) gave a best validation result (0.944898) similar to the one obtained on the dataset without brightness correction and grayscale conversion, but the worst result was significantly better (0.903855) than before: [Figure: training results]

... and a plot of the training results: [Figure: plot of training results]

Step 4: Improve dataset

Looking at the class distribution of the dataset (section Step 0: Load The Data), we clearly see that the classes are very unevenly represented. The dataset needs improvement, i.e. augmentation.
A very simple augmentation is implemented in the TrafficData class. To double the training dataset, just call the method augment_dataset() after loading the data and before normalization and shuffling.

from traffic_sign_dataset import TrafficData

# initiate and load dataset
dataset = TrafficData()
# augment dataset to get 2x bigger dataset
dataset.augment_dataset()
# augment dataset again to get 4x bigger dataset
dataset.augment_dataset()

# normalize dataset --> change values of pixels from 0..255 to 0..1, with brightness correction and grayscale conversion
dataset.normalize_data(brightness=True, grayscale=True)
# randomize the order of images in dataset
dataset.shuffle_dataset()

The Albumentations library is used for data augmentation. A single augmentation step, dataset.augment_dataset(), adds 1 new image for each image already in the training dataset; the new image is slightly rotated, slightly shifted and has slightly modified brightness. As a result, the dataset becomes 2x bigger. As we call it twice, we get a 4x bigger dataset with modified images.
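The doubling logic can be illustrated without Albumentations. The sketch below is a NumPy-only stand-in, not the repository's pipeline: it applies a random integer shift (np.roll, which wraps pixels around the border) and a random brightness factor, and omits rotation. Each call appends one perturbed copy per image and repeats the labels, so two calls give a 4x bigger dataset:

```python
import numpy as np


def augment_once(X, y, max_shift=2, max_brightness=0.2, rng=None):
    """Append one perturbed copy of every image, doubling the dataset.

    NumPy-only stand-in for the Albumentations pipeline: random shift
    (np.roll, wraps around the border) plus a random brightness factor.
    Rotation is omitted. Expects uint8 images of shape (N, H, W, C).
    """
    rng = np.random.default_rng() if rng is None else rng
    new = np.empty_like(X)
    for i, img in enumerate(X):
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        shifted = np.roll(img, shift=(dy, dx), axis=(0, 1))
        factor = 1.0 + rng.uniform(-max_brightness, max_brightness)
        new[i] = np.clip(shifted.astype(np.float32) * factor, 0, 255).astype(X.dtype)
    # originals first, then their augmented copies with the same labels
    return np.concatenate([X, new]), np.concatenate([y, y])
```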

When we ran the training job (LeNet4 network with grayscale images and without dropout), we obtained a much better validation accuracy: 0.971429.
[Figure: training results]

... and a plot of the training results: [Figure: plot of training results]

The accuracy on the test dataset for this training was 0.950119.
[Figure: test results]
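The test accuracy is the fraction of correctly classified test images. A minimal sketch of computing it in batches (which keeps memory bounded for large datasets); predict is a hypothetical callable standing in for running the trained TensorFlow graph on a batch:

```python
import numpy as np


def evaluate(predict, X, y, batch_size=128):
    """Accuracy of `predict` over (X, y), computed batch by batch.

    `predict` is any callable mapping a batch of images to predicted class
    ids (a hypothetical stand-in for running the trained network).
    """
    correct = 0
    for start in range(0, len(X), batch_size):
        xb, yb = X[start:start + batch_size], y[start:start + batch_size]
        correct += int(np.sum(predict(xb) == yb))
    return correct / len(X)
```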

Test performance of model on random images from internet

The last step is to verify the model on random images of German traffic signs downloaded from the internet. These pictures are:

ID   Top 3 predictions: label, probability
1.   Speed limit (30km/h), p=0.80; Roundabout mandatory, p=0.12; Speed limit (70km/h), p=0.03
2.   Turn right ahead, p=1.00; Ahead only, p=0.00; Right-of-way at the next intersection, p=0.00
3.   Beware of ice/snow, p=0.99; Right-of-way at the next intersection, p=0.01; Bicycles crossing, p=0.00
4.   Speed limit (80km/h), p=0.73; Speed limit (50km/h), p=0.25; Speed limit (60km/h), p=0.03
5.   Speed limit (70km/h), p=0.92; Roundabout mandatory, p=0.04; General caution, p=0.02
6.   Road work, p=0.99; Children crossing, p=0.01; Right-of-way at the next intersection, p=0.00
7.   Speed limit (100km/h), p=0.60; Vehicles over 3.5 metric tons prohibited, p=0.19; End of speed limit (80km/h), p=0.15
8.   Stop, p=1.00; Speed limit (30km/h), p=0.00; Speed limit (60km/h), p=0.00
9.   Yield, p=1.00; Priority road, p=0.00; No vehicles, p=0.00
10.  Speed limit (50km/h), p=0.89; Speed limit (80km/h), p=0.11; Speed limit (30km/h), p=0.00
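The top 3 probabilities above come from a softmax over the network's 43 logits followed by picking the 3 largest values (in TensorFlow this is typically tf.nn.top_k applied to tf.nn.softmax(logits)). A NumPy sketch of the same computation, assuming logits has shape (N, 43):

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def top_k(logits, k=3):
    """Return (probabilities, class ids) of the k most likely classes, descending."""
    probs = softmax(logits)
    idx = np.argsort(probs, axis=-1)[..., ::-1][..., :k]
    return np.take_along_axis(probs, idx, axis=-1), idx
```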

Some explanation of the results:

  1. This is actually a traffic sign prohibiting stopping and parking. This sign is not included in the training dataset, so the model could not recognize it correctly; it is an unknown sign for the model.
  2. Perfect classification. No comment needed.
  3. Perfect classification. No comment needed.
  4. Again, this traffic sign is not included in the training dataset, so the model could not classify it correctly. It merely looks similar to a speed limit sign.
  5. Again, there is no such traffic sign in the training dataset. However, the model classified it as Speed limit (70km/h), which looks very similar.
  6. Perfect classification. No comment needed.
  7. This traffic sign should be classified as Vehicles over 3.5 metric tons prohibited, but that class was only ranked 2nd, with a probability of 19%. This mistake is probably caused by the small number of images of this class in the training dataset.
  8. Perfect classification. No comment needed.
  9. Perfect classification. No comment needed.
  10. Very good classification with a fairly high probability (89%).

Potential improvements

The following further steps might improve the model:

  • Add batch normalization to the neural network. This step often improves model performance significantly.
  • Use a different activation function that gives better results. According to this arXiv paper, Mish: A Self Regularized Non-Monotonic Neural Activation Function, Mish provides better results.
  • Improve the dataset. The training dataset clearly contains many blurry images (e.g. speed limit signs) whose numbers are hard to read even for a human, as well as images that are too dark or overexposed, for which the simple "lightness correction" normalization does not bring good results.
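For reference, the first two ideas are small formulas. A NumPy sketch of Mish, defined as x · tanh(softplus(x)) with softplus(x) = ln(1 + eˣ), and of training-mode batch normalization (normalize each feature with the batch mean and variance, then apply a learned scale gamma and shift beta). At inference time batch normalization uses running averages of the statistics instead, which this sketch omits:

```python
import numpy as np


def mish(x):
    """Mish activation: x * tanh(softplus(x)); logaddexp(0, x) = ln(1 + e^x) stably."""
    return x * np.tanh(np.logaddexp(0.0, x))


def batch_norm(x, gamma, beta, eps=1e-5):
    """Training-mode batch normalization over the batch axis (axis 0).

    Each feature is normalized to zero mean / unit variance using the batch
    statistics, then scaled by gamma and shifted by beta.
    """
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta
```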
