In this project, I trained a deep neural network to identify and track a target character in a quadcopter drone simulator.
Clone the repository
$ git clone https://github.com/udacity/RoboND-DeepLearning.git
Download the data
Save the following three files into the data folder of the cloned repository.
Download the QuadSim binary
To interface your neural net with the QuadSim simulator, you must use a version of QuadSim that has been custom tailored for this project. The previous version that you might have used for the Controls lab will not work.
The simulator binary can be downloaded here
Install Dependencies
You'll need Python 3 and Jupyter Notebooks installed to do this project. If you are not already set up, the best way is to use Anaconda, following along with the RoboND-Python-Starterkit.
If for some reason you choose not to use Anaconda, you must install the following frameworks and packages on your system:
- Python 3.x
- Tensorflow 1.2.1
- NumPy 1.11
- SciPy 0.17.0
- eventlet
- Flask
- h5py
- PIL
- python-socketio
- scikit-image
- transforms3d
- PyQt4/PyQt5
- Download the training dataset from above and extract it to the project data directory.
- Implement your solution in model_training.ipynb
- Train the network locally, or on AWS.
- Continue to experiment with the training data and network until you attain the score you desire.
- Once you are comfortable with performance on the training dataset, see how it performs in live simulation!
A simple training dataset has been provided in this project's repository. This dataset will allow you to verify that your segmentation network is semi-functional. However, if you're interested in improving your score, you may want to collect additional training data. To do so, follow the steps below.
The data directory is organized as follows:
data/runs - contains the results of prediction runs
data/train/images - contains images for the training set
data/train/masks - contains masked (labeled) images for the training set
data/validation/images - contains images for the validation set
data/validation/masks - contains masked (labeled) images for the validation set
data/weights - contains trained TensorFlow models
data/raw_sim_data/train/run1 - destination for raw simulator recordings for the training set
data/raw_sim_data/validation/run1 - destination for raw simulator recordings for the validation set
- Run QuadSim
- Click the DL Training button
- Set patrol points, path points, and spawn points.
- With the simulator running, press "r" to begin recording.
- In the file selection menu, navigate to the data/raw_sim_data/train/run1 directory
- Optional: to speed up data collection, press "9" (keys 1-9 control the collection speed; lower numbers collect more slowly)
- When you have finished collecting data, hit "r" to stop recording.
- To reset the simulator, hit "<esc>"
- To collect multiple runs, create directories data/raw_sim_data/train/run2, data/raw_sim_data/train/run3, and repeat the above steps.
To collect the validation set, repeat both sets of steps above, using the directory data/raw_sim_data/validation instead of data/raw_sim_data/train.
Before the network is trained, the images first need to undergo a preprocessing step. The preprocessing step transforms the depth masks from the sim into binary masks suitable for training a neural network. It also converts the images from .png to .jpeg to create a reduced-size dataset suitable for uploading to AWS. To run preprocessing:
$ python preprocess_ims.py
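For illustration, here is a minimal sketch of the two transformations this step performs. It is not the actual preprocess_ims.py; the hero color code, function names, and paths are assumptions, and only PIL and NumPy from the dependency list are used.

# Sketch only: convert a color-coded sim mask into a binary target mask,
# and re-encode a .png image as .jpeg to shrink the dataset for AWS upload.
import numpy as np
from PIL import Image

def mask_to_binary(mask_path, hero_color=(255, 0, 0)):
    # hero_color is an assumed RGB code for the target; the real script defines its own mapping
    mask = np.array(Image.open(mask_path).convert('RGB'))
    hero = np.all(mask == hero_color, axis=-1).astype(np.uint8)  # 1 where the target is
    return hero

def png_to_jpeg(image_path, out_path, quality=85):
    # JPEG re-encoding produces the reduced-size dataset described above
    Image.open(image_path).convert('RGB').save(out_path, 'JPEG', quality=quality)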
Once your training and validation data have been generated or downloaded as described above, you are free to begin working with the neural net.
Note: Training CNNs is a very compute-intensive process.
Prerequisites
- Training data is in the data directory
- Validation data is in the data directory
- The folders data/train/images/, data/train/masks/, data/validation/images/, and data/validation/masks/ should exist and contain the appropriate data
To train, complete the network definition in the model_training.ipynb notebook and then run the training cell with appropriate hyperparameters selected.
After the training run has completed, your model will be stored in the data/weights directory as an HDF5 file along with a configuration_weights file. As long as they are both in the same location, things should work.
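As a rough illustration of what "an HDF5 file plus a configuration file" means, here is a plain-Keras approximation; the notebook's own save helper handles this for you, and the file naming here is an assumption.

# Sketch only: pair HDF5 weights with a JSON configuration file in data/weights
from keras import models

def save_network(model, name, weights_dir='data/weights'):
    model.save_weights('%s/%s' % (weights_dir, name))            # HDF5 weights
    with open('%s/config_%s' % (weights_dir, name), 'w') as f:   # matching configuration
        f.write(model.to_json())

def load_network(name, weights_dir='data/weights'):
    with open('%s/config_%s' % (weights_dir, name)) as f:
        model = models.model_from_json(f.read())
    model.load_weights('%s/%s' % (weights_dir, name))
    return model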
Important Note: the validation directory is used to store data that will be used during training to produce the plots of the loss and to help determine when the network is overfitting your data.
The sample_evaluation_data directory contains data specifically designed to test the network's performance on the Follow Me task. In sample_evaluation_data are three directories, each generated using a different sampling method. The structure of these directories is exactly the same as the validation and train datasets provided to you. For instance, patrol_with_targ contains an images and a masks subdirectory. If you would like to run the evaluation code on your validation data, a copy of it should be moved into sample_evaluation_data, and the appropriate arguments changed in the function calls in the model_training.ipynb notebook.
The notebook has examples of how to evaluate your model once you finish training. Think about the sampling methods, and how the information provided in the evaluation sections relates to the final score. Then try out the ideas that seem like they may work.
To score the network on the Follow Me task, two types of error are measured. First, the intersection over union (IoU) for the pixelwise classifications is computed for the target channel.
In addition, we determine whether the network detected the target person or not. If more than 3 pixels have a probability greater than 0.5 of being the target person, this counts as the network guessing that the target is in the image.
We determine whether the target is actually in the image by whether more than 3 pixels in the label mask contain the target.
Using the above, the numbers of detection true positives, false positives, and false negatives are counted.
How the Final Score is Calculated
The final score is the pixelwise average_IoU * (n_true_positive / (n_true_positive + n_false_positive + n_false_negative)), computed on data similar to that provided in sample_evaluation_data.
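The scoring logic described above can be sketched as follows; the course's own evaluation code lives in the notebook, so the function names here are illustrative only.

# Sketch of the Follow Me scoring: target-channel IoU plus pixel-count detection rule
import numpy as np

def iou(pred, label):
    # pixelwise intersection over union for the target channel (binary arrays)
    intersection = np.sum(np.logical_and(pred, label))
    union = np.sum(np.logical_or(pred, label))
    return intersection / union if union > 0 else 0.0

def detection(pred_prob, label, prob_thresh=0.5, pixel_thresh=3):
    # network "sees" the target if more than 3 pixels exceed probability 0.5;
    # the target is "really there" if more than 3 label pixels contain it
    predicted = np.sum(pred_prob > prob_thresh) > pixel_thresh
    actual = np.sum(label) > pixel_thresh
    return predicted, actual

def final_score(average_iou, n_tp, n_fp, n_fn):
    # average IoU weighted by the ratio of true positives to all detections and misses
    return average_iou * (n_tp / float(n_tp + n_fp + n_fn))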
The FCN is used to perform pixelwise inference on an image. It is composed of a series of convolution layers that reduce the spatial dimensions, ending in a 1x1 convolution.
The first section of the network is the encoder. The 1x1 convolution layer that follows it uses kernels that are not tall or wide but are deep in filters; although the encoder ends in a 1x1 convolution, its output isn't necessarily 1x1 spatially.
The next section of the network is the decoder. It is composed of transposed convolutions that increase the height and width while reducing the depth, mirroring the encoder.
To solve the Follow Me challenge I used a setup with 3 encoder blocks, a 1x1 convolution, and 3 decoder blocks.
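A minimal sketch of that 3-encoder / 1x1 convolution / 3-decoder layout is shown below, written with plain Keras layers. The notebook's own helper blocks use separable convolutions and its own upsampling utilities, so this is an illustrative approximation; the input shape, filter counts, and class count are assumptions.

# Sketch only: encoder downsamples and deepens, 1x1 conv adds depth without
# changing spatial size, decoder upsamples back to the input resolution.
from keras import layers, models

def build_fcn(input_shape=(160, 160, 3), num_classes=3):
    inputs = layers.Input(shape=input_shape)

    # encoder: three strided convolution blocks that halve H/W and deepen filters
    x = layers.Conv2D(32, 3, strides=2, padding='same', activation='relu')(inputs)
    x = layers.Conv2D(64, 3, strides=2, padding='same', activation='relu')(x)
    x = layers.Conv2D(128, 3, strides=2, padding='same', activation='relu')(x)

    # 1x1 convolution: deep in filters, preserves the (reduced) spatial dimensions
    x = layers.Conv2D(256, 1, padding='same', activation='relu')(x)

    # decoder: three transposed-convolution blocks that restore H/W and thin the depth
    x = layers.Conv2DTranspose(128, 3, strides=2, padding='same', activation='relu')(x)
    x = layers.Conv2DTranspose(64, 3, strides=2, padding='same', activation='relu')(x)
    x = layers.Conv2DTranspose(32, 3, strides=2, padding='same', activation='relu')(x)

    # pixelwise softmax over the class channels
    outputs = layers.Conv2D(num_classes, 1, padding='same', activation='softmax')(x)
    return models.Model(inputs, outputs)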
The source code contained a series of default hyper parameters:
learning_rate = 0
batch_size = 0
num_epochs = 0
steps_per_epoch = 200
validation_steps = 50
workers = 2
I iteratively adjusted one parameter at a time in order to perform a controlled experiment and understand what positive or negative impact each adjustment would have on the results and the final model score.
- Batch_size I updated first to align with the volume of images being trained against.
- Num_epochs I adjusted second through trial and error, starting with 10, then 20, and finally 30, where the improvement appeared to trail off with diminishing returns.
- Steps_per_epoch & validation_steps I decided to leave as is.
- Workers I changed from 2 down to 1 to keep things simple.
- Learning_rate I knew from prior experience to be very important to model performance. Initially I started with a smaller learning_rate of 0.00001; however, as somewhat expected, the val_loss didn't improve as quickly. I adjusted it a few times until I reached 0.01.
learning_rate = 0.01
batch_size = 32
num_epochs = 30
steps_per_epoch = 200
validation_steps = 50
workers = 1
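For context, the sketch below shows how these hyperparameters plug into a Keras training call of that era. The notebook builds its own batch iterators over data/train and data/validation, so the dummy generator and the build_fcn reference (from the architecture sketch above) are placeholders, not the project's actual code.

# Sketch only: wire the chosen hyperparameters into compile() and fit_generator()
import numpy as np
from keras import optimizers

learning_rate = 0.01
batch_size = 32
num_epochs = 30
steps_per_epoch = 200
validation_steps = 50
workers = 1

def dummy_iter(batch, shape=(160, 160, 3), num_classes=3):
    # placeholder generator standing in for the notebook's real image/mask iterators
    while True:
        images = np.zeros((batch,) + shape, dtype=np.float32)
        masks = np.zeros((batch, shape[0], shape[1], num_classes), dtype=np.float32)
        yield images, masks

model = build_fcn()  # from the architecture sketch above
model.compile(optimizer=optimizers.Adam(lr=learning_rate),
              loss='categorical_crossentropy')
model.fit_generator(dummy_iter(batch_size),
                    steps_per_epoch=steps_per_epoch,
                    epochs=num_epochs,
                    validation_data=dummy_iter(batch_size),
                    validation_steps=validation_steps,
                    workers=workers)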
The model could be improved with additional training data, both of following the hero through densely crowded areas and of larger patrol paths where the hero appears less frequently. Tracking a dog or a car could also be handled by the model, but the training data would need to include other animals/vehicles, an alternate environment, and different patrol patterns.