The goal of this project is to learn the basic concepts and techniques for building deep neural networks that detect, segment and recognize specific objects, with a focus on the self-driving car application. To address the problem of automatic image understanding, the tasks performed include object recognition, object detection and semantic segmentation in images recorded by an on-board vehicle camera.
- Daniel Azemar (daniel.azemar@e-campus.uab.cat)
- María Gil Aragones (maria.gilaragones@gmail.com)
- Laura Mora Ballestar (lmoraballestar@gmail.com)
- Richard Segovia (richard.segovia@e-campus.uab.cat)
This repository provides a PyTorch-based framework to achieve three goals: object recognition, object detection and semantic segmentation.
Environment Set Up:
- Python 3.7
- PyTorch -- cudatoolkit, torchvision
pip install -r requirements.txt
# --exp_name: directory where results are stored
# --config_file: file where the configuration for the code is set up
python3 main.py --exp_name dir_name --exp_folder ./ --config_file config/configFile.yml
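The flags in the command above could be parsed with Python's standard `argparse` module. The following is a minimal sketch, not the repository's actual `main.py`: the `build_parser` helper, the default for `--exp_folder`, and the help text for `--exp_folder` are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the three flags used in the command above."""
    parser = argparse.ArgumentParser(description="Run an experiment")
    parser.add_argument("--exp_name", required=True,
                        help="directory where results are stored")
    parser.add_argument("--exp_folder", default="./",
                        help="parent folder of the experiment directory (assumed meaning)")
    parser.add_argument("--config_file", required=True,
                        help="file where the configuration for the code is set up")
    return parser


# Parse the same arguments as the example command, passed as an explicit list.
args = build_parser().parse_args([
    "--exp_name", "dir_name",
    "--exp_folder", "./",
    "--config_file", "config/configFile.yml",
])
print(args.exp_name, args.exp_folder, args.config_file)
```

Marking `--exp_name` and `--config_file` as required is a design choice of this sketch; it makes a missing flag fail early with a usage message.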
Running the framework for object detection requires the steps below. First, see the source repository.
1. Prerequisites
- Python 3.6
- PyTorch 1.0
- CUDA 8 or higher
2. Data preparation
The framework requires the COCO and PASCAL VOC datasets to be installed in order to work properly.
- PASCAL_VOC 07+12: Follow the instructions in py-faster-rcnn to prepare the VOC datasets. After downloading the data, create softlinks in the folder object_detection/faster-rcnn.pytorch/data/
- COCO: Download from the COCOAPI repository and store it in the folder object_detection/faster-rcnn.pytorch/data/
- UDACITY and other non-VOC datasets:
  - First, make a folder inside the data folder with the name of the dataset.
  - Create a folder called annotations_cache.
  - Create a folder called results.
  - Create a folder called nameOfDatasetYear.
  - Inside the nameOfDatasetYear folder, create the following structure:
    /Annotations /ImageSets/Layout /ImageSets/Main /ImageSets/Segmentation /JPEGImages /test /train /valid
  - Copy the images and the txt files of the dataset to the test, train and valid folders.
  - Copy all the images to the JPEGImages folder.
  - Copy the convert_to_voc.py file to /nameofDataset/nameOfDatasetYear and execute it with python.
  - Clone /lib/datasets/pascal_voc.py and make the modifications needed to adapt it to your dataset.
  - Go to /lib/datasets/factory.py and add the call to your clone of /lib/datasets/pascal_voc.py.
  - Add the name of the dataset to the options in /test_net.py and /trainval_net.py.
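The folder layout described in the steps above can be created with a short script instead of by hand. This is a minimal sketch using only the standard library; the function name `make_dataset_skeleton` is hypothetical, and it assumes that annotations_cache, results and nameOfDatasetYear all sit directly inside the dataset folder.

```python
from pathlib import Path

# Subfolders listed in the steps above, relative to nameOfDatasetYear.
VOC_SUBDIRS = [
    "Annotations",
    "ImageSets/Layout",
    "ImageSets/Main",
    "ImageSets/Segmentation",
    "JPEGImages",
    "test",
    "train",
    "valid",
]


def make_dataset_skeleton(data_dir, dataset_name, dataset_year_name):
    """Create the empty folder layout and return the dataset root as a Path."""
    root = Path(data_dir) / dataset_name
    # Assumed placement: these two folders live next to nameOfDatasetYear.
    for top in ("annotations_cache", "results"):
        (root / top).mkdir(parents=True, exist_ok=True)
    for sub in VOC_SUBDIRS:
        (root / dataset_year_name / sub).mkdir(parents=True, exist_ok=True)
    return root
```

For example, `make_dataset_skeleton("data", "udacity", "udacity2019")` would create the layout under data/udacity (both names are placeholders). The script only creates empty folders; copying the images and running convert_to_voc.py remain separate steps.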
3. Pretrained Models
The framework uses VGG16 or ResNet101 as baseline architectures. The weights of the networks, pretrained with Caffe, must be stored in the folder object_detection/framework/pretrained_models/
Link to download the models from the source repository:
4. Compilation
pip install -r requirements.txt
cd lib
python setup.py build develop
Train
LEARNING_RATE=lr
BATCH_SIZE=batchSize
DECAY_STEP=decayStep
DATASET=udacity_voc #udacity_voc or pascal_voc
NETWORK=res101 #res101 or vgg16
EPOCHS=numberEpochs
python3 trainval_net.py --dataset $DATASET --net $NETWORK \
--bs $BATCH_SIZE --nw 1 \
--lr $LEARNING_RATE --lr_decay_step $DECAY_STEP \
--cuda --mGPUs --epochs $EPOCHS
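To get a feel for how `--lr` and `--lr_decay_step` interact, the sketch below computes the learning rate per epoch under a generic step-decay schedule. It is an illustration under assumptions, not the code in trainval_net.py: the decay factor `gamma=0.1`, the 1-indexed epochs, and the exact epoch at which the decay kicks in are all assumed, and `stepped_lr` is a hypothetical helper.

```python
import math


def stepped_lr(base_lr, epoch, decay_step, gamma=0.1):
    """Learning rate at a 1-indexed epoch under step decay.

    gamma=0.1 is an assumed decay factor, not a value taken from this README.
    """
    if epoch < 1 or decay_step < 1:
        raise ValueError("epoch and decay_step must be >= 1")
    # Number of decays applied so far: one every decay_step epochs.
    num_decays = (epoch - 1) // decay_step
    return base_lr * gamma ** num_decays


# Print the schedule for a base rate of 0.001 decayed every 3 epochs.
for epoch in range(1, 8):
    print(epoch, stepped_lr(0.001, epoch, decay_step=3))
```

A small `DECAY_STEP` relative to `EPOCHS` shrinks the learning rate quickly, so it is worth checking the resulting schedule before launching a long run.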
Test
python3 test_net.py --dataset $DATASET --net $NETWORK \
--cuda --mGPUs --checksession $CHECK_SESSION --checkepoch $CHECK_EPOCH --checkpoint $CHECK_MODEL
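The three `--check*` flags together identify which saved model to load. The sketch below shows one way they could map to a file on disk. The naming pattern `faster_rcnn_{session}_{epoch}_{checkpoint}.pth` and the `<load_dir>/<net>/<dataset>/` layout are assumptions and should be checked against test_net.py; `checkpoint_path` is a hypothetical helper.

```python
import os


def checkpoint_path(load_dir, net, dataset, session, epoch, checkpoint):
    """Build the checkpoint path under the assumed naming convention."""
    # Assumed file name pattern; verify it against the training script.
    filename = "faster_rcnn_{}_{}_{}.pth".format(session, epoch, checkpoint)
    return os.path.join(load_dir, net, dataset, filename)


# Placeholder values for session, epoch and checkpoint.
path = checkpoint_path("models", "res101", "udacity_voc", 1, 20, 10021)
print(path, os.path.isfile(path))
```

Checking `os.path.isfile` on the result before starting a test run gives a clearer error than a failed model load.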
Demo
The demo script loads the trained model and saves the resulting detection images in the folder object_detection/framework/images/
python demo.py --net res101 \
--checksession $SESSION --checkepoch $EPOCH --checkpoint $CHECKPOINT --cuda --load_dir models/
Object Recognition | Semantic Segmentation | Object Detection |
---|---|---|
Presentation | Presentation | Presentation |