Repository for a private Duckietown project at BME VIK (Budapesti Műszaki és Gazdaságtudományi Egyetem, Villamosmérnöki és Informatikai Kar: Budapest University of Technology and Economics, Faculty of Electrical Engineering and Informatics).
All required packages are listed in requirements.txt. Note, however, that this module may work with other setups as well.
All neural-network-related code is implemented in PyTorch and, more recently, in PyTorch Lightning. Dependencies are kept up to date, so I aim to use the newest versions possible. Once this repository is finalised or tagged, I will add specific version information.
All data generation code was run on Ubuntu 20 with Python 3.8, while all training code was run on Ubuntu 16 with Python 3.7 on a DGX Station. For package requirements, please see requirements.txt.
Data can be generated manually with the modified simulator contained in the rightLaneDatagen folder, which first needs to be installed:
cd rightLaneDatagen
pip install -e .
To use it, launch manual_control.py with a map selected from rightLaneDatagen/gym_duckietown/maps.
python3 manual_control.py --env-name Duckietown-udem1-v0 --map-name loop_dyn_duckiebots --domain-rand --distortion
Key bindings:
- Press 'A' to change annotated lane (right/left/none)
- Press 'Enter' to start recording (note: annotated lane should be selected first)
- Press 'Enter' again to stop the recording
- Press 'Backspace' to perform a random reset of the environment
- Press 'Q' to quit
In addition to selecting a map, domain randomization can be enabled with the --domain-rand flag. Another flag worth noting is --distortion, which adds camera distortion when domain randomization is enabled.
Recorded videos are saved by a background thread. Recording stops automatically when the annotation mode changes or any reset condition is met. The recorded video files are placed in the automatically created recordings folder.
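For illustration only, a minimal sketch of such a background recording thread is shown below; it assumes OpenCV for video writing and does not reflect the actual implementation in manual_control.py.

import queue
import threading

import cv2

frame_queue = queue.Queue()

def recorder(path, fps, frame_size):
    # Background thread: pull RGB frames from the queue and append them to the video file
    writer = cv2.VideoWriter(path, cv2.VideoWriter_fourcc(*'MJPG'), fps, frame_size)
    while True:
        frame = frame_queue.get()
        if frame is None:  # sentinel: stop recording (annotation mode change or reset)
            break
        writer.write(cv2.cvtColor(frame, cv2.COLOR_RGB2BGR))
    writer.release()

# Frame size and FPS are assumptions; they must match the frames pushed by the render loop
thread = threading.Thread(target=recorder, args=('recordings/demo.avi', 30, (640, 480)), daemon=True)
thread.start()
# In the render loop: frame_queue.put(frame)
# On mode change or reset: frame_queue.put(None); thread.join()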
As the saved videos are NOT ready for training, a simple post-processing script, postprocess_v2.py, converts the annotated RGB video into a binary, label-like video. The post-processed data (now ready for training) is generated in the data directory by default.
The post-processing is basically a difference-of-images calculation, followed by binarization and morphological closing and opening.
See postprocess_v2.py for arguments and details.
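As an illustration of this pipeline (not the actual postprocess_v2.py, whose thresholds and kernel sizes may differ), the per-frame conversion could look roughly like this with OpenCV, assuming both the annotated frame and the corresponding raw frame are available:

import cv2
import numpy as np

def frame_to_label(annotated_bgr, raw_bgr, threshold=30, kernel_size=5):
    # Difference of images: pixels changed by the annotation overlay become foreground
    diff = cv2.absdiff(annotated_bgr, raw_bgr)
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
    # Binarization
    _, binary = cv2.threshold(gray, threshold, 255, cv2.THRESH_BINARY)
    # Morphological closing then opening to fill small holes and remove speckles
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    binary = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)
    binary = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)
    return binary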
We assume that the obtained video-label pairs in our simulator database are structured in the following directory format:
simData
├── input
└── label
Run preprocessDatabase.py to disassemble the videos into separate images and to sample them into train, validation, and test subsets. A typical command looks like this:
python3 preprocessDatabase.py --prep_sim_db --single_sim_dir --dataPath=simData
Note that the script acts in place. You might want to back up the original data for later use.
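A rough sketch of what such preprocessing amounts to is shown below (frames extracted with OpenCV and randomly assigned to subsets); the real preprocessDatabase.py may differ in naming, ratios, and directory handling.

import os
import random

import cv2

def extract_frames(video_path, out_dir):
    # Disassemble a video into individually numbered .png frames
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        cv2.imwrite(os.path.join(out_dir, f'{idx:06d}.png'), frame)
        idx += 1
    cap.release()

def split(names, ratios=(0.8, 0.1, 0.1)):
    # Randomly assign frame names to train/validation/test subsets
    names = list(names)
    random.shuffle(names)
    n_train = int(ratios[0] * len(names))
    n_valid = int(ratios[1] * len(names))
    return names[:n_train], names[n_train:n_train + n_valid], names[n_train + n_valid:]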
Hand-made annotations are available for 30 images; ask the repository maintainers for them and extract the acquired .zip file to annotated. To label images yourself and to create binary images from the resulting labels, install the Python package labelme:
pip install labelme
To download a set of real videos, use getRealData.py:
python3 getRealData.py --save_path realData
Copy the extracted annotations (along with the two scripts) to realData/annotated. Run the bash script json2imgs.sh, which converts the saved labels from JSON format to .png images. Then the Python script createRealDB.py creates the following directory structure from the available labelled and unlabelled data:
realData
├── input
├── label
└── unlabelled
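For reference, a single labelme JSON file can be rasterised into a binary label image with a few lines of Python; this is only a sketch assuming the standard labelme JSON fields, and json2imgs.sh may do the conversion differently (e.g. via labelme's own tooling).

import json

from PIL import Image, ImageDraw

def json_to_mask(json_path, out_path):
    # Read the labelme annotation and draw every polygon as foreground
    with open(json_path) as f:
        data = json.load(f)
    mask = Image.new('L', (data['imageWidth'], data['imageHeight']), 0)
    draw = ImageDraw.Draw(mask)
    for shape in data['shapes']:
        points = [tuple(p) for p in shape['points']]
        draw.polygon(points, outline=255, fill=255)
    mask.save(out_path)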
To split the obtained labelled images into separate train and test sets, use the preprocessing script preprocessDatabase.py:
python3 preprocessDatabase.py --prep_real_db --dataPath realData --train_ratio=0.8
The above script creates the following directory structure:
realData
├── test
│ ├── input
│ └── label
├── train
│ ├── input
│ └── label
└── unlabelled
└── input
Now create a folder named simRealData, copy the simulator database into a subfolder named source, and copy the real database into a subfolder named target. The resulting directory structure is as follows:
simRealData
├── source
│ ├── input
│ └── label
└── target
├── test
│ ├── input
│ └── label
├── train
│ ├── input
│ └── label
└── unlabelled
└── input
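The copying can be done by hand or, for example, with a short script like the sketch below (paths assume the default locations used earlier):

import shutil

# Source domain: the whole simulator database
shutil.copytree('simData', 'simRealData/source')
# Target domain: the previously split real database
shutil.copytree('realData', 'simRealData/target')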
We assume the data is located under the simRealData folder and that its structure matches the one described above. The whole source-domain image set is used for training, while the target-domain train set acts as the validation set.
Simulator baseline training can be reproduced using the following command:
python3 RightLaneModule.py --gpus=1 --dataPath=simData --batch_size=64 --augment --reproducible --max_epochs=175
Consider changing the CUDA- and GPU-dependent parameters to better utilize the hardware. Other parameters are available for fine-tuning; see the script or the argument help for details.
We can use the same setup as in the baseline solution. The difference is that the target-domain train set is used for actual training. There is no validation dataset; instead, the test set is used for evaluation during training.
python3 RightLaneSTModule.py --gpus=1 --dataPath=simRealData/ --batch_size=64 --augment --reproducible --max_epochs=175
The training process is exactly the same as in the baseline solution. The difference is that, before training, the source set is histogram-matched against randomly selected real images. This is done offline to avoid any impact on training time:
python3 hist_match_datasets.py --ds_source=simRealData_hm/source/ --ds_reference=simRealData_hm/target/unlabelled/ --workers=8
Consider increasing the number of CPU workers to speed up the conversion. The actual training is, as mentioned before, exactly the same as in the baseline solution:
python3 RightLaneModule.py --gpus=1 --dataPath=simRealData_hm --batch_size=64 --augment --reproducible --max_epochs=175 --model_name=HM
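For reference, the histogram matching step above is conceptually similar to the following scikit-image sketch; the actual hist_match_datasets.py may differ in how reference images are chosen and where the results are written.

import random
from pathlib import Path

import numpy as np
from skimage import io
from skimage.exposure import match_histograms

# Assumed locations; adjust to your database layout
source_paths = sorted(Path('simRealData_hm/source/input').glob('*.png'))
reference_paths = sorted(Path('simRealData_hm/target/unlabelled/input').glob('*.png'))

for src_path in source_paths:
    source = io.imread(src_path)
    reference = io.imread(random.choice(reference_paths))
    # Match the per-channel histograms of the simulator image to a random real reference
    # (older scikit-image versions use multichannel=True instead of channel_axis)
    matched = match_histograms(source, reference, channel_axis=-1)
    io.imsave(src_path, np.clip(matched, 0, 255).astype(np.uint8))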
This method uses the same database preparation and training process as the baseline solution. However, the original data was too complex for the CycleGAN to handle, therefore this simulator data set differs from the others (no domain randomization or optical distortion). The segmentation model training is preceded by a CycleGAN training step, and before training, the simulator data has to be converted to the real domain. These two steps are described below.
Clone an implementation of CycleGAN. From the provided repository only the CycleGAN-related parts are needed; the rest can be deleted. In cyclegan.py, on lines 121 and 128, change "../../data/%s" to "%s".
Use getRealData.py to download real Duckietown images.
python3 getRealData.py --save_path realData
The downloaded images make up domain B, while the simulator images form domain A. First, the simulator images can be prepared using preprocessDatabase.py (assuming the simpler data set is located at simData2):
python3 preprocessDatabase.py --prep_sim_db --single_sim_dir --dataPath simData2
Only the input images are of interest, therefore the simData2/label folder can be discarded. The goal is to create the following directory structure, with the leaf folders containing .png files:
sim2real
├── test
│ ├── A
│ └── B
└── train
├── A
└── B
Assuming there are at least 11,500 simulator images and 30,000 real images in the folders simData2 and realData, the following commands divide them into the targeted structure:
mkdir -p sim2real/test/A sim2real/test/B sim2real/train/A sim2real/train/B
find simData2/input/*.png | sort | head -n 10000 | shuf | head -n 5000 | xargs -I{} cp {} sim2real/train/A
find simData2/input/*.png | sort | tail -n 1500 | shuf | head -n 1500 | xargs -I{} cp {} sim2real/test/A
find realData/*.png | sort | head -n 25000 | shuf | head -n 5000 | xargs -I{} cp {} sim2real/train/B
find realData/*.png | sort | tail -n 5000 | shuf | head -n 1500 | xargs -I{} cp {} sim2real/test/B
Now train the CycleGAN (modify hyperparameters if needed):
python3 cyclegan.py --dataset_name sim2real --n_epochs 201 --batch_size 32 --n_cpu 8 --img_height 120 --img_width 160 --checkpoint_interval 25 --lambda_cyc 15 --lambda_id 10
Check the results and, if satisfied, copy G_AB_200.pth to your working folder.
After creating and formatting the database, use sim2real_convert.py to transform the simulator images to the real domain. This script converts the input images of the given database in place.
python3 sim2real_convert.py --dataPath simData2 --modelWeightsPath G_AB_200.pth
The segmentation model training process is the same as in the baseline solution. Don't forget to replace the original source data set with the simpler one!
python3 RightLaneModule.py --gpus=1 --dataPath=simRealData_cyclegan --batch_size=64 --augment --reproducible --max_epochs=175 --model_name=CycleGAN
For SSDA MME, a combined (but NOT merged) source-target database is required. The following format is expected:
simRealData
├── source
│ ├── input
│ └── label
└── target
├── test
│ ├── input
│ └── label
├── train
│ ├── input
│ └── label
└── unlabelled
└── input
The training can be done using the following command (see script and argument help for hyperparameters):
python3 RightLaneMMEModule.py --gpus=1 --dataPath=dataSSDA --pretrained_path=results/baseline_weights.pth --batchSize=32 --augment --reproducible --max_epochs=175
Trained model evaluation can be done using the provided script test.py. A typical test of an MME-trained model is performed using the following command:
python3 test.py --module_type=mme --checkpointPath=results/mme.ckpt --realDataPath=simRealData/target/unlabelled/input/ --trainDataPath=simRealData/target/train/input/ --testDataPath=simRealData/target/test/
Comparison of trained models can be done using comparison.py, which generates an image file with sample predictions from each model.
python3 comparison.py --dataPath=simRealData/target/unlabelled/input/ --baselinePath=results/baseline_weights.pth --sandtPath=results/sandt_weights.pth --cycleganPath=results/CycleGAN_weights.pth --hmPath=results/HM_weights.pth --mmePath=results/mme_weights.pth
Predictions for a video (treated as a stream of images) can be made via makeDemoVideo.py. It can handle as many videos as you provide:
python3 makeDemoVideo.py --module_type=CycleGAN --checkpointPath=results/CycleGAN.ckpt --videoIns testVideo1.mp4 testVideo2.mp4 --videoOuts demoVideo1.avi demoVideo2.avi
- Distributed training currently does not work because of the custom samplers used in S&T and MME training.
- Reproducibility is an issue despite setting seeds and CUDA flags. Creating a worker_init_fn did not solve this problem either.
For selecting GPUs and for Comet logging, some environment variables and command-line parameters have to be defined.
Select GPU(s): (-1: no GPU | 0: select GPU_0 | 0,2: select GPU_0 and GPU_2)
CUDA_VISIBLE_DEVICES=0 python3 ...
Comet logging of training:
COMET_API_KEY=your_api_key COMET_WORKSPACE=your_workspace COMET_PROJECT_NAME=your_project_name python3 ... --comet
Select CPUs: (0,3,4: select CPU_0, CPU_3, CPU_4 | 0-29: select CPU_0 ... CPU_29)
taskset --cpu-list 10-19 python3 ...
Always define environment variables before commands.
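For reference, the --comet switch presumably enables something along the lines of PyTorch Lightning's CometLogger; a minimal sketch (not the repository's actual code) reading the environment variables above could look like this:

import os

import pytorch_lightning as pl
from pytorch_lightning.loggers import CometLogger

# Build the logger from the environment variables described above
comet_logger = CometLogger(
    api_key=os.environ['COMET_API_KEY'],
    workspace=os.environ['COMET_WORKSPACE'],
    project_name=os.environ['COMET_PROJECT_NAME'],
)

trainer = pl.Trainer(gpus=1, max_epochs=175, logger=comet_logger)
# trainer.fit(model, datamodule=dm) as usual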