Institution: Computer Vision Center (CVC)
Internship Period: January 24 - June 24
This project aims to implement reward optimization in the domain of semantic segmentation within computer vision. The primary objective is to leverage this technique as a supplementary tool for domain adaptation to enhance the model's metric scores.
The theoretical approach is inspired by the paper "Tuning Computer Vision Models with Task Rewards", which outlines a general methodology applicable to similar tasks.
The implementation is conducted within the MMSegmentation framework, a platform known for its extensive runtime customization options and its widespread use in the scientific community.
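As a rough, self-contained sketch of the underlying idea, in plain PyTorch rather than the project's actual MMSegmentation integration, the methodology can be framed as a REINFORCE-style update: sample a segmentation from the model's per-pixel distribution, score it with a task reward such as mIoU, and weight the log-likelihood gradient by that reward. Every name in this snippet, including the `compute_miou` helper, is illustrative:

```python
import torch
import torch.nn.functional as F

def compute_miou(pred, target, num_classes):
    """Mean IoU between two (B, H, W) integer masks (simplified helper)."""
    ious = []
    for c in range(num_classes):
        inter = ((pred == c) & (target == c)).sum().float()
        union = ((pred == c) | (target == c)).sum().float()
        if union > 0:
            ious.append(inter / union)
    return torch.stack(ious).mean()

def reinforce_step(model, image, target, optimizer, num_classes):
    """One reward-optimization update (REINFORCE with a greedy baseline).

    Assumes `model(image)` returns per-pixel logits of shape (B, C, H, W).
    """
    logits = model(image)                                    # (B, C, H, W)

    # Per-pixel categorical distribution over classes.
    dist = torch.distributions.Categorical(logits=logits.permute(0, 2, 3, 1))
    sample = dist.sample()                                   # (B, H, W)

    # Task reward of the sampled mask; the greedy prediction's reward
    # serves as a baseline to reduce gradient variance.
    reward = compute_miou(sample, target, num_classes)
    baseline = compute_miou(logits.argmax(1), target, num_classes)

    # Maximize expected reward via reward-weighted log-likelihood.
    sample_log_prob = dist.log_prob(sample).mean()
    loss = -(reward - baseline) * sample_log_prob

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```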
This project was primarily developed using PyTorch, a leading deep learning framework that facilitates building complex neural network architectures. PyTorch's dynamic computation graph enabled flexible and efficient model development and experimentation.
- NumPy: Heavily used for high-performance scientific computing and data analysis, particularly for manipulating large arrays and matrices of numeric data.
- Pandas: Employed for data manipulation and analysis, particularly useful during the data exploration phases for handling and processing data in a tabular form.
- MMSegmentation: An open source semantic segmentation toolbox based on PyTorch, utilized for its robust model structures and segmentation utilities.
- Git: Used for version control, allowing effective tracking of changes and collaboration across various phases of the project.
- Conda: Employed as a package and environment management system, which helped maintain consistency across development environments.
- CUDA: Leveraged for GPU acceleration to facilitate efficient training of deep learning models, essential for handling complex computations and large datasets.
- Distributed Training: Implemented across multiple GPUs on remote servers, enhancing the training speed and scalability of the model development process.
These technologies combined to create a robust development environment that supported the advanced computational needs of the project, from model training to data analysis.
Utilities contains several tools that have been used in this project:

- Computes the frequency of each class in a dataset, given its annotations root directory.
- Using the frequencies, computes the class weights with the formulas below, where $C_c$ are the class counts, $T_c$ the total counts, $N$ the number of classes, and $C_w$ the class weights (a sketch of both utilities follows the formulas):

$$C_w = \frac{1}{C_c + \text{CLS\_SMOOTH} \cdot T_c} \qquad\qquad C_w = \frac{N \cdot C_w}{\sum{C_w}}$$
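A minimal sketch of how these two utilities might be implemented; the PNG annotation layout and the `CLS_SMOOTH` value are assumptions, and the project's actual scripts may differ:

```python
import numpy as np
from pathlib import Path
from PIL import Image

CLS_SMOOTH = 1e-2  # smoothing constant; the actual value is project-specific

def class_frequencies(ann_root: str, num_classes: int) -> np.ndarray:
    """Count per-class pixel frequencies over all annotation masks.

    Assumes annotations are single-channel PNG label maps stored
    under `ann_root` (an assumption about the dataset layout).
    """
    counts = np.zeros(num_classes, dtype=np.int64)
    for path in Path(ann_root).rglob("*.png"):
        mask = np.asarray(Image.open(path))
        counts += np.bincount(mask.ravel(), minlength=num_classes)[:num_classes]
    return counts

def class_weights(counts: np.ndarray) -> np.ndarray:
    """Apply the smoothed inverse-frequency formulas from above."""
    total = counts.sum()                               # T_c
    weights = 1.0 / (counts + CLS_SMOOTH * total)      # C_w = 1 / (C_c + CLS_SMOOTH * T_c)
    return len(counts) * weights / weights.sum()       # normalize so the weights sum to N
```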
After performing an MMSegmentation training run under an example working directory `workdir_path`, the file `workdir_path/id/vis_data/scalars.json` can be used to explore the experiment's results.
- Given a `scalars.json` file, creates two CSV files, `loss.csv` and `metrics.csv`, which hold the relevant information for exploration in a workable format (see the sketch after this list).
- Loads experiment scalars from `loss.csv` and `metrics.csv` and produces several plots (also sketched after this list). Some utilities include:
  - Plotting the mIoU at each validation step.
  - Plotting the mIoU trend compared against the current model's.
  - Plotting the regression line of the mIoU values to show the trend.
  - Plotting the smoothed mIoU values, similar to what TensorBoard does.
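A minimal sketch of the CSV conversion utility, assuming the MMSegmentation `scalars.json` is a JSON-lines file in which training entries carry a `loss` key and validation entries carry metric keys such as `mIoU` (an assumption about the log schema):

```python
import json
import pandas as pd

def scalars_to_csv(scalars_path: str) -> None:
    """Split a scalars.json log into loss.csv and metrics.csv.

    Assumes one JSON dict per line: training entries contain a 'loss'
    key, validation entries contain metric keys such as 'mIoU'.
    """
    losses, metrics = [], []
    with open(scalars_path) as f:
        for line in f:
            entry = json.loads(line)
            (losses if "loss" in entry else metrics).append(entry)

    pd.DataFrame(losses).to_csv("loss.csv", index=False)
    pd.DataFrame(metrics).to_csv("metrics.csv", index=False)
```

And a sketch of the plotting utility, assuming `metrics.csv` exposes `step` and `mIoU` columns; the optional `baseline` argument marks the current model's mIoU for comparison:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def plot_miou(metrics_csv: str, smooth: float = 0.6, baseline=None) -> None:
    """Plot raw, smoothed, and linearly fitted mIoU per validation step."""
    df = pd.read_csv(metrics_csv)
    steps, miou = df["step"].to_numpy(), df["mIoU"].to_numpy()

    # TensorBoard-style exponential moving average.
    ema = pd.Series(miou).ewm(alpha=1 - smooth).mean()

    # Least-squares regression line showing the overall trend.
    slope, intercept = np.polyfit(steps, miou, deg=1)

    plt.plot(steps, miou, alpha=0.3, label="mIoU")
    plt.plot(steps, ema, label=f"smoothed (factor={smooth})")
    plt.plot(steps, slope * steps + intercept, "--", label="trend")
    if baseline is not None:
        plt.axhline(baseline, color="gray", ls=":", label="current model")
    plt.xlabel("validation step")
    plt.ylabel("mIoU")
    plt.legend()
    plt.show()
```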
Apart from the reward optimisation implementation, additional experimentation has been done on other datasets, which have also been integrated into the whole MMSegmentation framework.
This is a face parsing and portrait segmentation dataset. The original repo included some integration guidelines for MMSegmentation; however, they were based on MMSegmentation 0.x. Under EasyPortrait are the following scripts necessary for the migration to MMSegmentation 1.x:
- `easyportrait_dataset_config`: Dataset config to use.
- `model_config_example`: Example of a model using the new config file.
- `easyportrait_dataset_class`: Dataset's class definition (sketched below).
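For reference, a minimal sketch of what such a dataset class definition looks like in MMSegmentation 1.x; the class names, palette, and file suffixes shown here are illustrative placeholders rather than the full EasyPortrait definition:

```python
from mmseg.datasets import BaseSegDataset
from mmseg.registry import DATASETS

@DATASETS.register_module()
class EasyPortraitDataset(BaseSegDataset):
    """EasyPortrait registered as an MMSegmentation 1.x dataset."""

    # Illustrative subset: the real METAINFO lists every EasyPortrait class.
    METAINFO = dict(
        classes=("background", "person", "skin"),
        palette=[[0, 0, 0], [223, 87, 188], [160, 221, 255]],
    )

    def __init__(self, **kwargs) -> None:
        super().__init__(
            img_suffix=".jpg",        # assumed image extension
            seg_map_suffix=".png",    # assumed annotation extension
            reduce_zero_label=False,  # background is kept as a real class
            **kwargs,
        )
```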
Also, you can see those changes applied in a forked repo I made: Updated EasyPortrait