Md Farhan Ishmam* · Ishmam Tashdeed* · Talukder Asir Saadat* · Md. Hamjajul Ashmafee · Md. Azam Hossain · Abu Raihan Mostofa Kamal
We introduce a large-scale benchmark with 213,000 augmented images to challenge the visual robustness of VQA models against realistic visual corruptions. We also design novel robustness evaluation metrics that can be aggregated into a unified metric, adaptable for multiple use cases. Our experiments reveal key insights into the interplay between model size, performance, and robustness, emphasizing the need for model development without sacrificing robustness for performance.
The architecture of the Visual Robustness Framework consists of the following components:
- Model Repository: Hosts multiple VQA models for inference
- Generator: Applies the corruption functions to the VQA dataset and generates multiple augmented datasets
- Inference Module: Runs inference on the augmented datasets using models from the model repository
- Robustness Evaluation Module: Evaluates the results of the inference module by computing robustness metrics
- Visualization Module: Produces visualizations based on the predicted answers
The VQA datasets, models, and corruptions are the inputs to the framework, while the VRE scores, accuracy scores, and visualizations are produced as the output.
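A minimal sketch of this flow is given below, assuming hypothetical callables for the models and corruption functions; the actual module and class names in this repository may differ.

```python
# Hedged sketch of the framework's data flow, not the repository's actual code.
# `models`, `corruptions`, and `dataset` are assumed to be simple callables/tuples.
from itertools import product

def run_benchmark(models, corruptions, severities, dataset):
    """Corrupt the images, run each VQA model, and collect accuracy scores."""
    results = {}
    for corruption, severity in product(corruptions, severities):
        # Generator: build an augmented copy of the dataset.
        augmented = [(corruption(img, question := q, severity) if False else corruption(img, severity), q, a)
                     for img, q, a in dataset]
        for model in models:
            # Inference module: predict an answer for every (image, question) pair.
            predictions = [model(img, q) for img, q, _ in augmented]
            answers = [a for _, _, a in augmented]
            accuracy = sum(p == a for p, a in zip(predictions, answers)) / len(answers)
            results[(getattr(model, "name", str(model)), corruption.__name__, severity)] = accuracy
    # The robustness evaluation and visualization modules consume these scores.
    return results
```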
We introduce 14 corruption functions, categorized into 6 classes, that replicate realistic visual effects.
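As a hedged illustration of what such a corruption function looks like, the snippet below implements a severity-parameterized Gaussian noise corruption in the style of ImageNet-C using NumPy and Pillow; the severity mapping and implementation details are assumptions, not necessarily what this repository uses.

```python
# Illustrative severity-parameterized corruption; the repository's actual
# implementations (and severity constants) may differ.
import numpy as np
from PIL import Image

def gaussian_noise(image: Image.Image, severity: int = 1) -> Image.Image:
    """Add zero-mean Gaussian noise whose scale grows with the severity level (1-5)."""
    scale = [0.04, 0.08, 0.12, 0.15, 0.18][severity - 1]  # assumed severity mapping
    x = np.asarray(image).astype(np.float32) / 255.0
    x = np.clip(x + np.random.normal(size=x.shape, scale=scale), 0.0, 1.0)
    return Image.fromarray((x * 255).astype(np.uint8))
```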
| Metric Name | Symbol | Formula |
|---|---|---|
| First-Drop | | |
| Range of Error | | |
| Error Rate | | |
| Average Error | | |
| Average Difference of Corruption Error | | |
| Visual Robustness Error | VRE | |
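The formulas for these metrics are defined in the paper and are not reproduced here. As a purely illustrative sketch, the snippet below derives the per-severity accuracy drops that robustness metrics of this kind are typically aggregated from; it is not the paper's definition of any metric above.

```python
# Illustrative only: the official metric formulas are given in the paper.
# This just computes per-severity accuracy drops relative to clean performance,
# i.e. the raw quantities that the robustness metrics are aggregated from.
def error_profile(clean_accuracy, corrupted_accuracy_per_severity):
    """Return the accuracy drop at each corruption severity level."""
    return [clean_accuracy - acc for acc in corrupted_accuracy_per_severity]

# Example: a model with 70% clean accuracy degrading as severity increases.
print(error_profile(0.70, [0.66, 0.62, 0.57, 0.51, 0.44]))
# ~ [0.04, 0.08, 0.13, 0.19, 0.26]
```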
Keep the image directory, question JSON, and annotation JSON data in your drive.
This repository is tested on Python 3.8. Create a virtual environment to install all the dependencies.
- To install `torch`, visit the PyTorch website and follow the instructions.
- Install the packages listed in the `requirements.txt` file by running the command: `pip install -r requirements.txt`
- Install the `MagickWand` library. If you have a Linux system, use: `sudo apt-get install libmagickwand-dev`
  If you have a Windows system, follow the steps given on the website. A quick check is sketched after this list.
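After installing, a quick sanity check (an assumed workflow, not a script shipped with this repository) is to confirm that the Wand binding can locate the MagickWand library:

```python
# Assumed sanity check: verify that the Wand binding finds the MagickWand library.
from wand.version import MAGICK_VERSION, VERSION

print("Wand", VERSION, "linked against", MAGICK_VERSION)
```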
This repository uses a random split of the VQAv2 dataset found on the Visual Question Answering website.
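A minimal sketch of building such a random split from the official VQAv2 question and annotation JSON files is shown below; the file paths, split fraction, and seed are assumptions rather than the repository's exact configuration.

```python
# Hedged sketch of a random split over the official VQAv2 question/annotation JSONs.
# File names, fraction, and seed are placeholders, not the repository's settings.
import json
import random

def random_vqa_split(question_json, annotation_json, fraction=0.1, seed=0):
    """Pair questions with their annotations by question_id and sample a random subset."""
    with open(question_json) as f:
        questions = json.load(f)["questions"]
    with open(annotation_json) as f:
        annotations = {a["question_id"]: a for a in json.load(f)["annotations"]}

    random.seed(seed)
    sampled = random.sample(questions, int(fraction * len(questions)))
    return [(q, annotations[q["question_id"]]) for q in sampled]
```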
To run the experiment, execute the following command:
`python main.py`
Data analysis for this work is done in Python via the provided notebooks, which can also reproduce all the plots included in the paper.
Some of the visual corruption functions in this repository have been taken from ImageNet-C.
@article{ishmam2024visual,
title={Visual Robustness Benchmark for Visual Question Answering (VQA)},
author={Ishmam, Md Farhan and Tashdeed, Ishmam and Saadat, Talukder Asir and Ashmafee, Md Hamjajul and Kamal, Dr Abu Raihan Mostofa and Hossain, Dr Md Azam},
journal={arXiv preprint arXiv:2407.03386},
year={2024}
}