Md Farhan Ishmam* · Ishmam Tashdeed* · Talukder Asir Saadat* · Md. Hamjajul Ashmafee · Md. Azam Hossain · Abu Raihan Mostofa Kamal
We introduce a large-scale benchmark with 213,000 augmented images to challenge the visual robustness of VQA models against realistic visual corruptions. We also design novel robustness evaluation metrics that can be aggregated into a unified metric, adaptable for multiple use cases. Our experiments reveal key insights into the interplay between model size, performance, and robustness, emphasizing the need for model development without sacrificing robustness for performance.
The architecture of the Visual Robustness Framework consists of the following components:
- Model Repository: Hosts multiple VQA models for inference
- Generator: Applies the corruption functions to the VQA dataset and generates multiple augmented datasets
- Inference Module: Runs inference on the augmented datasets using models from the model repository
- Robustness Evaluation Module: Evaluates the results of the inference module by computing robustness metrics
- Visualization Module: Produces visualizations based on the predicted answers
The VQA datasets, models, and corruptions are the inputs to the framework, while the VRE scores, accuracy scores, and visualizations are produced as the output.
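A minimal sketch of this flow is given below, assuming hypothetical callables for the models and corruption functions; the actual module and class names in this repository may differ.

```python
# Hedged sketch of the framework's data flow, not the repository's actual code.
# `models`, `corruptions`, and `dataset` are assumed to be simple callables/tuples.
from itertools import product

def run_benchmark(models, corruptions, severities, dataset):
    """Corrupt the images, run each VQA model, and collect accuracy scores."""
    results = {}
    for corruption, severity in product(corruptions, severities):
        # Generator: build an augmented copy of the dataset.
        augmented = [(corruption(img, question := q, severity) if False else corruption(img, severity), q, a)
                     for img, q, a in dataset]
        for model in models:
            # Inference module: predict an answer for every (image, question) pair.
            predictions = [model(img, q) for img, q, _ in augmented]
            answers = [a for _, _, a in augmented]
            accuracy = sum(p == a for p, a in zip(predictions, answers)) / len(answers)
            results[(getattr(model, "name", str(model)), corruption.__name__, severity)] = accuracy
    # The robustness evaluation and visualization modules consume these scores.
    return results
```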
We introduce 14 corruption functions, categorized into 6 classes, that replicate realistic visual effects.
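As a hedged illustration of what such a corruption function looks like, the snippet below implements a severity-parameterized Gaussian noise corruption in the style of ImageNet-C using NumPy and Pillow; the severity mapping and implementation details are assumptions, not necessarily what this repository uses.

```python
# Illustrative severity-parameterized corruption; the repository's actual
# implementations (and severity constants) may differ.
import numpy as np
from PIL import Image

def gaussian_noise(image: Image.Image, severity: int = 1) -> Image.Image:
    """Add zero-mean Gaussian noise whose scale grows with the severity level (1-5)."""
    scale = [0.04, 0.08, 0.12, 0.15, 0.18][severity - 1]  # assumed severity mapping
    x = np.asarray(image).astype(np.float32) / 255.0
    x = np.clip(x + np.random.normal(size=x.shape, scale=scale), 0.0, 1.0)
    return Image.fromarray((x * 255).astype(np.uint8))
```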
| Metric Name | Symbol | Formula |
|---|---|---|
| First-Drop | | |
| Range of Error | | |
| Error Rate | | |
| Average Error | | |
| Average Difference of Corruption Error | | |
| Visual Robustness Error | VRE | |
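The formulas for these metrics are defined in the paper and are not reproduced here. As a purely illustrative sketch, the snippet below derives the per-severity accuracy drops that robustness metrics of this kind are typically aggregated from; it is not the paper's definition of any metric above.

```python
# Illustrative only: the official metric formulas are given in the paper.
# This just computes per-severity accuracy drops relative to clean performance,
# i.e. the raw quantities that the robustness metrics are aggregated from.
def error_profile(clean_accuracy, corrupted_accuracy_per_severity):
    """Return the accuracy drop at each corruption severity level."""
    return [clean_accuracy - acc for acc in corrupted_accuracy_per_severity]

# Example: a model with 70% clean accuracy degrading as severity increases.
print(error_profile(0.70, [0.66, 0.62, 0.57, 0.51, 0.44]))
# ~ [0.04, 0.08, 0.13, 0.19, 0.26]
```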
Keep the image directory, question JSON, and annotation JSON data in your drive.
This repository is tested on Python 3.8. Create a virtual environment to install all the dependencies.
- To install `torch`, visit the PyTorch website and follow the instructions.
- Install the packages listed in the `requirements.txt` file by running the command: `pip install -r requirements.txt`
- Install the `MagickWand` library. If you have a Linux system, use: `sudo apt-get install libmagickwand-dev`
  If you have a Windows system, follow the steps given on the website. A quick check is sketched after this list.
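After installing, a quick sanity check (an assumed workflow, not a script shipped with this repository) is to confirm that the Wand binding can locate the MagickWand library:

```python
# Assumed sanity check: verify that the Wand binding finds the MagickWand library.
from wand.version import MAGICK_VERSION, VERSION

print("Wand", VERSION, "linked against", MAGICK_VERSION)
```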
This repository uses a random split of the VQAv2 dataset found on the Visual Question Answering website.
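A minimal sketch of building such a random split from the official VQAv2 question and annotation JSON files is shown below; the file paths, split fraction, and seed are assumptions rather than the repository's exact configuration.

```python
# Hedged sketch of a random split over the official VQAv2 question/annotation JSONs.
# File names, fraction, and seed are placeholders, not the repository's settings.
import json
import random

def random_vqa_split(question_json, annotation_json, fraction=0.1, seed=0):
    """Pair questions with their annotations by question_id and sample a random subset."""
    with open(question_json) as f:
        questions = json.load(f)["questions"]
    with open(annotation_json) as f:
        annotations = {a["question_id"]: a for a in json.load(f)["annotations"]}

    random.seed(seed)
    sampled = random.sample(questions, int(fraction * len(questions)))
    return [(q, annotations[q["question_id"]]) for q in sampled]
```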
To run the experiment, execute the following command:
`python main.py`
Data analysis for this work is done in Python via the provided notebooks, which can also reproduce all the plots included in the paper.
Some of the visual corruption functions in this repository have been taken from ImageNet-C.
@article{ishmam2024visual,
title={Visual Robustness Benchmark for Visual Question Answering (VQA)},
author={Ishmam, Md Farhan and Tashdeed, Ishmam and Saadat, Talukder Asir and Ashmafee, Md Hamjajul and Kamal, Dr Abu Raihan Mostofa and Hossain, Dr Md Azam},
journal={arXiv preprint arXiv:2407.03386},
year={2024}
}