The code in this repo analyzes a visual question answering model, and is based on the implementation from [pytorch-vqa](https://github.com/Cyanogenoid/pytorch-vqa).
- Clone this repository with:
git clone https://github.com/Cyanogenoid/pytorch-vqa.git --recursive
- Set the paths to your downloaded [questions, answers, and MS COCO images][4] in `config.py`. `qa_path` should contain the files `OpenEnded_mscoco_train2014_questions.json`, `OpenEnded_mscoco_val2014_questions.json`, `mscoco_train2014_annotations.json`, and `mscoco_val2014_annotations.json`. `train_path`, `val_path`, and `test_path` should contain the train, validation, and test `.jpg` images respectively. (A sketch of what these settings might look like is given after this list of steps.)
- Pre-process the images with [ResNet152 weights ported from Caffe][3] (93 GiB of free disk space is required for f16 accuracy) and build the question and answer vocabularies with:
python preprocess-images.py
python preprocess-vocab.py
- Train the model in `model.py` with:
python train.py
This will alternate between one epoch of training on the train split and one epoch of validation on the validation split, printing the current training progress to stdout and saving logs in the `logs` directory.
The logs contain the name of the model, training statistics, the contents of `config.py`, model weights, evaluation information (per-question answer and accuracy), and the question and answer vocabularies. (See the log-inspection sketch after this list of steps.)
- During training (which takes a while), plot the training progress with:
python view-log.py <path to .pth log>
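For the path settings described above, a minimal sketch of the relevant part of `config.py` might look like the following. Only the option names (`qa_path`, `train_path`, `val_path`, `test_path`) come from the steps above; the directory values are placeholders you should point at your own downloads.

```python
# Sketch of the path-related options in config.py.
# Directory values below are placeholders, not the repo's defaults.
qa_path = 'data/vqa'                   # the OpenEnded_mscoco_*_questions.json and
                                       # mscoco_*_annotations.json files go here
train_path = 'data/mscoco/train2014'   # train .jpg images
val_path = 'data/mscoco/val2014'       # validation .jpg images
test_path = 'data/mscoco/test2015'     # test .jpg images
```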
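The `.pth` logs written by `train.py` can also be inspected directly. A minimal sketch, assuming only that each log is a dictionary saved with `torch.save` and that it contains entries like the ones listed above (the log filename and the exact key names are assumptions):

```python
import torch

# Load a training log saved by train.py (path is a placeholder).
log = torch.load('logs/my_model.pth', map_location='cpu')

# Print the top-level keys to see what this version of the code stores:
# per the description above, expect the model name, training statistics,
# a copy of config.py, model weights, per-question answers/accuracies,
# and the question/answer vocabularies.
print(log.keys() if isinstance(log, dict) else type(log))
```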
The following Python packages are required:
- torch
- torchvision
- h5py
- tqdm
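These can be installed, for example, with pip (no specific versions are pinned here):

pip install torch torchvision h5py tqdm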
Download the results from this Dropbox folder
Primary contact: Pramod Kaushik Mudrakarta (pramodkm@uchicago.edu)
Mudrakarta, Pramod Kaushik, Ankur Taly, Mukund Sundararajan, and Kedar Dhamdhere. "Did the Model Understand the Question?." arXiv preprint arXiv:1805.05492 (2018).
@article{mudrakarta2018did,
  title={Did the Model Understand the Question?},
  author={Mudrakarta, Pramod Kaushik and Taly, Ankur and Sundararajan, Mukund and Dhamdhere, Kedar},
  journal={arXiv preprint arXiv:1805.05492},
  year={2018}
}