PyTorch implementation for the paper:
Reasoning Visual Dialogs with Structural and Partial Observations
Zilong Zheng*, Wenguan Wang*, Siyuan Qi*, Song-Chun Zhu (* equal contributions)
In CVPR 2019 (Oral)
This codebase has been tested on Ubuntu 16.04 with Python 3.5 and a single NVIDIA TITAN Xp GPU; similar configurations are recommended.
- Clone this repo:
git clone https://github.com/zilongzheng/visdial-gnn.git
cd visdial-gnn
- Install requirements:
  - PyTorch 0.4.1
  - For other Python dependencies, run:
pip install -r requirements.txt
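Because the code targets PyTorch 0.4.1 and Python 3.5, a quick check of the active environment can save debugging time later. The script below is a minimal sketch and is not part of this repository; the helper names (version_prefix_matches, installed_versions) are hypothetical, and it only compares version prefixes, so "0.4.1.post2" counts as a match while "1.0.0" is reported as differing from the tested version.

```python
import importlib
import sys

# Versions reported as tested in this README; other versions are untested.
EXPECTED = {"python": "3.5", "torch": "0.4.1"}


def version_prefix_matches(installed, expected):
    """True if `installed` starts with the dotted components of `expected`.

    A local build tag such as "+cu90" is ignored, so "0.4.1+cu90" matches
    "0.4.1", but "0.4.10" does not.
    """
    inst = installed.split("+")[0].split(".")
    exp = expected.split(".")
    return inst[:len(exp)] == exp


def installed_versions():
    """Return the running Python version and the torch version (None if absent)."""
    versions = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    try:
        torch = importlib.import_module("torch")
        versions["torch"] = torch.__version__
    except ImportError:
        versions["torch"] = None
    return versions


if __name__ == "__main__":
    for name, found in sorted(installed_versions().items()):
        expected = EXPECTED[name]
        if found is None:
            status = "NOT INSTALLED"
        elif version_prefix_matches(found, expected):
            status = "ok"
        else:
            status = "differs from tested {}".format(expected)
        print("{:<7} {:<12} {}".format(name, str(found), status))
```

The script avoids f-strings so that it also runs under Python 3.5.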
- We use pre-extracted image features for VisDial v1.0 as specified here.
- We use preprocessed dialog data as specified here.
- To reproduce our results, download the preprocessed data and save it to $PROJECT_DIR/data/v1.0/ by running:
bash ./scripts/download_data_v1.sh faster_rcnn
- To train a discriminative model, run:
#!./scripts/train_v1_faster_rcnn.sh
python train.py --dataroot ./data/v1.0/
- To evaluate the model on the val split, run:
python evaluate.py --dataroot ./data/v1.0/ --split val --ckpt /path/to/checkpoint
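Discriminative VisDial models are usually scored by ranking the fixed list of candidate answers for each question (100 per question in VisDial) and reporting recall@k, mean reciprocal rank (MRR) and the mean rank of the ground-truth answer. The sketch below computes these quantities from per-option scores to illustrate what they measure; it is not the code in evaluate.py, we have not confirmed that evaluate.py computes exactly these numbers, and the function names are hypothetical. NDCG, which VisDial v1.0 also reports using dense relevance annotations, is omitted.

```python
def ground_truth_ranks(scores, gt_index):
    """Return the 1-based rank of the ground-truth option for each question.

    scores:   list of per-question lists of option scores (higher is better).
    gt_index: list with the index of the correct option for each question.
    Ties are resolved in favour of the ground truth in this sketch.
    """
    ranks = []
    for row, gt in zip(scores, gt_index):
        gt_score = row[gt]
        # Rank 1 is best: count the options scored strictly higher than the answer.
        ranks.append(1 + sum(1 for s in row if s > gt_score))
    return ranks


def retrieval_metrics(ranks, ks=(1, 5, 10)):
    """Recall@k, mean reciprocal rank and mean rank from 1-based ranks."""
    n = float(len(ranks))
    metrics = {}
    for k in ks:
        metrics["r@{}".format(k)] = sum(1 for r in ranks if r <= k) / n
    metrics["mrr"] = sum(1.0 / r for r in ranks) / n
    metrics["mean_rank"] = sum(ranks) / n
    return metrics


# Toy example with 3 options per question instead of 100.
ranks = ground_truth_ranks([[0.1, 0.9, 0.3], [0.8, 0.2, 0.5]], [1, 2])
print(ranks)                     # [1, 2]
print(retrieval_metrics(ranks))  # r@1 = 0.5, mrr = 0.75, mean_rank = 1.5
```

Higher recall@k and MRR are better, while a lower mean rank is better.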
- For VisDial v0.9, we use pre-extracted image features from VGG-16 and VGG-19 as specified here.
- To download the preprocessed data (e.g. vgg19) and save it to $PROJECT_DIR/data/v0.9/, run:
bash ./scripts/download_data_v09.sh vgg19
- To train a discriminative model using pre-extracted VGG-19 image features, run:
#!./scripts/train_v09_vgg19.sh
python train.py --dataroot ./data/v0.9/ \
--version 0.9 \
--img_train data_img_vgg19_pool5.h5 \
--visdial_data visdial_data.h5 \
--visdial_params visdial_params.json \
--img_feat_size 512
- To evaluate the model on the val split, run:
python evaluate.py --dataroot ./data/v0.9/ \
--version 0.9 \
--split val \
--ckpt /path/to/checkpoint \
--img_val data_img_vgg19_pool5.h5 \
--visdial_data visdial_data.h5 \
--visdial_params visdial_params.json \
--img_feat_size 512
If you use this code for your research, please cite our paper.
@inproceedings{zheng2019reasoning,
title={Reasoning Visual Dialogs with Structural and Partial Observations},
author={Zheng, Zilong and Wang, Wenguan and Qi, Siyuan and Zhu, Song-Chun},
booktitle={Computer Vision and Pattern Recognition (CVPR), 2019 IEEE Conference on},
year={2019}
}
We use the Visual Dialog Challenge Starter Code and GPNN as reference utility code.