PyTorch implementation for the paper:

**DMRM: A Dual-channel Multi-hop Reasoning Model for Visual Dialog**
Feilong Chen, Fandong Meng, Jiaming Xu, Peng Li, Bo Xu, and Jie Zhou
In AAAI 2020
This code is implemented using PyTorch v0.3.0 with CUDA 8 and CuDNN 7.
It is recommended to set up this source code using Anaconda or Miniconda.
- Install the Anaconda or Miniconda distribution (Python 3.6+) from their downloads site.
- Clone this repository and create an environment:

```sh
git clone https://github.com/phellonchen/DMRM.git
conda create -n dmrm_visdial python=3.6

# activate the environment and install all dependencies
conda activate dmrm_visdial
cd $PROJECT_ROOT/
pip install -r requirements.txt
```
- Download the VisDial dialog JSON files from here and keep them under the `$PROJECT_ROOT/data` directory for the default arguments to work.
- We use Faster R-CNN features pre-trained on Visual Genome as image features. Download the image features below, and put each file under the `$PROJECT_ROOT/data` directory.
  - `features_faster_rcnn_x101_train.h5`: bottom-up features of 36 proposals per image from the `train` split.
  - `features_faster_rcnn_x101_val.h5`: bottom-up features of 36 proposals per image from the `val` split.
  - `features_faster_rcnn_x101_test.h5`: bottom-up features of 36 proposals per image from the `test` split.
- Download the GloVe pretrained word vectors from here, and keep `glove.6B.300d.txt` under the `$PROJECT_ROOT/data` directory.
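Once the downloads finish, a quick sanity check that everything is in place can look like the sketch below. It only assumes the file names listed above and a `data/` directory matching the default arguments; the helper function itself is illustrative, not part of the repository.

```python
import os

# Files the default arguments expect under $PROJECT_ROOT/data
# (names taken from the download list above).
EXPECTED_FILES = [
    "features_faster_rcnn_x101_train.h5",
    "features_faster_rcnn_x101_val.h5",
    "features_faster_rcnn_x101_test.h5",
    "glove.6B.300d.txt",
]

def missing_data_files(data_dir):
    """Return the expected files that are not present in data_dir."""
    return [name for name in EXPECTED_FILES
            if not os.path.isfile(os.path.join(data_dir, name))]

if __name__ == "__main__":
    missing = missing_data_files("data")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All expected data files found.")
```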
```sh
# data preprocessing
cd $PROJECT_ROOT/script/
python prepro.py

# word embedding vector initialization (GloVe)
cd $PROJECT_ROOT/script/
python create_glove.py
```
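The GloVe initialization step builds an embedding matrix from `glove.6B.300d.txt`. A minimal sketch of the idea follows; the function names and the random out-of-vocabulary initialization here are illustrative assumptions, not `create_glove.py`'s actual interface.

```python
import random

def load_glove(lines):
    """Parse GloVe text lines ('word v1 v2 ...') into a word -> vector dict."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

def build_embedding_matrix(vocab, vectors, dim=300, seed=0):
    """One row per vocabulary word: the GloVe vector if known,
    otherwise a small random initialization (an assumed fallback)."""
    rng = random.Random(seed)
    matrix = []
    for word in vocab:
        if word in vectors:
            matrix.append(vectors[word])
        else:
            matrix.append([rng.uniform(-0.1, 0.1) for _ in range(dim)])
    return matrix
```

For example, `build_embedding_matrix(["cat", "<unk>"], load_glove(open("glove.6B.300d.txt")))` would copy the vector for "cat" and randomly initialize the unknown token.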
Simple run:

```sh
python main_v0.9.py   # for VisDial v0.9
python main_v1.0.py   # for VisDial v1.0
```

Our model saves a checkpoint at every epoch and updates the best one. You can change this behavior by editing `train.py`.
The log file `$PROJECT_ROOT/save_models/time/log.txt` records the epoch, loss, and learning rate.
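To track training progress programmatically, the log lines can be parsed as below. The exact line format used here is an assumption for illustration; adjust the pattern to whatever `train.py` actually writes to `log.txt`.

```python
import re

# Assumed log-line format, e.g. "epoch 3 loss 2.1345 lr 0.0010";
# adapt the pattern to the real log format.
LOG_PATTERN = re.compile(r"epoch\s+(\d+).*?loss\s+([\d.]+).*?lr\s+([\d.eE+-]+)")

def parse_log(lines):
    """Extract (epoch, loss, lr) tuples from matching log lines."""
    records = []
    for line in lines:
        m = LOG_PATTERN.search(line)
        if m:
            records.append((int(m.group(1)),
                            float(m.group(2)),
                            float(m.group(3))))
    return records
```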
A trained model checkpoint can be evaluated as follows:

```sh
python eval_v0.9.py   # for VisDial v0.9
python eval_v1.0.py   # for VisDial v1.0
```
Performance on v0.9 val-std (trained on v0.9 train):

| Model | MRR | R@1 | R@5 | R@10 | Mean |
| --- | --- | --- | --- | --- | --- |
| DMRM | 55.96 | 46.20 | 66.02 | 72.43 | 13.15 |
Performance on v1.0 val-std (trained on v1.0 train):

| Model | MRR | R@1 | R@5 | R@10 | Mean |
| --- | --- | --- | --- | --- | --- |
| DMRM | 50.16 | 40.15 | 60.02 | 67.21 | 15.19 |
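The columns above are the standard VisDial retrieval metrics, computed from the rank of the ground-truth answer among the answer candidates (MRR and recalls are reported as percentages in the tables). A generic, self-contained way to compute them from a list of ranks (not the repo's actual eval code) is:

```python
def retrieval_metrics(ranks):
    """Compute MRR, R@1/5/10, and mean rank from 1-based ground-truth ranks."""
    n = len(ranks)
    mrr = sum(1.0 / r for r in ranks) / n
    recall = {k: sum(r <= k for r in ranks) / n for k in (1, 5, 10)}
    mean_rank = sum(ranks) / n
    return {"MRR": mrr, "R@1": recall[1], "R@5": recall[5],
            "R@10": recall[10], "Mean": mean_rank}
```

Lower is better for Mean (the average rank of the correct answer); higher is better for MRR and R@k.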
If you find this repository useful, please consider citing our work:

```bibtex
@inproceedings{chen2020dmrm,
  title={DMRM: A Dual-channel Multi-hop Reasoning Model for Visual Dialog},
  author={Chen, Feilong and Meng, Fandong and Xu, Jiaming and Li, Peng and Xu, Bo and Zhou, Jie},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={34},
  number={05},
  pages={7504--7511},
  year={2020}
}
```
This code is released under the MIT License.