Code for the NeurIPS 2020 paper:
Language and Visual Entity Relationship Graph for Agent Navigation
Yicong Hong, Cristian Rodriguez-Opazo, Yuankai Qi, Qi Wu, Stephen Gould
[Paper] [Supplemental] [GitHub]
"Halliday hated making rules. Why is that line sticking in my head? Maybe it's because Art3mis said it, and she's hot. Maybe it's because she called me out. Sitting here in my tiny corner of nowhere, protecting my small slice of nothing." --- Ready Player One 2018.
Install the Matterport3D Simulator.
Please find the versions of packages in our environment here. In particular, we use:
- Python 3.6.9
- NumPy 1.18.1
- OpenCV 3.4.2
- PyTorch 1.3.0
- Torchvision 0.4.1
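To compare a local environment against the versions listed above, a small check like the following may help. It is only a sketch: the module names (`cv2` for OpenCV, `torch` for PyTorch) are the usual import names and are assumed here, and the helper names `check_versions` and `matches` are hypothetical, not part of this repository.

```python
import importlib
import sys

# Versions listed in this README. The keys are the usual import names
# (assumption: OpenCV is imported as cv2 and PyTorch as torch).
EXPECTED = {
    "numpy": "1.18.1",
    "cv2": "3.4.2",
    "torch": "1.3.0",
    "torchvision": "0.4.1",
}


def check_versions(expected):
    """Return {module: (expected, found)}; found is None if the import fails."""
    report = {}
    for name, want in expected.items():
        try:
            module = importlib.import_module(name)
            found = getattr(module, "__version__", None)
        except Exception:
            # Any import-time failure is treated as "not installed".
            found = None
        report[name] = (want, found)
    return report


def matches(want, found):
    """True when the installed version string starts with the expected one."""
    return found is not None and found.startswith(want)


if __name__ == "__main__":
    print("python       expected 3.6.9    found", sys.version.split()[0])
    for name, (want, found) in check_versions(EXPECTED).items():
        status = "ok" if matches(want, found) else "MISMATCH"
        print(f"{name:12s} expected {want:8s} found {found}  [{status}]")
```

The prefix comparison is a deliberate choice so that local build suffixes (for example a CUDA tag appended to the PyTorch version) do not count as mismatches.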
Please follow the instructions below to prepare the data in the following directories:
connectivity
- Download the connectivity maps [23.8MB].
data
- Download the R2R data [5.8MB].
- Download the vocabulary and the augmented data from EnvDrop [79.5MB].
img_features
- Download the Scene features [4.2GB] (ResNet-152-Places365).
- Download the pre-processed Object features and vocabulary [1.3GB] (Caffe Faster-RCNN).
snap
- Download the trained network weights [146.0MB].
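Before running the scripts, it can be useful to confirm that the four directories above exist and are not empty. The sketch below assumes they sit directly under the repository root. The function name `missing_or_empty` is hypothetical, and the check does not verify individual file names or sizes.

```python
from pathlib import Path

# Directory names taken from the list above, relative to the repository root.
REQUIRED_DIRS = ("connectivity", "data", "img_features", "snap")


def missing_or_empty(root, names=REQUIRED_DIRS):
    """Return the names that are absent under root or contain no entries."""
    root = Path(root)
    problems = []
    for name in names:
        path = root / name
        if not path.is_dir() or not any(path.iterdir()):
            problems.append(name)
    return problems


if __name__ == "__main__":
    problems = missing_or_empty(".")
    if problems:
        print("Missing or empty:", ", ".join(problems))
    else:
        print("All data directories are present and non-empty.")
```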
Please read Peter Anderson's VLN paper for the R2R Navigation task.
Our code is based on the code structure of EnvDrop.
To replicate the performance reported in our paper, load the trained network weights and run validation:
bash run/agent.bash
To train the network from scratch, first train a Navigator on the R2R training split:
Modify run/agent.bash: remove the --load argument and set --train listener. Then run:
bash run/agent.bash
The trained Navigator will be saved under snap/.
You also need to train a Speaker for augmented training:
bash run/speak.bash
The trained Speaker will be saved under snap/.
Finally, keep training the Navigator with the mixture of original data and augmented data:
bash run/bt_envdrop.bash
We apply a one-step learning rate decay to 1e-5 when training saturates.
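As an illustration of what a one-step decay could look like, here is a minimal sketch in plain Python. The README does not define when training counts as saturated, so the plateau rule is an assumption: it triggers when the last `patience` validation scores fail to beat the earlier best. The function `one_step_decay`, its parameters `patience` and `min_delta`, and the starting rate in the usage lines are all hypothetical. Only the target value 1e-5 comes from the text above.

```python
def one_step_decay(lr, history, decayed_lr=1e-5, patience=3, min_delta=0.0):
    """Return decayed_lr once the recent validation scores stop improving.

    history: validation scores, oldest first, where higher is better.
    The plateau rule (patience, min_delta) is an assumption, not the
    authors' criterion.
    """
    # Decay happens at most once, and only with enough history to compare.
    if lr <= decayed_lr or len(history) <= patience:
        return lr
    best_before = max(history[:-patience])
    recent_best = max(history[-patience:])
    if recent_best <= best_before + min_delta:
        return decayed_lr
    return lr


if __name__ == "__main__":
    # Hypothetical starting rate and validation scores, for illustration only.
    lr = 1e-4
    scores = [0.50, 0.60, 0.60, 0.59, 0.60]
    lr = one_step_decay(lr, scores)
    print("learning rate:", lr)
```

In a PyTorch training loop, the returned value would then be written into each entry of `optimizer.param_groups` under the `"lr"` key.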
If you use or discuss our Entity Relationship Graph, please cite our paper:
@article{hong2020language,
title={Language and Visual Entity Relationship Graph for Agent Navigation},
author={Hong, Yicong and Rodriguez, Cristian and Qi, Yuankai and Wu, Qi and Gould, Stephen},
journal={Advances in Neural Information Processing Systems},
volume={33},
year={2020}
}