[docs] Update UNITER project doc #1176

Status: Open. Wants to merge 2 commits into base branch `gh/ryan-qiyu-jiang/40/base`.
23 changes: 22 additions & 1 deletion website/docs/projects/uniter.md
Computer Vision, 2020b. ([arXiv](https://arxiv.org/pdf/1909.11740))
}
```

This repository contains the checkpoint for the PyTorch implementation of the VILLA model, originally released in this [repo](https://github.com/zhegan27/VILLA). Please cite the following paper if you are using the VILLA model from MMF:

* Gan, Z., Chen, Y. C., Li, L., Zhu, C., Cheng, Y., & Liu, J. (2020). *Large-scale adversarial training for vision-and-language representation learning.* arXiv preprint arXiv:2006.06195. ([arXiv](https://arxiv.org/abs/2006.06195))
```
@inproceedings{gan2020large,
title={Large-Scale Adversarial Training for Vision-and-Language Representation Learning},
author={Gan, Zhe and Chen, Yen-Chun and Li, Linjie and Zhu, Chen and Cheng, Yu and Liu, Jingjing},
booktitle={NeurIPS},
year={2020}
}
```

## Installation

Follow installation instructions in the [documentation](https://mmf.readthedocs.io/en/latest/notes/installation.html).
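
For reference, a minimal source install is sketched below, assuming you are installing MMF from the main repo; the linked documentation is the authoritative guide:
```
git clone https://github.com/facebookresearch/mmf.git
cd mmf
pip install --editable .
```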

## Training

UNITER uses image region features extracted by [BUTD](https://github.com/peteanderson80/bottom-up-attention).
These differ from the features extracted in MMF and used by default in our datasets.
Support for BUTD feature extraction through PyTorch in MMF is in the works.
This means that the UNITER and VILLA checkpoints, which are pretrained on BUTD features,
do not work out of the box on the image region features in MMF.
You can still finetune these checkpoints in MMF on the Faster R-CNN features used in MMF datasets for comparable performance; this is what is done by default.
Alternatively, you can download BUTD features for the dataset you're working with and point the dataset in MMF at them, as sketched below.
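
As a rough sketch, such a swap might be expressed as a command-line config override, assuming the dataset config exposes a `features` section; the key layout and the feature path here are assumptions, not taken from this doc:
```
# Hypothetical override pointing the vqa2 training split at local BUTD features.
mmf_run config=projects/uniter/configs/vqa2/defaults.yaml run_type=train_val dataset=vqa2 model=uniter \
  dataset_config.vqa2.features.train=your/path/to/butd_features/train.lmdb
```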

To train a fresh UNITER model on the VQA2.0 dataset, run the following command
```
mmf_run config=projects/uniter/configs/vqa2/defaults.yaml run_type=train_val dataset=vqa2 model=uniter
```

To finetune a pretrained UNITER model on the VQA2.0 dataset,
```
mmf_run config=projects/uniter/configs/vqa2/defaults.yaml run_type=train_val dataset=vqa2 model=uniter checkpoint.resume_zoo=uniter.pretrained
```
The finetuning configs for VQA2 are from the UNITER base 4-GPU [configs](https://github.com/ChenRocks/UNITER/blob/master/config/train-vqa-base-4gpu.json). For an example finetuning config with a smaller batch size, consider the ViLT VQA2 training configs (sketched below); note that these may yield slightly lower performance.
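
A sketch of such a run, assuming the ViLT VQA2 config lives at `projects/vilt/configs/vqa2/defaults.yaml` (this path is an assumption, not confirmed by this doc):
```
mmf_run config=projects/vilt/configs/vqa2/defaults.yaml run_type=train_val dataset=vqa2 model=uniter checkpoint.resume_zoo=uniter.pretrained
```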

To finetune a pretrained [VILLA](https://arxiv.org/pdf/2006.06195.pdf) model on the VQA2.0 dataset,
```
# NOTE: reconstructed by analogy with the UNITER finetuning command above;
# the zoo key `villa.pretrained` is an assumption.
mmf_run config=projects/uniter/configs/vqa2/defaults.yaml run_type=train_val dataset=vqa2 model=uniter checkpoint.resume_zoo=villa.pretrained
```

To pretrain UNITER on the masked COCO dataset, run the following command
```
mmf_run config=projects/uniter/configs/masked_coco/defaults.yaml run_type=train_val dataset=masked_coco model=uniter
```

Depending on the config used and the `do_pretraining` value defined in the config, the model either uses the pretraining recipe described in the UNITER paper or is finetuned on a downstream task.
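
For illustration, a minimal sketch of toggling this behavior from the command line, assuming `do_pretraining` sits under the UNITER model config (the exact key path is an assumption):
```
mmf_run config=projects/uniter/configs/masked_coco/defaults.yaml run_type=train_val dataset=masked_coco model=uniter model_config.uniter.do_pretraining=True
```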