PyTorch implementation of the AAAI 2021 paper “Similarity Reasoning and Filtration for Image-Text Matching”.
It is built on top of SCAN and Awesome_Matching.
We have released two versions of SGRAF: Branch main for Python 2.7 and Branch python3.6 for Python 3.6.
If you run into any problems, please contact me at r1228240468@gmail.com (r1228240468@mail.dlut.edu.cn is deprecated).
The framework of SGRAF:
The updated results (better than those reported in the original paper):
| Dataset | Module | Sentence retrieval R@1 | Sentence retrieval R@5 | Sentence retrieval R@10 | Image retrieval R@1 | Image retrieval R@5 | Image retrieval R@10 |
|---|---|---|---|---|---|---|---|
| Flickr30K | SAF | 75.6 | 92.7 | 96.9 | 56.5 | 82.0 | 88.4 |
| Flickr30K | SGR | 76.6 | 93.7 | 96.6 | 56.1 | 80.9 | 87.0 |
| Flickr30K | SGRAF | 78.4 | 94.6 | 97.5 | 58.2 | 83.0 | 89.1 |
| MSCOCO1k | SAF | 78.0 | 95.9 | 98.5 | 62.2 | 89.5 | 95.4 |
| MSCOCO1k | SGR | 77.3 | 96.0 | 98.6 | 62.1 | 89.6 | 95.3 |
| MSCOCO1k | SGRAF | 79.2 | 96.5 | 98.6 | 63.5 | 90.2 | 95.8 |
| MSCOCO5k | SAF | 55.5 | 83.8 | 91.8 | 40.1 | 69.7 | 80.4 |
| MSCOCO5k | SGR | 57.3 | 83.2 | 90.6 | 40.5 | 69.6 | 80.3 |
| MSCOCO5k | SGRAF | 58.8 | 84.8 | 92.1 | 41.6 | 70.9 | 81.5 |
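For readers unfamiliar with the R@K columns, the sketch below shows one way Recall@K can be computed from a query-by-candidate similarity matrix. It is a simplified illustration, not the repository's evaluation code: the function name `recall_at_k` is hypothetical, and it assumes a one-to-one ground truth (query i matches candidate i), whereas the benchmark datasets pair each image with several captions.

```python
def recall_at_k(sims, k):
    """Percentage of queries whose ground-truth candidate ranks in the top k.

    sims[i][j] is the similarity between query i and candidate j.
    Assumption for this sketch: the correct candidate for query i is candidate i.
    """
    hits = 0
    for i, row in enumerate(sims):
        # Zero-based rank = number of candidates scoring strictly higher
        # than the ground-truth candidate.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sims)


# Toy 3x3 similarity matrix (made-up numbers, for illustration only).
toy_sims = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.1, 0.7, 0.6],
]
r1 = recall_at_k(toy_sims, 1)
r2 = recall_at_k(toy_sims, 2)
```

In this toy case only the first query ranks its match first, while every query has its match within the top two.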
We recommend the following dependencies for Branch main.
- Python 2.7
- PyTorch (>=0.4.1)
- NumPy (>=1.12.1)
- TensorBoard
- Punkt Sentence Tokenizer:
import nltk
nltk.download()
> d punkt
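As an alternative to the interactive downloader, the tokenizer can probably be fetched non-interactively. The helper below is a minimal sketch: `ensure_punkt` is a hypothetical name, and it assumes the resource is registered under `tokenizers/punkt`, which may differ across NLTK versions.

```python
def ensure_punkt():
    """Download the Punkt tokenizer only if it is not already available."""
    # Imported lazily so this helper can be defined even before NLTK is installed.
    import nltk

    try:
        # Raises LookupError when the resource is missing from all NLTK data paths.
        nltk.data.find("tokenizers/punkt")
    except LookupError:
        nltk.download("punkt")


# Usage (requires NLTK and network access on first run):
# ensure_punkt()
```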
We follow SCAN to obtain image features and vocabularies, which can be downloaded by using:
https://www.kaggle.com/datasets/kuanghueilee/scan-features
Another download link is available below:
https://drive.google.com/drive/u/0/folders/1os1Kr7HeTbh8FajBNegW8rjJf6GIhFqC
The pretrained models are only for Branch python3.6 (Python 3.6), not for Branch main (Python 2.7).
Modify the model_path, data_path, and vocab_path in the evaluation.py file. Then run evaluation.py:
python evaluation.py
Note that fold5=True is only for evaluation on MSCOCO1K (results averaged over 5 folds), while fold5=False is for MSCOCO5K and Flickr30K. Pretrained models and log files can be downloaded from Flickr30K_SGRAF and MSCOCO_SGRAF.
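To make the fold5 distinction concrete, the sketch below shows the general idea of a 5-fold average: the test set is cut into five equal parts, each part is scored on its own, and the scores are averaged. This is an illustration under assumptions, not the code in evaluation.py: `average_over_folds` and the `evaluate` callback are hypothetical, and contiguous, equally sized folds are assumed.

```python
def average_over_folds(items, evaluate, num_folds=5):
    """Score each contiguous fold separately and return the mean score.

    items    : the full test set (e.g. 5K entries in the MSCOCO setting)
    evaluate : hypothetical callback that maps one fold to a single score
    """
    fold_size = len(items) // num_folds
    scores = []
    for i in range(num_folds):
        fold = items[i * fold_size:(i + 1) * fold_size]
        scores.append(evaluate(fold))
    return sum(scores) / len(scores)


# Toy usage: ten items, with the "score" of a fold being its mean value.
toy_average = average_over_folds(list(range(10)), lambda fold: sum(fold) / len(fold))
```

With fold5=False, by contrast, the whole test set would be scored in a single pass, which is the harder setting because each query competes against more candidates.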
Modify the data_path, vocab_path, model_name, and logger_name in the opts.py file. Then run train.py:
For MSCOCO:
(For SGR) python train.py --data_name coco_precomp --num_epochs 20 --lr_update 10 --module_name SGR
(For SAF) python train.py --data_name coco_precomp --num_epochs 20 --lr_update 10 --module_name SAF
For Flickr30K:
(For SGR) python train.py --data_name f30k_precomp --num_epochs 40 --lr_update 30 --module_name SGR
(For SAF) python train.py --data_name f30k_precomp --num_epochs 30 --lr_update 20 --module_name SAF
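If you prefer to launch these runs from a script, the four settings above can be kept in one table. The sketch below only assembles the command lines listed in this README; `TRAIN_SETTINGS` and `build_train_command` are hypothetical names, and no flags beyond those shown above are assumed.

```python
# Epoch settings copied from the commands listed in this README.
TRAIN_SETTINGS = {
    ("coco_precomp", "SGR"): {"num_epochs": 20, "lr_update": 10},
    ("coco_precomp", "SAF"): {"num_epochs": 20, "lr_update": 10},
    ("f30k_precomp", "SGR"): {"num_epochs": 40, "lr_update": 30},
    ("f30k_precomp", "SAF"): {"num_epochs": 30, "lr_update": 20},
}


def build_train_command(data_name, module_name):
    """Return the train.py invocation for one dataset/module pair as an argument list."""
    settings = TRAIN_SETTINGS[(data_name, module_name)]
    return [
        "python", "train.py",
        "--data_name", data_name,
        "--num_epochs", str(settings["num_epochs"]),
        "--lr_update", str(settings["lr_update"]),
        "--module_name", module_name,
    ]


# Example: the Flickr30K SAF run as a single shell-style string.
f30k_saf_command = " ".join(build_train_command("f30k_precomp", "SAF"))
```

The argument list can be passed to `subprocess.run` directly, which avoids shell quoting issues.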
If SGRAF is useful for your research, please cite the following paper:
@inproceedings{Diao2021SGRAF,
title={Similarity reasoning and filtration for image-text matching},
author={Diao, Haiwen and Zhang, Ying and Ma, Lin and Lu, Huchuan},
booktitle={Proceedings of the AAAI conference on artificial intelligence},
volume={35},
number={2},
pages={1218--1226},
year={2021}
}