This is the official source code for the Dual Semantic Relations Attention Network (DSRAN) proposed in our journal paper *Learning Dual Semantic Relations with Graph Attention for Image-Text Matching* (TCSVT 2020). It is built on top of VSE++ in PyTorch.
The framework of DSRAN:
The results on the MSCOCO and Flickr30K datasets (with BERT or GRU as the text encoder):
**GRU**

| Dataset | Image-to-Text R@1 | R@5 | R@10 | Text-to-Image R@1 | R@5 | R@10 | Rsum |
|---|---|---|---|---|---|---|---|
| MSCOCO-1K | 80.4 | 96.7 | 98.7 | 64.2 | 90.4 | 95.8 | 526.2 |
| MSCOCO-5K | 57.6 | 85.6 | 91.9 | 41.5 | 71.9 | 82.1 | 430.6 |
| Flickr30K | 79.6 | 95.6 | 97.5 | 58.6 | 85.8 | 91.3 | 508.4 |

**BERT**

| Dataset | Image-to-Text R@1 | R@5 | R@10 | Text-to-Image R@1 | R@5 | R@10 | Rsum |
|---|---|---|---|---|---|---|---|
| MSCOCO-1K | 80.6 | 96.7 | 98.7 | 64.5 | 90.8 | 95.8 | 527.1 |
| MSCOCO-5K | 57.9 | 85.3 | 92.0 | 41.7 | 72.7 | 82.8 | 432.4 |
| Flickr30K | 80.5 | 95.5 | 97.9 | 59.2 | 86.0 | 91.9 | 511.0 |
We recommend the following dependencies.
- Python 3.6
- PyTorch 1.1.0
- NumPy (>1.12.1)
- torchtext
- pycocotools
- nltk
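For reference, a minimal environment setup might look like the following (the conda environment name and the torchvision pin are assumptions; adjust to your CUDA setup):

```bash
# Create and activate an isolated environment (name is arbitrary)
conda create -n dsran python=3.6 -y
conda activate dsran

# Install the dependencies listed above
# (torchvision 0.3.0 is the assumed match for PyTorch 1.1.0)
pip install torch==1.1.0 torchvision==0.3.0 numpy torchtext pycocotools nltk

# NLTK tokenizer data used for caption preprocessing
python -c "import nltk; nltk.download('punkt')"
```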
Download the raw images, the pre-computed image features, the pre-trained BERT models, the pre-trained ResNet-152 model, and our pre-trained DSRAN models. The raw images can be downloaded from VSE++:
wget http://www.cs.toronto.edu/~faghri/vsepp/data.tar
wget http://www.cs.toronto.edu/~faghri/vsepp/vocab.tar
We refer to the path of the files extracted from `data.tar` as `$DATA_PATH`. Only the raw images are used, i.e. the `coco` and `f30k` folders.
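For example, the archives can be unpacked as follows (the target directory is a placeholder; any location works as long as it is passed via `--data_path`):

```bash
# Unpack raw images and vocab files; data/ is a hypothetical location.
# Depending on the archive layout you may end up with a nested data/
# folder; set $DATA_PATH to whatever directory contains coco/ and f30k/.
mkdir -p data
tar -xvf data.tar -C data/
tar -xvf vocab.tar -C data/
export DATA_PATH=data/
```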
The pre-computed image features can be obtained from VLP. These zip files should be extracted into the folder `data/joint-pretrain`. We refer to the path of the extracted region bbox file (`.h5`) as `$REGION_BBOX_FILE`, and to the regional feature folders, `feat_cls_1000/` for MSCOCO and `trainval/` for Flickr30K, as `$FEATURE_PATH`.
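Concretely, following the directory tree shown below, the two variables point to (pick the pair for the dataset you are using):

```bash
# MSCOCO
export REGION_BBOX_FILE=data/joint-pretrain/COCO/region_feat_gvd_wo_bgd/coco_detection_vg_thresh0.2_feat_gvd_checkpoint_trainvaltest.h5
export FEATURE_PATH=data/joint-pretrain/COCO/region_feat_gvd_wo_bgd/feat_cls_1000/

# Flickr30K
export REGION_BBOX_FILE=data/joint-pretrain/flickr30k/region_feat_gvd_wo_bgd/flickr30k_detection_vg_thresh0.2_feat_gvd_checkpoint_trainvaltest.h5
export FEATURE_PATH=data/joint-pretrain/flickr30k/region_feat_gvd_wo_bgd/trainval/
```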
The pre-trained ResNet-152 model can be downloaded from torchvision and should be put in the root directory:
wget https://download.pytorch.org/models/resnet152-b121ed2d.pth
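An optional sanity check that the downloaded weights load into torchvision's ResNet-152 (assumes torch and torchvision are already installed):

```bash
# Loads the checkpoint into the standard torchvision architecture
python -c "import torch, torchvision; m = torchvision.models.resnet152(); m.load_state_dict(torch.load('resnet152-b121ed2d.pth')); print('checkpoint ok')"
```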
For our trained DSRAN models, you can download `runs.zip` from Google Drive, or `GRU.zip` together with `BERT.zip` from BaiduNetDisk (extract code: 1119). There are 8 models in total (4 for each dataset).
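A sketch of unpacking them; the assumption here is that the archives extract into `runs/` so that the `--model` arguments used below (e.g. `BERT/cc_model1`) resolve:

```bash
# From Google Drive: one archive with all 8 models
unzip runs.zip

# Or from BaiduNetDisk: two archives, one per text encoder
unzip GRU.zip -d runs/
unzip BERT.zip -d runs/
```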
The pre-trained BERT models were obtained from an old version of transformers. Note that newer versions of transformers offer a simpler way of using BERT; we will update the code in the future. The pre-trained models we use can be downloaded from the same Google Drive and BaiduNetDisk (extract code: 1119) links. We refer to the path of the files extracted from `uncased_L-12_H-768_A-12.zip` as `$BERT_PATH`.
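For example (the extraction directory is a placeholder; pass whatever you choose via `--bert_path`):

```bash
unzip uncased_L-12_H-768_A-12.zip -d uncased_L-12_H-768_A-12/
export BERT_PATH=uncased_L-12_H-768_A-12/
```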
├── data/
| ├── coco/ /* MSCOCO raw images
| | ├── images/
| | | ├── train2014/
| | | ├── val2014/
| | ├── annotations/
| ├── f30k/ /* Flickr30K raw images
| | ├── images/
| | ├── dataset_flickr30k.json
| ├── joint-pretrain/ /* pre-computed image features
| | ├── COCO/
| | | ├── region_feat_gvd_wo_bgd/
| | | | ├── feat_cls_1000/ /* $FEATURE_PATH
| | | | ├── coco_detection_vg_thresh0.2_feat_gvd_checkpoint_trainvaltest.h5 /* $REGION_BBOX_FILE
| | | ├── annotations/
| | ├── flickr30k/
| | | ├── region_feat_gvd_wo_bgd/
| | | | ├── trainval/ /* $FEATURE_PATH
| | | | ├── flickr30k_detection_vg_thresh0.2_feat_gvd_checkpoint_trainvaltest.h5 /* $REGION_BBOX_FILE
| | | ├── annotations/
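Once everything is in place, a quick structural check can catch extraction mistakes early (`h5py` is assumed to be installed, since the region bbox file is HDF5):

```bash
# A few raw images should be listed
ls data/coco/images/train2014 | head -3
# The regional features folder should not be empty
ls "$FEATURE_PATH" | head -3
# The region bbox file should open and contain entries
python -c "import h5py; f = h5py.File('$REGION_BBOX_FILE', 'r'); print(len(f), 'entries')"
```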
- Test on the MSCOCO dataset (1K and 5K simultaneously):
- Test on BERT-based models:
python evaluation_bert.py --model BERT/cc_model1 --fold --data_path "$DATA_PATH" --region_bbox_file "$REGION_BBOX_FILE" --feature_path "$FEATURE_PATH"
- Test on GRU-based models:
python evaluation.py --model GRU/cc_model1 --fold --data_path "$DATA_PATH" --region_bbox_file "$REGION_BBOX_FILE" --feature_path "$FEATURE_PATH"
- Test on the Flickr30K dataset:
- Test on BERT-based models:
python evaluation_bert.py --model BERT/f_model1 --data_path "$DATA_PATH" --region_bbox_file "$REGION_BBOX_FILE" --feature_path "$FEATURE_PATH"
- Test on GRU-based models:
python evaluation.py --model GRU/f_model1 --data_path "$DATA_PATH" --region_bbox_file "$REGION_BBOX_FILE" --feature_path "$FEATURE_PATH"
/* Remember to modify the "$DATA_PATH", "$REGION_BBOX_FILE" and "$FEATURE_PATH" in the .sh files.
- Test on the MSCOCO dataset (1K and 5K simultaneously):
- Test on BERT-based models:
sh test_bert_cc.sh
- Test on GRU-based models:
sh test_gru_cc.sh
- Test on the Flickr30K dataset:
- Test on BERT-based models:
sh test_bert_f.sh
- Test on GRU-based models:
sh test_gru_f.sh
Train a model with BERT on MSCOCO:
python train_bert.py --data_path "$DATA_PATH" --data_name coco --num_epochs 18 --batch_size 320 --lr_update 9 --logger_name runs/cc_bert --bert_path "$BERT_PATH" --ft_bert --warmup 0.1 --K 4 --feature_path "$FEATURE_PATH" --region_bbox_file "$REGION_BBOX_FILE"
Train a model with BERT on Flickr30K:
python train_bert.py --data_path "$DATA_PATH" --data_name f30k --num_epochs 12 --batch_size 128 --lr_update 6 --logger_name runs/f_bert --bert_path "$BERT_PATH" --ft_bert --warmup 0.1 --K 2 --feature_path "$FEATURE_PATH" --region_bbox_file "$REGION_BBOX_FILE"
Train a model with GRU on MSCOCO:
python train.py --data_path "$DATA_PATH" --data_name coco --num_epochs 18 --batch_size 300 --lr_update 9 --logger_name runs/cc_gru --use_restval --K 2 --feature_path "$FEATURE_PATH" --region_bbox_file "$REGION_BBOX_FILE"
Train a model with GRU on Flickr30K:
python train.py --data_path "$DATA_PATH" --data_name f30k --num_epochs 16 --batch_size 128 --lr_update 8 --logger_name runs/f_gru --use_restval --K 2 --feature_path "$FEATURE_PATH" --region_bbox_file "$REGION_BBOX_FILE"
We thank Linyang Li for help with the code and for providing some computing resources.
If DSRAN is useful for your research, please cite our paper:
@ARTICLE{9222079,
author={Wen, Keyu and Gu, Xiaodong and Cheng, Qingrong},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
title={Learning Dual Semantic Relations With Graph Attention for Image-Text Matching},
year={2021},
volume={31},
number={7},
pages={2866-2879},
doi={10.1109/TCSVT.2020.3030656}}