PyTorch implementation of the TIP 2024 paper “Deep Boosting Learning: A Brand-new Cooperative Approach for Image-Text Matching”.
It is built on top of SGRAF, DML, DINO, and Awesome_Matching.
If you run into any problems, please contact me at r1228240468@gmail.com (diaohw@mail.dlut.edu.cn is deprecated).
The framework of DBL:
Run pip install -r requirements.txt to install the following dependencies:
- Python 3.7.11
- PyTorch 1.7.1
- NumPy 1.21.5
- Punkt Sentence Tokenizer:
import nltk
nltk.download()
> d punkt
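After installation, a short script can confirm that the environment matches the versions listed above. The following is only a minimal sketch: the helper names `installed_version` and `check_environment` are hypothetical and not part of this repository, and it assumes the usual import names `torch` and `numpy`.

```python
import importlib
import sys

# Versions taken from the dependency list above; import names are assumed.
EXPECTED = {"torch": "1.7.1", "numpy": "1.21.5"}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


def check_environment(expected=EXPECTED):
    """Return a list of human-readable mismatches (empty if everything matches)."""
    problems = []
    for name, wanted in expected.items():
        found = installed_version(name)
        if found is None:
            problems.append(f"{name}: not installed (expected {wanted})")
        elif not found.startswith(wanted):
            # startswith tolerates local build suffixes such as "+cu110".
            problems.append(f"{name}: found {found}, expected {wanted}")
    return problems


if __name__ == "__main__":
    print("Python", sys.version.split()[0], "(the list above gives 3.7.11)")
    for line in check_environment() or ["All listed packages match."]:
        print(line)
```

A mismatch does not necessarily mean the code will fail; it only flags a difference from the listed setup.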
We follow SCAN to obtain image features and vocabularies, which can be downloaded from:
https://www.kaggle.com/datasets/kuanghueilee/scan-features
Another download link is available below:
https://drive.google.com/drive/u/0/folders/1os1Kr7HeTbh8FajBNegW8rjJf6GIhFqC
data
├── coco_precomp
│   ├── train_ids.txt
│   ├── train_caps.txt
│   └── ......
│
└── f30k_precomp
    ├── train_ids.txt
    ├── train_caps.txt
    └── ......
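As a quick sanity check on the layout above, the sketch below reports which of the named files are missing. It checks only the files explicitly shown in the tree (the "......" entries are not guessed), and `missing_files` is a hypothetical helper, not a function from this repository.

```python
from pathlib import Path

# Only the files named in the tree above are checked.
EXPECTED_FILES = {
    "coco_precomp": ["train_ids.txt", "train_caps.txt"],
    "f30k_precomp": ["train_ids.txt", "train_caps.txt"],
}


def missing_files(data_root, expected=EXPECTED_FILES):
    """Return the expected paths that do not exist under data_root."""
    root = Path(data_root)
    return [
        root / folder / name
        for folder, names in expected.items()
        for name in names
        if not (root / folder / name).is_file()
    ]


if __name__ == "__main__":
    missing = missing_files("data")  # "data" is the top-level folder in the tree
    if missing:
        print("Missing files:")
        for path in missing:
            print("  ", path)
    else:
        print("Data layout looks complete for the files checked.")
```

Run it from the directory that contains the data folder, or pass a different root to `missing_files`.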
Modify model_path, split, and fold5 in the test.py file.
Note that fold5=True is only for evaluation on MSCOCO1K (results averaged over 5 folds), while fold5=False is for MSCOCO5K and Flickr30K. Pretrained models can be downloaded from Here with password [dhw4].
Then run python test.py in the terminal.
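The difference between the two fold5 settings can be illustrated with a simplified sketch. It is not the code in test.py: it assumes a square similarity matrix with exactly one matching item per query (on the diagonal), whereas the actual datasets pair each image with several captions. It also assumes that each fold is a contiguous block of 1000 queries. The names `recall_at_k` and `evaluate` are hypothetical.

```python
def recall_at_k(sims, k):
    """Percentage of queries whose matching item (same index) ranks in the top k."""
    hits = 0
    for i, row in enumerate(sims):
        # Rank = number of candidates scoring strictly higher than the true match.
        rank = sum(1 for score in row if score > row[i])
        hits += rank < k
    return 100.0 * hits / len(sims)


def evaluate(sims, k=1, fold5=False, fold_size=1000):
    """Recall@k on the full matrix, or averaged over 5 disjoint diagonal blocks."""
    if not fold5:
        return recall_at_k(sims, k)
    scores = []
    for f in range(5):
        lo, hi = f * fold_size, (f + 1) * fold_size
        # Each fold only ranks against candidates from the same fold.
        block = [row[lo:hi] for row in sims[lo:hi]]
        scores.append(recall_at_k(block, k))
    return sum(scores) / len(scores)
```

Because each fold ranks a query against fewer candidates, the fold-averaged score is typically higher than the score on the full test set, which is why the two settings are reported separately.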
Uncomment the required parts in the script.sh file.
Then run bash script.sh in the terminal.
If DBL is useful for your research, please cite the following paper:
@article{Diao2024DBL,
author={Diao, Haiwen and Zhang, Ying and Gao, Shang and Ruan, Xiang and Lu, Huchuan},
title={Deep Boosting Learning: {A} Brand-New Cooperative Approach for Image-Text Matching},
journal={IEEE Transactions on Image Processing},
year={2024},
volume={33},
pages={3341--3352}
}