This is the official PyTorch implementation of our paper:
Tune-An-Ellipse: CLIP Has Potential to Find What You Want
CVPR 2024 Highlight
Jinheng Xie, Songhe Deng, Bing Li, Haozhe Liu, Yawen Huang, Yefeng Zheng, Jurgen Schmidhuber, Bernard Ghanem, Linlin Shen, Mike Zheng Shou
pip install -r requirements.txt
- The code will automatically download the CLIP model checkpoints.
python gradio_demo.py
python run.py --img_path source/cat.png --caption "jumping cat"
The result image will be saved to workspace/test/hd_tune.
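To process several image/caption pairs, the single-image command can be wrapped in a small script. This is a minimal sketch, assuming it is launched from the repository root and using only the two flags shown here (`--img_path`, `--caption`); `build_command` and `run_batch` are hypothetical helpers and not part of this repository.

```python
import subprocess
import sys


def build_command(img_path, caption):
    """Build the argument list for one run.py invocation (hypothetical helper)."""
    # Only the flags documented in this README are used.
    return [sys.executable, "run.py", "--img_path", img_path, "--caption", caption]


def run_batch(pairs):
    """Run run.py once per (image path, caption) pair, stopping on the first failure."""
    for img_path, caption in pairs:
        # check=True raises CalledProcessError if run.py exits with a non-zero status.
        subprocess.run(build_command(img_path, caption), check=True)
```

For example, `run_batch([("source/cat.png", "jumping cat")])` is equivalent to the single command above. Whether successive runs overwrite earlier results in the output directory is not stated here, so it may be worth copying each result out before the next run.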
- Please follow the instructions in lichengunc/refer to download the RefCOCO series datasets.
- After downloading all of them, organize the data as follows in /PATH/TO/RefCOCO:
├── images
│   └── mscoco
│       └── images
│           └── train2014
├── reclip_data
│   ├── refcoco+_dets_dict.json
│   ├── refcoco_dets_dict.json
│   └── refcocog_dets_dict.json
├── refcoco
│   ├── instances.json
│   ├── refs(google).p
│   └── refs(unc).p
├── refcoco+
│   ├── instances.json
│   └── refs(unc).p
└── refcocog
    ├── instances.json
    ├── refs(google).p
    └── refs(umd).p
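Before launching the scripts, it can help to confirm that the dataset folder matches the layout above. The following is a minimal sketch of such a check, not part of this repository: `EXPECTED_ENTRIES` and `missing_entries` are hypothetical names, and the nesting `images/mscoco/images/train2014` is an assumption read off the tree.

```python
from pathlib import Path

# Paths relative to /PATH/TO/RefCOCO, transcribed from the directory tree above.
# The nesting of the image folders is an assumption based on that tree.
EXPECTED_ENTRIES = [
    "images/mscoco/images/train2014",
    "reclip_data/refcoco+_dets_dict.json",
    "reclip_data/refcoco_dets_dict.json",
    "reclip_data/refcocog_dets_dict.json",
    "refcoco/instances.json",
    "refcoco/refs(google).p",
    "refcoco/refs(unc).p",
    "refcoco+/instances.json",
    "refcoco+/refs(unc).p",
    "refcocog/instances.json",
    "refcocog/refs(google).p",
    "refcocog/refs(umd).p",
]


def missing_entries(root):
    """Return the expected relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in EXPECTED_ENTRIES if not (root / rel).exists()]
```

Calling `missing_entries("/PATH/TO/RefCOCO")` returns an empty list when every listed file and folder is present; otherwise it lists what still needs to be downloaded or moved. It checks for existence only, not file contents.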
- Then run the prepared scripts:
bash scripts/refcoco.sh
bash scripts/refcoco+.sh
bash scripts/refcocog.sh
We used the code from CLIP-ES to generate the CAMs of CLIP models. Thanks to the authors for their great work!
@InProceedings{Xie_2024_CVPR,
author = {Xie, Jinheng and Deng, Songhe and Li, Bing and Liu, Haozhe and Huang, Yawen and Zheng, Yefeng and Schmidhuber, Jurgen and Ghanem, Bernard and Shen, Linlin and Shou, Mike Zheng},
title = {Tune-An-Ellipse: CLIP Has Potential to Find What You Want},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2024},
pages = {13723-13732}
}