This is our implementation of the paper VTG-GPT: Tuning-Free Zero-Shot Video Temporal Grounding with GPT.
VTG-GPT leverages frozen GPTs to perform zero-shot video temporal grounding without any training or fine-tuning.
- Install dependencies

```bash
conda create -n vtg-gpt python=3.10
conda activate vtg-gpt
pip install -r requirements.txt
```
- Unzip caption files

```bash
cd data/qvhighlights/caption/
unzip val.zip
```
- Inference and evaluation

```bash
# inference
python infer_qvhighlights.py val

# evaluation
bash standalone_eval/eval.sh
```
Running the above commands should reproduce the following results on the QVHighlights val split:

| Metrics | R1@0.5 | R1@0.7 | mAP@0.5 | mAP@0.75 | mAP@avg |
| --- | --- | --- | --- | --- | --- |
| Values | 59.03 | 38.90 | 56.11 | 35.44 | 35.57 |
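Here R1@τ is the fraction of queries whose top-1 predicted moment overlaps the ground-truth span with temporal IoU of at least τ, and mAP averages precision over IoU thresholds. A minimal sketch of the R1 computation, simplified to a single ground-truth span per query (illustrative only, not the code in `standalone_eval`):

```python
def temporal_iou(pred, gt):
    """IoU between two [start, end] moments, in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def recall_at_1(preds, gts, threshold=0.5):
    """Fraction of queries whose top-1 moment reaches the IoU threshold."""
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(preds)

# e.g. IoU([10, 25], [12, 30]) = 13 / 20 = 0.65 -> a hit at R1@0.5, a miss at R1@0.7
print(recall_at_1([[10.0, 25.0]], [[12.0, 30.0]], threshold=0.5))  # 1.0
```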
- Generate video captions with MiniGPT-v2

```bash
cd minigpt
conda create --name minigptv python=3.9
conda activate minigptv
pip install -r requirements.txt
python run_v2.py
```
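`run_v2.py` drives the captioning. As a rough illustration of the frame-sampling step such a pipeline needs, here is a hypothetical uniform sampler built on OpenCV; the function and its parameters are not part of this repo:

```python
import cv2  # pip install opencv-python

def sample_frames(video_path: str, num_frames: int = 32):
    """Uniformly sample up to num_frames RGB frames from a video file."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for i in range(num_frames):
        # Seek to the i-th uniformly spaced frame index.
        cap.set(cv2.CAP_PROP_POS_FRAMES, i * max(total, 1) // num_frames)
        ok, frame = cap.read()
        if not ok:
            break
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return frames
```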
- Rephrase queries with Baichuan2

```bash
cd Baichuan2
conda activate vtg-gpt
python rephrase_query.py
```
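For reference, a minimal sketch of how a Baichuan2 chat model can rephrase a query through Hugging Face Transformers; the prompt wording and model size here are assumptions, and the actual pipeline lives in `rephrase_query.py`:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig

# Hypothetical setup; rephrase_query.py defines the actual prompt and model.
name = "baichuan-inc/Baichuan2-7B-Chat"
tokenizer = AutoTokenizer.from_pretrained(name, use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    name, torch_dtype=torch.float16, device_map="auto", trust_remote_code=True
)
model.generation_config = GenerationConfig.from_pretrained(name)

query = "a man is talking to the camera in a car"
messages = [{"role": "user",
             "content": f"Rephrase this video moment description, keeping its meaning: {query}"}]
response = model.chat(tokenizer, messages)  # Baichuan2's custom chat() helper
print(response)
```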
We thank Youyao Jia for helpful discussions.
This code is based on Moment-DETR and SeViLA. We used resources from MiniGPT-4, Baichuan2, and LLaMA 2. We thank the authors for their awesome open-source contributions.
If you find this project useful for your research, please cite our paper:
```bibtex
@article{xu2024vtg,
  title={VTG-GPT: Tuning-Free Zero-Shot Video Temporal Grounding with GPT},
  author={Xu, Yifang and Sun, Yunzhuo and Xie, Zien and Zhai, Benxiang and Du, Sidan},
  journal={Applied Sciences},
  volume={14},
  number={5},
  pages={1894},
  year={2024},
  publisher={MDPI}
}
```