The official implementation of our paper "Improving Interpretable Embeddings for Ad-hoc Video Search with Generative Captions and Multi-word Concept Bank", accepted at ICMR 2024.
We used Anaconda to set up a workspace with PyTorch 1.8. Run the following commands to install the required packages.
conda create -n IITV python==3.8 -y
conda activate IITV
git clone https://github.com/nikkiwoo-gh/Improved-ITV.git
cd Improved-ITV
pip install -r requirements.txt
./do_install_StanfordCoreNLIP.sh
tgif-msrvtt10k-VATEX video-level concept annotation
Improved_ITV model pretrained on WebVid-genCap7M
Improved_ITV model finetuned on tgif-msrvtt10k-VATEX
training and validation sets: please refer to AVS_data
testing sets: please refer to AVS_feature_data
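The downloaded annotations, checkpoints, and data sets are assumed to live under a common data root, which the scripts below refer to as $rootpath. A minimal sketch of preparing that layout (using $HOME/VisualSearch as the root is an assumption; adjust it to wherever you actually unpack the data):

```shell
# Assumed data root; the scripts below refer to it as $rootpath
rootpath=$HOME/VisualSearch

# Create the expected per-collection text-data folder
mkdir -p "$rootpath/tgif-msrvtt10k-VATEX/TextData"
ls "$rootpath"
```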
./do_get_vocab_and_concept.sh $collection
e.g.,
./do_get_vocab_and_concept.sh tgif-msrvtt10k-VATEX
or download concept_phrase.zip and unzip it into the folder $rootpath/tgif-msrvtt10k-VATEX/TextData/
Build the video-level concept annotation (script to be released), or download it from here.
./do_pretrain.sh
./do_train_from_pretrain.sh
./do_train.sh
./do_prediction_iacc.3.sh
./do_prediction_v3c1.sh
./do_prediction_v3c2.sh
Remember to set score_file to your own path before running the evaluation scripts.
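For example, score_file typically points at the prediction output written by the step above. The exact results filename and folder here are hypothetical; match them to what your prediction script actually produced:

```shell
# Hypothetical path; adjust to where your prediction script wrote its scores
rootpath=$HOME/VisualSearch
score_file=$rootpath/tgif-msrvtt10k-VATEX/results/iacc.3/score.txt
echo "score_file=$score_file"
```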
cd tv-avs-eval/
./do_eval_iacc.3.sh
./do_eval_v3c1.sh
./do_eval_v3c2.sh
@inproceedings{ICMR2024_WU_improvedITV,
author = {Wu, Jiaxin and Ngo, Chong-Wah and Chan, Wing-Kwong},
title = {Improving Interpretable Embeddings for Ad-hoc Video Search with Generative Captions and Multi-word Concept Bank},
year = {2024},
booktitle = {The Annual ACM International Conference on Multimedia Retrieval},
pages = {1-10},
}