Improved-ITV

The official implementation of our paper "Improving Interpretable Embeddings for Ad-hoc Video Search with Generative Captions and Multi-word Concept Bank", accepted at ICMR 2024.

Environment

We used Anaconda to set up a workspace with PyTorch 1.8. Run the following commands to install the required packages.

conda create -n IITV python==3.8 -y
conda activate IITV
git clone https://github.com/nikkiwoo-gh/Improved-ITV.git
cd Improved-ITV
pip install -r requirements.txt
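
To quickly check that the environment is usable (an optional sanity check, not part of the official setup), verify the installed PyTorch version and GPU visibility:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"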

Stanford CoreNLP server for concept bank construction

./do_install_StanfordCoreNLIP.sh
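
The concept bank construction relies on a local CoreNLP server. If you need to launch the server yourself, the commands below are a minimal sketch using CoreNLP's standard server entry point; the install directory, port, and memory size are illustrative and should match your own setup:

# run from the directory where the CoreNLP jars were unpacked (path is illustrative)
cd <corenlp_install_dir>
java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000 -timeout 15000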

Downloads

Pretraining Dataset

WebVid-genCap7M dataset

Concept bank for tgif-msrvtt10k-VATEX

concept_phrase.zip

Video-level concept annotation for tgif-msrvtt10k-VATEX

tgif-msrvtt10k-VATEX video-level concept annotation

Model Checkpoints

Improved_ITV model pretrained on WebVid-genCap7M

Improved_ITV model finetuned on tgif-msrvtt10k-VATEX

Features for Finetuning

For the training and validation sets, please refer to AVS_data.

For the testing sets, please refer to AVS_feature_data.

Usage

1. Build the bag-of-words vocabulary and concept bank

./do_get_vocab_and_concept.sh $collection

e.g.,

./do_get_vocab_and_concept.sh tgif-msrvtt10k-VATEX 

or download concept_phrase.zip and unzip it into the folder $rootpath/tgif-msrvtt10k-VATEX/TextData/ (see the sketch below).
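
For example, assuming concept_phrase.zip has been downloaded into the current directory and $rootpath points at your data root (both paths are placeholders for your own locations):

unzip concept_phrase.zip -d $rootpath/tgif-msrvtt10k-VATEX/TextData/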

2. Prepare the concept annotation

Build the video-level concept annotation (script to be released), or download it from here.

3. (Optional) Pretrain the Improved ITV model

./do_pretrain.sh

4. Train the Improved ITV model

4.1 Train from a pretrained checkpoint

./do_train_from_pretrain.sh

4.2 Train without pretraining

./do_train.sh

5. Inference on TRECVID datasets

./do_prediction_iacc.3.sh
./do_prediction_v3c1.sh
./do_prediction_v3c2.sh

6. Evaluation

Remember to set score_file to your own path (see the sketch after these commands).

cd tv-avs-eval/
do_eval_iacc.3.sh
do_eval_v3c1.sh
do_eval_v3c2.sh
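
As a minimal sketch, assuming score_file is a variable set inside the do_eval_*.sh scripts, point it at the prediction output produced in step 5 (the path below is a placeholder, not the repository's actual layout):

# illustrative only -- replace with the file written by the corresponding do_prediction_*.sh run
score_file=/path/to/your/prediction/output/score_file.txt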

Citation

@inproceedings{ICMR2024_WU_improvedITV,
author = {Wu, Jiaxin and Ngo, Chong-Wah and Chan, Wing-Kwong},
title = {Improving Interpretable Embeddings for Ad-hoc Video Search with Generative Captions and Multi-word Concept Bank},
year = {2024},
booktitle = {The Annual ACM International Conference on Multimedia Retrieval},
pages = {1-10},
}

Contact

jiaxin.wu@my.cityu.edu.hk
