PyTorch Implementation of Attention Prompt Tuning: Parameter-Efficient Adaptation of Pre-Trained Models for Action Recognition


APT: Attention Prompt Tuning

A Parameter-Efficient Adaptation of Pre-Trained Models for Action Recognition ...

Wele Gedara Chaminda Bandara, Vishal M. Patel
Johns Hopkins University

Accepted at FG'24

Paper (on ArXiv)

Overview of Proposed Method

Comparison of our Attention Prompt Tuning (APT) for video action classification with existing tuning methods: linear probing, adapter tuning, visual prompt tuning (VPT), and full fine-tuning.

Unlike VPT, Attention Prompt Tuning (APT) injects learnable prompts directly into the multi-head attention (MHA) blocks.
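
For intuition, below is a minimal PyTorch sketch of the idea, assuming the prompts participate only as extra keys and values inside self-attention. It is illustrative only and is not the repository's implementation (the actual prompt placement, dropout, and reparameterization are controlled by the run_class_apt.py options described below).

import torch
import torch.nn as nn

class PromptedAttention(nn.Module):
    """Self-attention where learnable prompt tokens join the keys/values only."""
    def __init__(self, dim=768, num_heads=12, num_prompts=2):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Learnable prompts: queries never attend *from* them, only *to* them.
        self.prompt_k = nn.Parameter(torch.empty(1, num_prompts, dim))
        self.prompt_v = nn.Parameter(torch.empty(1, num_prompts, dim))
        nn.init.trunc_normal_(self.prompt_k, std=0.02)
        nn.init.trunc_normal_(self.prompt_v, std=0.02)

    def forward(self, x):                      # x: (B, N, C) token sequence
        B, N, C = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # Prepend prompt tokens to keys and values; queries stay untouched.
        k = torch.cat([self.prompt_k.expand(B, -1, -1), k], dim=1)
        v = torch.cat([self.prompt_v.expand(B, -1, -1), v], dim=1)
        split = lambda t: t.view(B, -1, self.num_heads, self.head_dim).transpose(1, 2)
        q, k, v = split(q), split(k), split(v)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        out = (attn.softmax(dim=-1) @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)                  # output length N, unchanged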

Getting Started

Step 1: Conda Environment

Set up the conda environment using environment.yml:

conda env create -f environment.yml

Then activate the conda environment:

conda activate apt
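
As an optional sanity check, you can confirm that the activated environment provides a CUDA-enabled PyTorch build (a minimal snippet, not part of the repository):

# Optional check: PyTorch version and GPU visibility in the 'apt' environment.
import torch
print(torch.__version__)
print("CUDA available:", torch.cuda.is_available(), "| GPUs:", torch.cuda.device_count())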

Step 2: Download the VideoMAE Pre-trained Models

We use VideoMAE pre-trained on the Kinetics-400 dataset for our experiments.

The pre-trained models for the ViT-Small and ViT-Base backbones can be downloaded from the links below:

Method   | Extra Data | Backbone | Epoch | #Frame | Pre-train
VideoMAE | no         | ViT-S    | 1600  | 16x5x3 | checkpoint
VideoMAE | no         | ViT-B    | 1600  | 16x5x3 | checkpoint

If you need other pre-trained models, please refer to MODEL_ZOO.md.
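
Optionally, you can inspect a downloaded checkpoint before fine-tuning. The snippet below is a hedged sketch: the top-level key under which the weights are stored (e.g., 'model' or 'module') depends on how the checkpoint was saved, so it prints the available keys first, and the path is only a placeholder.

# Inspect a downloaded VideoMAE checkpoint (path below is a placeholder).
import torch

ckpt = torch.load("path/to/checkpoint.pth", map_location="cpu")
print(list(ckpt.keys()))                                   # confirm where the weights live
state_dict = ckpt.get("model", ckpt.get("module", ckpt))  # key name is an assumption
print(len(state_dict), "weight tensors in the pre-trained backbone")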

Step 3: Download the datasets

We conduct experiments on three action recognition datasets: 1) UCF101, 2) HMDB51, and 3) Something-Something-V2 (SSv2).

Please refer to DATASETS.md for download links and pre-processing steps.

Step 4: Attention Prompt Tuning

We provide example scripts to run attention prompt tuning on the UCF101, HMDB51, and SSv2 datasets in the scripts/ folder.

Inside scripts/ you will find two folders corresponding to APT fine-tuning with the ViT-Small and ViT-Base architectures.

To fine-tune with APT, you just need to execute the corresponding finetune.sh file, which launches the job with distributed training.

For example, to fine-tune ViT-Base on SSv2 with APT, you may run:

sh scripts/ssv2/vit_base/finetune.sh

The finetune.sh script looks like this:

# APT on SSv2
OUTPUT_DIR='experiments/APT/SSV2/ssv2_videomae_pretrain_base_patch16_224_frame_16x2_tube_mask_ratio_0.9_e2400/adam_mome9e-1_wd1e-5_lr5se-2_pl2_ps0_pe11_drop10'
DATA_PATH='datasets/ss2/list_ssv2/'
MODEL_PATH='experiments/pretrain/ssv2_videomae_pretrain_base_patch16_224_frame_16x2_tube_mask_ratio_0.9_e2400/checkpoint.pth'

NCCL_P2P_DISABLE=1 OMP_NUM_THREADS=1 CUDA_VISIBLE_DEVICES=0,1,3,4,5,6,7,8 python -m torch.distributed.launch --nproc_per_node=8 \
    run_class_apt.py \
    --model vit_base_patch16_224 \
    --transfer_type prompt \
    --prompt_start 0 \
    --prompt_end 11 \
    --prompt_num_tokens 2 \
    --prompt_dropout 0.1 \
    --data_set SSV2 \
    --nb_classes 174 \
    --data_path ${DATA_PATH} \
    --finetune ${MODEL_PATH} \
    --log_dir ${OUTPUT_DIR} \
    --output_dir ${OUTPUT_DIR} \
    --batch_size 8 \
    --batch_size_val 8 \
    --num_sample 2 \
    --input_size 224 \
    --short_side_size 224 \
    --save_ckpt_freq 10 \
    --num_frames 16 \
    --opt adamw \
    --lr 0.05 \
    --weight_decay 0.00001 \
    --epochs 100 \
    --warmup_epochs 10 \
    --test_num_segment 2 \
    --test_num_crop 3 \
    --dist_eval \
    --pin_mem \
    --enable_deepspeed \
    --prompt_reparam \
    --is_aa \
    --aa rand-m4-n2-mstd0.2-inc1

Here,

  • OUTPUT_DIR: directory where the results (i.e., logs and checkpoints) are saved
  • DATA_PATH: path to where the dataset is stored
  • MODEL_PATH: path to the downloaded VideoMAE pre-trained model
  • CUDA_VISIBLE_DEVICES=... specifies which GPUs (GPU IDs) to use for fine-tuning
  • nproc_per_node is the number of GPUs used for fine-tuning
  • model is either ViT-Base (vit_base_patch16_224) or ViT-Small (vit_small_patch16_224)
  • transfer_type specifies which fine-tuning method to use: 'random' means random initialization, 'end2end' means full end-to-end fine-tuning, 'prompt' means APT (ours), and 'linear' means linear probing
  • prompt_start is the index of the first transformer block to which attention prompts are added; 0 means learnable prompts are added starting from the 1st transformer block of the ViT
  • prompt_end is the index of the last transformer block to which attention prompts are added; ViT-Base / ViT-Small has 12 transformer blocks, so 11 means prompts are added up to the last block
  • data_set specifies the dataset
  • all the other parameters are hyperparameters related to APT fine-tuning (a rough parameter-count sketch follows this list)
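
For a rough sense of the parameter efficiency of this configuration, the back-of-the-envelope estimate below assumes one key prompt and one value prompt per token per prompted block; the exact parameterization in the code (including the effect of --prompt_reparam) may differ, so treat the numbers as illustrative.

# Illustrative prompt-parameter count for the ViT-Base configuration above.
embed_dim  = 768                # ViT-Base hidden size
num_tokens = 2                  # --prompt_num_tokens
num_blocks = 11 - 0 + 1         # --prompt_start .. --prompt_end, inclusive
prompt_params = num_blocks * num_tokens * embed_dim * 2   # key + value prompts (assumed layout)
print(f"~{prompt_params / 1e3:.1f}K prompt parameters vs ~86M in the full ViT-Base backbone")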

✏️ Citation

If you find this project helpful, please feel free to leave a star and cite our paper:

@misc{bandara2024attention,
      title={Attention Prompt Tuning: Parameter-efficient Adaptation of Pre-trained Models for Spatiotemporal Modeling}, 
      author={Wele Gedara Chaminda Bandara and Vishal M. Patel},
      year={2024},
      eprint={2403.06978},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

✏️ Disclaimer

This repository is built on top of the VideoMAE codebase (https://github.com/MCG-NJU/VideoMAE), and we appreciate the authors of VideoMAE for making their code publicly available.
