Official PyTorch implementation of "Regularized Training with Generated Datasets for Name-Only Transfer of Vision-Language Models"

pmh9960/regft-for-gen


Regularized Training with Generated Datasets for Name-Only Transfer of Vision-Language Models

Minho Park, Sunghyun Park, Jooyeol Yun, and Jaegul Choo
Korea Advanced Institute of Science and Technology (KAIST)

TL;DR: To fine-tune CLIP effectively on generated datasets, robust regularization is essential, including weight-space ensembling and variance-covariance regularization.

Figure: (a) the significant domain gap between real and generated images, and (b) the performance degradation caused by this gap.
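The post-training regularization is weight-space ensembling in the spirit of WiSE-FT: the deployed weights linearly interpolate, parameter by parameter, between the zero-shot CLIP checkpoint and the fine-tuned one. A minimal sketch, assuming a plain state-dict interface (the `wise_ft` helper below is illustrative, not this repo's API):

```python
import torch


def wise_ft(theta_zeroshot: dict, theta_finetuned: dict, alpha: float) -> dict:
    """Per-parameter interpolation: (1 - alpha) * zero-shot + alpha * fine-tuned."""
    assert theta_zeroshot.keys() == theta_finetuned.keys()
    return {
        k: (1.0 - alpha) * theta_zeroshot[k] + alpha * theta_finetuned[k]
        for k in theta_zeroshot
    }
```

Sweeping a grid of `alpha` values (as the `--alpha` flag in `train.py` does below) trades off fit to the generated data (`alpha` near 1) against the zero-shot model's robustness (`alpha` near 0).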

Installation

# python=3.8, torch==2.0.0, etc.
conda env create --file environments.yaml
conda activate regft

Generating classification dataset

cd 1_generate_datasets
python generate_datasets.py \
    --ckpt="stabilityai/stable-diffusion-2-1-base" \
    --dataset="imagenet" \
    --prompt_style_file="prompt_styles.json" \
    --output_dir="/path/to/save/generated_imagenet"
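Conceptually, name-only generation builds text prompts from class names alone and samples images with Stable Diffusion via diffusers. A minimal sketch, with the caveat that the prompt templates, folder layout, and helper names are illustrative assumptions, not the exact logic of `generate_datasets.py`:

```python
import os


def prompts_for(class_name, templates=("a photo of a {}",)):
    """Fill each prompt template with a class name (name-only supervision)."""
    return [t.format(class_name) for t in templates]


def generate_class_images(class_name, out_dir, images_per_prompt=4):
    """Sample images for one class with Stable Diffusion and save them to disk."""
    import torch
    from diffusers import StableDiffusionPipeline  # heavy deps imported lazily

    pipe = StableDiffusionPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-1-base", torch_dtype=torch.float16
    ).to("cuda")
    cls_dir = os.path.join(out_dir, class_name.replace(" ", "_"))
    os.makedirs(cls_dir, exist_ok=True)
    for p_idx, prompt in enumerate(prompts_for(class_name)):
        images = pipe(prompt, num_images_per_prompt=images_per_prompt).images
        for i, img in enumerate(images):
            img.save(os.path.join(cls_dir, f"{p_idx:02d}_{i:05d}.png"))
```

Richer template sets (e.g., the CLIP prompt ensembles) can be passed via `templates` to increase the diversity of the synthetic dataset.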

Fine-tuning CLIP with the generated dataset

  • Training-time regularization (Variance-Covariance Regularization)
  • Post-training regularization (Weight-space ensemble)
cd 2_finetune_classifier
export PYTHONPATH="$PYTHONPATH:${PWD}"

vc_reg1=0.16
vc_reg2=0.02
model="ViT-B/16"
eval_dataset="ImageNet"
train_dataset="ImageNetSD"
gpt_prompt_file="gpt_file/imagenet_prompt.json"
save_subdir="/path/to/save"

python train.py \
    --train-dataset=${train_dataset} \
    --model=${model} \
    --eval-datasets=${eval_dataset} \
    --alpha 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 \
    --gpt_prompt_file ${gpt_prompt_file} \
    --vc_reg ${vc_reg1} ${vc_reg2} \
    --save=${save_subdir}
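The training-time regularizer constrains the statistics of the image embeddings. A VICReg-style sketch is below; the repo's exact formulation, and whether `vc_reg1`/`vc_reg2` map to the variance and covariance coefficients in this order, are assumptions on my part:

```python
import torch


def vc_regularizer(feats, var_coef, cov_coef, eps=1e-4):
    """feats: (batch, dim) image embeddings from the CLIP encoder."""
    feats = feats - feats.mean(dim=0)
    # Variance term: hinge pushing each dimension's std above 1,
    # discouraging representation collapse on synthetic data.
    std = torch.sqrt(feats.var(dim=0) + eps)
    var_loss = torch.relu(1.0 - std).mean()
    # Covariance term: penalize off-diagonal entries of the feature
    # covariance matrix, decorrelating embedding dimensions.
    n, d = feats.shape
    cov = (feats.T @ feats) / (n - 1)
    off_diag = cov - torch.diag(torch.diag(cov))
    cov_loss = (off_diag ** 2).sum() / d
    return var_coef * var_loss + cov_coef * cov_loss
```

The weighted sum would be added to the classification loss during fine-tuning, e.g. `loss = ce_loss + vc_regularizer(feats, vc_reg1, vc_reg2)`.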

(Optional) Integration with the adapter method

  • Replace the image encoder in the CaFo architecture with the fine-tuned version.

Acknowledgement

This repo benefits from diffusers, CLIP, WiSE-FT, and CaFo. Thanks for their wonderful work.

Citation

@article{park2024regularized,
  title={Regularized Training with Generated Datasets for Name-Only Transfer of Vision-Language Models},
  author={Park, Minho and Park, Sunghyun and Yun, Jooyeol and Choo, Jaegul},
  journal={arXiv preprint arXiv:2406.05432},
  year={2024}
}
