Minho Park, Sunghyun Park, Jooyeol Yun, and Jaegul Choo
Korea Advanced Institute of Science and Technology (KAIST)
TL;DR: To effectively fine-tune CLIP on generated datasets, robust regularization is essential: variance-covariance regularization during training and weight-space ensembling afterwards.
Figure: (a) the significant domain gap between real and generated images, and (b) the performance degradation caused by that gap.
```shell
# python=3.8, torch==2.0.0, etc.
conda env create --file environments.yaml
conda activate regft

cd 1_generate_datasets
python generate_datasets.py \
    --ckpt="stabilityai/stable-diffusion-2-1-base" \
    --dataset="imagenet" \
    --prompt_style_file="prompt_styles.json" \
    --output_dir="/path/to/save/generated_imagenet"
```
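The `--prompt_style_file` argument pairs each class name with a set of style templates to diversify the generated images. A minimal sketch of that prompt construction (the class names and templates below are hypothetical placeholders; the real ones come from the JSON files in the repo):

```python
from itertools import product

def build_prompts(class_names, style_templates):
    """Combine class names with style templates into text-to-image prompts.

    Each template contains a {} placeholder that is filled with the class
    name, mirroring the role of --prompt_style_file.
    """
    return [t.format(name) for name, t in product(class_names, style_templates)]

# Hypothetical examples; the actual classes/styles are read from JSON.
classes = ["goldfish", "tabby cat"]
styles = ["a photo of a {}", "a realistic photo of a {}, high resolution"]
prompts = build_prompts(classes, styles)
print(prompts[0])  # "a photo of a goldfish"
```

Each resulting prompt is then fed to the Stable Diffusion checkpoint given by `--ckpt` to synthesize training images.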
- Training-time regularization (Variance-Covariance Regularization)
- Post-training regularization (Weight-space ensemble)
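The training-time term can be sketched as a VICReg-style variance-covariance penalty on a batch of features. This is an illustrative NumPy version, not the repo's exact implementation; the two coefficients echo the `vc_reg1`/`vc_reg2` values used below:

```python
import numpy as np

def vc_regularization(z, var_coeff=0.16, cov_coeff=0.02):
    """Variance-covariance regularization on a feature batch z of shape (N, D).

    - Variance term: hinge loss pushing each dimension's std above 1,
      which discourages feature collapse.
    - Covariance term: penalizes off-diagonal covariance entries,
      decorrelating feature dimensions.
    """
    n, d = z.shape
    z = z - z.mean(axis=0)
    std = np.sqrt(z.var(axis=0) + 1e-4)
    var_loss = np.mean(np.maximum(0.0, 1.0 - std))
    cov = (z.T @ z) / (n - 1)
    off_diag = cov - np.diag(np.diag(cov))
    cov_loss = (off_diag ** 2).sum() / d
    return var_coeff * var_loss + cov_coeff * cov_loss
```

Well-spread, decorrelated features incur (near-)zero penalty, while collapsed features are penalized through the variance term.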
```shell
cd 2_finetune_classifier
export PYTHONPATH="$PYTHONPATH:${PWD}"

vc_reg1=0.16
vc_reg2=0.02
model="ViT-B/16"
eval_dataset="ImageNet"
train_dataset="ImageNetSD"
gpt_prompt_file="gpt_file/imagenet_prompt.json"
save_subdir="/path/to/save"

python train.py \
    --train-dataset=${train_dataset} \
    --model=${model} \
    --eval-datasets=${eval_dataset} \
    --alpha 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 \
    --gpt_prompt_file ${gpt_prompt_file} \
    --vc_reg ${vc_reg1} ${vc_reg2} \
    --save=${save_subdir}
```
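The `--alpha` grid controls the post-training weight-space ensemble in the style of WiSE-FT: each checkpoint is a per-parameter linear interpolation between the zero-shot and fine-tuned weights. A minimal sketch with NumPy arrays standing in for parameter tensors (the toy `theta0`/`theta1` values are hypothetical):

```python
import numpy as np

def wise_ft(zero_shot, fine_tuned, alpha):
    """Weight-space ensemble: interpolate each parameter tensor.

    alpha=0 recovers the zero-shot model, alpha=1 the fine-tuned one;
    the --alpha list above sweeps this grid.
    """
    return {k: (1 - alpha) * zero_shot[k] + alpha * fine_tuned[k]
            for k in zero_shot}

# Toy "state dicts" with two parameters each.
theta0 = {"w": np.array([0.0, 2.0]), "b": np.array([1.0])}
theta1 = {"w": np.array([2.0, 0.0]), "b": np.array([3.0])}
mixed = wise_ft(theta0, theta1, alpha=0.5)
print(mixed["w"])  # [1. 1.]
```

Intermediate alpha values typically trade off the zero-shot model's robustness against the fine-tuned model's in-distribution accuracy.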
- Replace the image encoder in the CaFo architecture with the fine-tuned version.
This repo builds on diffusers, CLIP, WiSE-FT, and CaFo. Thanks for their wonderful work.
```bibtex
@article{park2024regularized,
  title={Regularized Training with Generated Datasets for Name-Only Transfer of Vision-Language Models},
  author={Park, Minho and Park, Sunghyun and Yun, Jooyeol and Choo, Jaegul},
  journal={arXiv preprint arXiv:2406.05432},
  year={2024}
}
```