Official PyTorch implementation of the WACV 2024 paper Multimodality-guided Image Style Transfer using Cross-modal GAN Inversion.
[Paper] [Project Page]
First, clone the repository recursively:
```bash
git clone --recursive https://github.com/hywang66/mmist.git
```
Then, create a conda environment (optional) and install the dependencies:
```bash
pip install -r requirements.txt
```
Next, since we use StyleGAN3, you need to install additional tools, including CUDA toolkit 11.1 or later and GCC 7 or later. See the Requirements section of the forked StyleGAN3 repo for details. Note that the Python dependencies were already installed in the previous step.
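To quickly confirm that these tools are available, you can check their versions (a minimal sanity check, assuming `nvcc` and `gcc` are on your PATH):

```bash
# Verify the CUDA toolkit (11.1 or later) and GCC (7 or later) are installed
nvcc --version
gcc --version
```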
Next, download the wikiart pre-trained StyleGAN3 model from here and put it under `./stylegan3/models/`. Relative to the `stylegan3` directory, its path should be `./models/wikiart-1024-stylegan3-t-17.2Mimg.pkl`.
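For reference, the following commands place the downloaded model where it is expected (assuming it was saved to `[Download Directory]`, as in the AdaAttN step below):

```bash
# Create the model directory and move the downloaded checkpoint into it
mkdir -p stylegan3/models
mv [Download Directory]/wikiart-1024-stylegan3-t-17.2Mimg.pkl stylegan3/models/
```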
Next, download the pretrained AdaAttN model from here, copy it to the `checkpoints` folder in the AdaAttN repo, and unzip it. See more details in the forked AdaAttN repo. Specifically:
```bash
cd AdaAttN
mkdir checkpoints
mv [Download Directory]/AdaAttN_model.zip checkpoints/
unzip checkpoints/AdaAttN_model.zip
```
Now, you are ready to go!
We provide a simple example that executes multimodality-guided image style transfer using one style text and one style image.
If you have followed the installation steps above, simply run:
```bash
bash example.sh
```
The results will be saved in `./outputs/exp_mmist/stylized_imgs`.
In the `example.sh` script, we first set the paths to the method repos and the pre-trained models. Then, we set the paths to the input content images and the input styles. Finally, we stylize the content images in two steps:
- First, we generate style representations from the style text and/or style image inputs using `gen_style_reps.py`.
- Next, we stylize the content images using `apply_style_reps.py`.
Note that these two steps can be executed separately, i.e., you can stylize different content images multiple times with the same pre-generated style representations by running `apply_style_reps.py` multiple times.
In `gen_style_reps.py`, you can pass an arbitrary number of style text and style image inputs. However, the number of style inputs must match the number of style weights. In other words, the number of arguments passed to `--sty_text` must equal the number of arguments passed to `--alpha_text`, and the number of arguments passed to `--sty_img` must equal the number of arguments passed to `--alpha_img`.
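As an illustration, a call with two style texts and one style image might look like the following sketch (the flag names are the ones documented above; the prompts, weights, and image path are placeholders):

```bash
# Two style texts with weights 0.6 and 0.4, plus one style image with weight 1.0.
# Each --alpha_* list must have the same length as the corresponding --sty_* list.
python gen_style_reps.py \
    --sty_text "starry night" "impressionist brushstrokes" \
    --alpha_text 0.6 0.4 \
    --sty_img path/to/style_image.jpg \
    --alpha_img 1.0
```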
As a general style transfer method, MMIST can also be used to transfer a style expressed in a single modality. This is achieved by passing only one style input (either a style text or a style image) to `gen_style_reps.py`. Please see the detailed example in `example_TIST.sh`.
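For instance, a text-only style representation could be generated with a single style text and weight; the exact invocation in `example_TIST.sh` may differ, so treat this as a sketch:

```bash
# Single-modality (text-only) style representation: one style text, one weight.
python gen_style_reps.py \
    --sty_text "a watercolor painting" \
    --alpha_text 1.0
```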
If you find our work useful in your research, please cite:
```bibtex
@inproceedings{wang2024multimodality,
  title={Multimodality-guided Image Style Transfer using Cross-modal GAN Inversion},
  author={Wang, Hanyu and Wu, Pengxiang and Dela Rosa, Kevin and Wang, Chen and Shrivastava, Abhinav},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={4976--4985},
  year={2024}
}
```