Showing 98 changed files with 1,893 additions and 1,535 deletions.
This file was deleted.
@@ -1,127 +1 @@
##### Table of contents
1. [Environment Setup](#environment-setup)
2. [How to Run](#how-to-run)
3. [Acknowledgments](#acknowledgments)
4. [Note](#note)
5. [Contacts](#contacts)

# Official PyTorch code of "SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation" (CVPR'24)
<a href="https://arxiv.org/abs/2312.05239"><img src="https://img.shields.io/badge/paper-2312.05239-red?style=for-the-badge"></a>
<a href="https://thuanz123.github.io/swiftbrush"><img src="https://img.shields.io/badge/website-swiftbrush-blue?style=for-the-badge"></a>
<a href="https://swiftbrushv2.github.io"><img src="https://img.shields.io/badge/website-swiftbrush%20v2-green?style=for-the-badge"></a>
<div align="center">
  <a href="https://thuanz123.github.io" target="_blank">Thuan Hoang Nguyen</a>&emsp;
  <a href="https://scholar.google.com/citations?user=FYZ5ODQAAAAJ&hl=en" target="_blank">Anh Tran</a>
  <br> <br>

  <a href="https://www.vinai.io/">VinAI Research</a>
</div>
<br>

<div align="center">
  <img width="1000" alt="teaser" src="assets/teaser2-1.png"/>
</div>

> **Abstract**: Despite their ability to generate high-resolution and diverse images from text prompts, text-to-image diffusion models often suffer from slow iterative sampling. Model distillation is one of the most effective directions for accelerating these models. However, previous distillation methods fail to retain generation quality while requiring a large number of training images, either real or synthetically generated by the teacher model. In response to this limitation, we present a novel image-free distillation scheme named **SwiftBrush**. Drawing inspiration from text-to-3D synthesis, in which a 3D neural radiance field that aligns with the input prompt can be obtained from a 2D text-to-image diffusion prior via a specialized loss without any 3D ground-truth data, our approach re-purposes that same loss to distill a pretrained multi-step text-to-image model into a student network that can generate high-fidelity images in a single inference step. In spite of its simplicity, our model is one of the first one-step text-to-image generators that can produce images of quality comparable to Stable Diffusion without relying on any training image data. Remarkably, SwiftBrush achieves an FID score of **16.67** and a CLIP score of **0.29** on the COCO-30K benchmark, results that are competitive with, or even substantially better than, existing state-of-the-art distillation techniques.

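The abstract refers to the text-to-3D loss only in passing. As a rough orientation, in our own notation rather than the paper's, the variational score distillation (VSD) gradient introduced by ProlificDreamer can be written as follows when the one-step student $g_\theta$ takes the place of the 3D renderer. Here $z$ is the input noise, $y$ the prompt, $\epsilon_\psi$ the frozen teacher, $\epsilon_\phi$ a LoRA-adapted copy of the teacher fitted to the student's outputs, $w(t)$ a timestep weighting, and $\mathrm{sg}[\cdot]$ a stop-gradient. Treat this as a sketch: reading the `--lora_rank`, `--lora_alpha` and `--learning_rate_lora` flags as belonging to $\epsilon_\phi$ is our assumption, and the exact weighting used in `train_swiftbrush.py` may differ.

```math
\begin{aligned}
\hat{x} &= g_\theta(z, y), \qquad x_t = \alpha_t \hat{x} + \sigma_t \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I) \\
\nabla_\theta \mathcal{L}_{\mathrm{VSD}} &= \mathbb{E}_{t,\epsilon}\Big[ w(t)\,\big(\epsilon_\psi(x_t, t, y) - \epsilon_\phi(x_t, t, y)\big)\,\frac{\partial \hat{x}}{\partial \theta} \Big] \\
\mathcal{L}_\phi &= \mathbb{E}_{t,\epsilon}\Big[ \big\| \epsilon_\phi(\mathrm{sg}[x_t], t, y) - \epsilon \big\|_2^2 \Big]
\end{aligned}
```

The student is pushed towards regions where the teacher's score exceeds the score of its own (LoRA-estimated) output distribution, while $\epsilon_\phi$ is refit to the student's samples with the ordinary denoising loss.
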
**TLDR**: An image-free distillation method that transforms multi-step text-to-image diffusion models into one-step generators.

Details of the algorithm and experimental results can be found in [our paper](https://arxiv.org/abs/2312.05239):
```bibtex
@InProceedings{nguyen2024swiftbrush,
  title={SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation},
  author={Thuan Hoang Nguyen and Anh Tran},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2024}
}
```
**Please CITE** our paper whenever this repository is used to help produce published results or incorporated into other software.

## Environment Setup

Before running the scripts, install the training dependencies as follows.

Navigate to the swiftbrush folder and create the conda environment:
```bash
cd swiftbrush
conda create -n swiftbrush python=3.10
```

Then activate the conda environment and install all dependencies:
```bash
conda activate swiftbrush
pip install -r requirements.txt
```

(Optional) Install xformers by following the [official installation guide](https://github.com/facebookresearch/xformers#installing-xformers).

Finally, initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) environment with:

```bash
accelerate config
```

Or, for a default Accelerate configuration that does not ask questions about your environment:

```bash
accelerate config default
```

## How to Run

### Training
First, prepare a `.txt` file containing all the prompts for training, then pre-generate their text embeddings to save training time. The command below creates a text-embeddings folder with the same name as the `.txt` file:

```bash
python prepare.py \
    --pretrained_model_name_or_path "stabilityai/stable-diffusion-2-1-base" \
    --prompt_list "/path/to/txt_file" \
    --batch_size 32 \
    --num_processes 16
```
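
`prepare.py` itself is not shown here. As a hedged illustration of the batching implied by `--batch_size 32`, the sketch below assumes the prompt file holds one prompt per line (as in the bundled prompt list) and leaves the actual encoding to a caller-supplied `encode` function; in the real script that role would be played by the Stable Diffusion text encoder. All function names here are hypothetical, not the script's API.

```python
from pathlib import Path
from typing import Callable, Iterator, List, Sequence


def parse_prompts(text: str) -> List[str]:
    """Split file contents into prompts: one per line, blank lines dropped (assumed format)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def read_prompts(txt_path: str) -> List[str]:
    """Read a prompt list such as /path/to/txt_file."""
    return parse_prompts(Path(txt_path).read_text(encoding="utf-8"))


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive chunks of at most `batch_size` prompts."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def precompute_embeddings(
    prompts: Sequence[str],
    encode: Callable[[Sequence[str]], list],
    batch_size: int = 32,
) -> list:
    """Encode prompts batch by batch; `encode` is a stand-in for the text encoder."""
    embeddings: list = []
    for batch in batched(prompts, batch_size):
        embeddings.extend(encode(batch))
    return embeddings


if __name__ == "__main__":
    sample = "A fox wearing a yellow dress.\n\nA monkey wearing a jacket.\n"
    prompts = parse_prompts(sample)
    # Toy encoder: one "embedding" (here just the word count) per prompt.
    print(precompute_embeddings(prompts, lambda b: [len(p.split()) for p in b], batch_size=1))
```

Batching the encoder calls is what makes pre-computation cheap relative to encoding prompts on the fly during every training step.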

To train a SwiftBrush model, simply run:

```bash
accelerate launch train_swiftbrush.py \
    --pretrained_model_name_or_path "stabilityai/stable-diffusion-2-1-base" \
    --train_data_dir "/path/to/text_embeddings_folder" \
    --resolution 512 \
    --use_ema \
    --validation_prompts "A racoon wearing formal clothes, wearing a tophat. Oil painting in the style of Rembrandt" "a zoomed out DSLR photo of a hippo biting through a watermelon" "a lanky tall alien on a romantic date at italian restaurant with a smiling woman, nice restaurant, photography, bokeh" \
    --validation_steps 500 \
    --train_batch_size 16 \
    --gradient_accumulation_steps 1 \
    --set_grads_to_none \
    --guidance_scale 4.5 \
    --learning_rate 1.e-06 \
    --learning_rate_lora 1.e-03 \
    --lr_scheduler "constant" --lr_warmup_steps 0 \
    --lora_rank 64 --lora_alpha 108 \
    --num_train_epochs 3 \
    --checkpointing_steps 10000
```
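
The README does not say how `--guidance_scale`, `--lora_rank` and `--lora_alpha` enter the loss. Assuming they follow the usual conventions (classifier-free guidance applied to the teacher's noise prediction, and the alpha / r scaling of the low-rank update used by libraries such as PEFT), the values above give a guidance weight of 4.5 and a LoRA scale of 108 / 64 = 1.6875. The pure-Python sketch below only illustrates those two formulas; it is not code from `train_swiftbrush.py`, and all names are hypothetical.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def classifier_free_guidance(eps_uncond: Vector, eps_cond: Vector, guidance_scale: float) -> Vector:
    """Standard CFG: eps_uncond + w * (eps_cond - eps_uncond), element-wise."""
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def lora_scaling(lora_alpha: float, lora_rank: int) -> float:
    """Common LoRA convention: the low-rank update is multiplied by alpha / r."""
    return lora_alpha / lora_rank


def matvec(m: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_linear(x: Vector, w: Matrix, a: Matrix, b: Matrix, lora_alpha: float, lora_rank: int) -> Vector:
    """y = W x + (alpha / r) * B (A x); W stays frozen, only A and B are trained."""
    base = matvec(w, x)
    update = matvec(b, matvec(a, x))
    scale = lora_scaling(lora_alpha, lora_rank)
    return [y0 + scale * du for y0, du in zip(base, update)]


if __name__ == "__main__":
    print(classifier_free_guidance([0.0, 1.0], [1.0, 1.0], guidance_scale=4.5))  # [4.5, 1.0]
    print(lora_scaling(108, 64))  # 1.6875
```

Because the scale is alpha / r, changing `--lora_rank` without adjusting `--lora_alpha` also changes the effective step size of the LoRA update under this convention.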

For GPUs with limited memory, you can add the `--enable_xformers_memory_efficient_attention` (requires xformers) and/or `--gradient_checkpoint` arguments to the command above.
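
As a minimal sketch, a hypothetical helper could assemble these optional arguments and only request xformers attention when the package is importable (checked with `importlib.util.find_spec`). The flag names are taken verbatim from the text above; the helper itself is not part of the repository.

```python
import importlib.util
import shlex
from typing import List


def low_memory_flags(use_xformers: bool = True, use_gradient_checkpointing: bool = True) -> List[str]:
    """Return the optional memory-saving arguments for train_swiftbrush.py."""
    flags: List[str] = []
    # xformers is optional, so only pass the flag if the package is installed.
    if use_xformers and importlib.util.find_spec("xformers") is not None:
        flags.append("--enable_xformers_memory_efficient_attention")
    if use_gradient_checkpointing:
        flags.append("--gradient_checkpoint")
    return flags


if __name__ == "__main__":
    # Print the extra arguments to append to the accelerate launch command.
    print(shlex.join(low_memory_flags()))
```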

### Inference

To generate an image, simply run:

```bash
python infer.py \
    --pretrained_model_name_or_path "thuanz123/swiftbrush" \
    --prompt "A DSLR photo of a shiba on the beach" \
    --seed 0
```
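
`infer.py` is not shown here. Assuming it follows the usual one-step recipe for a student initialised from an epsilon-prediction teacher such as `stabilityai/stable-diffusion-2-1-base` (feed pure noise at the last training timestep, t = 999, then convert the predicted noise into a clean latent), the conversion reduces to inverting $x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon$. The sketch below computes $\bar\alpha_t$ for Stable Diffusion's `scaled_linear` schedule (β from 0.00085 to 0.012 over 1000 steps) and applies that inversion; the UNet call and VAE decoding are deliberately left out, and the function names are hypothetical.

```python
import math
from typing import List


def scaled_linear_alphas_cumprod(
    num_steps: int = 1000, beta_start: float = 0.00085, beta_end: float = 0.012
) -> List[float]:
    """Cumulative products of (1 - beta_t) for betas spaced linearly in sqrt-space."""
    lo, hi = math.sqrt(beta_start), math.sqrt(beta_end)
    alphas_cumprod: List[float] = []
    prod = 1.0
    for i in range(num_steps):
        beta = (lo + (hi - lo) * i / (num_steps - 1)) ** 2
        prod *= 1.0 - beta
        alphas_cumprod.append(prod)
    return alphas_cumprod


def x0_from_eps(x_t: List[float], eps_pred: List[float], alpha_bar: float) -> List[float]:
    """Solve x_t = sqrt(a) * x0 + sqrt(1 - a) * eps for x0, element-wise."""
    sqrt_a, sqrt_1ma = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(xt - sqrt_1ma * e) / sqrt_a for xt, e in zip(x_t, eps_pred)]


if __name__ == "__main__":
    alpha_bar_T = scaled_linear_alphas_cumprod()[999]  # last training timestep
    print(f"alpha_bar at t=999: {alpha_bar_T:.5f}")
```

In a real pipeline `x_t` would be the Gaussian latent and `eps_pred` the student UNet's output at t = 999; the recovered latent would then be decoded by the VAE. Since $\bar\alpha_{999}$ is very small, the division amplifies any error in `eps_pred`, which is why the student, not a generic scheduler step, has to get the prediction right in one shot.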

## Acknowledgments
We thank Uy Dieu Tran for early discussions and for many helpful comments and suggestions throughout the project. Special thanks to Trung Tuan Dao for valuable feedback and support. Last but not least, we thank Zhengyi Wang, Cheng Lu, Yikai Wang, Fan Bao, Chongxuan Li, Hang Su and Jun Zhu for their work on ProlificDreamer, as well as the Hugging Face team for the diffusers framework.

## Note

We have also been developing an improved version, **SwiftBrush v2**; a brief introduction to it is available [here](https://swiftbrushv2.github.io/).

## Contacts
If you have any questions, please email _v.thuannh5@vinai.io_ or open an issue in this repository.
# swiftbrush
@@ -0,0 +1,6 @@
@inproceedings{thuan2024swiftbrush,
  title={SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation},
  author={Thuan Hoang Nguyen and Anh Tran},
  year={2024},
  booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
}
Binary file not shown.
@@ -0,0 +1,25 @@
Fruit in a jar filled with liquid sitting on a wooden table.
A beautiful blue and pink sky overlooking the beach.
A minimalistic fisherman in geometric design with isometric mountains and forest in the background and flying fish and a moon on top.
Disney Concept Artists created a fugue with blunt borders following the rule of thirds.
A dresser in a room that is painted bright yellow.
A landscape with a building resembling the Iphone 4 front camera.
A close-up portrait of Rapunzel with a smile.
Concept art of a highly detailed landscape, centered and utilizing rule of thirds, with dynamic lighting for a cinematic effect.
A fox wearing a yellow dress.
There is traffic on a busy city street.
A monkey wearing a jacket.
Goku in a dynamic and cool pose on a manga page, drawn in the style of Hirohiko Araki.
A serene meadow with a tree, river, bridge, and mountains in the background under a slightly overcast sunrise sky.
Two cats sitting together in an empty bathtub.
A golden retriever representing god.
A painting of a girl wearing uniform in the city
A vase with a flower growing very well
A realistic anime painting of a cosmic woman wearing clothes made of universes with glowing red eyes.
The numbers and hands on the clock are gold.
A portrait of Rafael Nadal in Van Gogh's style.
A dog with a plate of food on the ground
A pencil sketch of an old man by Milt Kahl.
A cinematic portrait of Walt Whitman painted in oil on canvas or gouache with intricate details and desaturated colors.
A bicycle covered with greens and beans.
A wooden skate with a toy elephant on top of it
Binary file not shown.
Binary file added (+21.6 KB): compositional/A DSLR photo of a cat drinking latte on top of a mountain.jpg
Binary file added (+31.4 KB): compositional/A DSLR photo of a cat eating pizza on top of a mountain.jpg
Binary file added (+29.5 KB): compositional/A DSLR photo of a owl drinking latte on top of a mountain.jpg
Binary file added (+27.3 KB): compositional/A DSLR photo of a owl eating pizza on top of a mountain.jpg
Binary file added (+20.6 KB): compositional/A DSLR photo of a panda drinking latte on top of a mountain.jpg
Binary file added (+22 KB): compositional/A DSLR photo of a panda eating pizza on top of a mountain.jpg
Binary file added (+31.3 KB): compositional/A DSLR photo of a raccoon drinking latte in a garden.jpg
Binary file added (+19.6 KB): compositional/A DSLR photo of a raccoon drinking latte on a beach.jpg
Binary file added (+27.5 KB): compositional/A DSLR photo of a raccoon drinking latte on top of a mountain.jpg
Binary file added (+31.2 KB): compositional/A DSLR photo of a raccoon eating pizza on top of a mountain.jpg
Binary file added (+25.6 KB): compositional/A DSLR photo of a shiba inu drinking latte in a garden.jpg
Binary file added (+21.8 KB): compositional/A DSLR photo of a shiba inu drinking latte on a beach.jpg
Binary file added (+23.6 KB): compositional/A DSLR photo of a shiba inu drinking latte on top of a mountain.jpg
Binary file added (+31.3 KB): compositional/A DSLR photo of a shiba inu eating pizza in a garden.jpg
Binary file added (+23.4 KB): compositional/A DSLR photo of a shiba inu eating pizza on a beach.jpg
Binary file added (+22.6 KB): compositional/A DSLR photo of a shiba inu eating pizza on top of a mountain.jpg
Binary file added (+26.7 KB): compositional/An oil painting of a cat drinking latte in a garden.jpg
Binary file added (+27.2 KB): compositional/An oil painting of a cat drinking latte on top of a mountain.jpg
Binary file added (+25.7 KB): compositional/An oil painting of a cat eating pizza on top of a mountain.jpg
Binary file added (+34.5 KB): compositional/An oil painting of a owl drinking latte in a garden.jpg
Binary file added (+29.8 KB): compositional/An oil painting of a owl drinking latte on top of a mountain.jpg
Binary file added (+31.6 KB): compositional/An oil painting of a owl eating pizza on top of a mountain.jpg
Binary file added (+24 KB): compositional/An oil painting of a panda drinking latte in a garden.jpg
Binary file added (+22.3 KB): compositional/An oil painting of a panda drinking latte on a beach.jpg
Binary file added (+21.5 KB): compositional/An oil painting of a panda drinking latte on top of a mountain.jpg
Binary file added (+23.5 KB): compositional/An oil painting of a panda eating pizza on top of a mountain.jpg
Binary file added (+36.8 KB): compositional/An oil painting of a raccoon drinking latte in a garden.jpg
Binary file added (+31.4 KB): compositional/An oil painting of a raccoon drinking latte on a beach.jpg
Binary file added (+23 KB): compositional/An oil painting of a raccoon drinking latte on top of a mountain.jpg
Binary file added (+43.1 KB): compositional/An oil painting of a raccoon eating pizza in a garden.jpg
Binary file added (+26.5 KB): compositional/An oil painting of a raccoon eating pizza on a beach.jpg
Binary file added (+33.1 KB): compositional/An oil painting of a raccoon eating pizza on top of a mountain.jpg
Binary file added (+32.6 KB): compositional/An oil painting of a shiba inu drinking latte in a garden.jpg
Binary file added (+29.6 KB): compositional/An oil painting of a shiba inu drinking latte on a beach.jpg
Binary file added (+27.3 KB): ...sitional/An oil painting of a shiba inu drinking latte on top of a mountain.jpg
Binary file added (+38.2 KB): compositional/An oil painting of a shiba inu eating pizza in a garden.jpg
Binary file added (+31.2 KB): compositional/An oil painting of a shiba inu eating pizza on a beach.jpg
Binary file added (+27.6 KB): compositional/An oil painting of a shiba inu eating pizza on top of a mountain.jpg