
Commit

project page
thuanz123 committed Jul 11, 2024
1 parent 15509e8 commit ddb0fd9
Showing 98 changed files with 1,893 additions and 1,535 deletions.
173 changes: 0 additions & 173 deletions LICENSE.md

This file was deleted.

128 changes: 1 addition & 127 deletions README.md
@@ -1,127 +1 @@
##### Table of contents
1. [Environment Setup](#environment-setup)
2. [How to Run](#how-to-run)
3. [Acknowledgments](#acknowledgments)
4. [Note](#note)
5. [Contacts](#contacts)

# Official PyTorch code of "SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation" (CVPR'24)
<a href="https://arxiv.org/abs/2312.05239"><img src="https://img.shields.io/badge/paper-2312.05239-red?style=for-the-badge"></a>
<a href="https://thuanz123.github.io/swiftbrush"><img src="https://img.shields.io/badge/website-swiftbrush-blue?style=for-the-badge"></a>
<a href="https://swiftbrushv2.github.io"><img src="https://img.shields.io/badge/website-swiftbrush v2-green?style=for-the-badge"></a>
<div align="center">
<a href="https://thuanz123.github.io" target="_blank">Thuan&nbsp;Hoang&nbsp;Nguyen</a> &emsp;
<a href="https://scholar.google.com/citations?user=FYZ5ODQAAAAJ&hl=en" target="_blank">Anh&nbsp;Tran</a>
<br> <br>


<a href="https://www.vinai.io/">VinAI Research</a>
</div>
<br>

<div align="center">
<img width="1000" alt="teaser" src="assets/teaser2-1.png"/>
</div>

> **Abstract**: Despite their ability to generate high-resolution and diverse images from text prompts, text-to-image diffusion models often suffer from slow iterative sampling processes. Model distillation is one of the most effective directions to accelerate these models. However, previous distillation methods fail to retain the generation quality while requiring a significant amount of images for training, either from real data or synthetically generated by the teacher model. In response to this limitation, we present a novel image-free distillation scheme named **SwiftBrush**. Drawing inspiration from text-to-3D synthesis, in which a 3D neural radiance field that aligns with the input prompt can be obtained from a 2D text-to-image diffusion prior via a specialized loss without the use of any 3D data ground-truth, our approach re-purposes that same loss for distilling a pretrained multi-step text-to-image model to a student network that can generate high-fidelity images with just a single inference step. In spite of its simplicity, our model stands as one of the first one-step text-to-image generators that can produce images of comparable quality to Stable Diffusion without reliance on any training image data. Remarkably, SwiftBrush achieves an FID score of **16.67** and a CLIP score of **0.29** on the COCO-30K benchmark, achieving competitive results or even substantially surpassing existing state-of-the-art distillation techniques.

**TLDR**: An image-free distillation method that transforms multi-step text-to-image diffusion models into one-step generators.

Details of the algorithms and experimental results can be found in [our paper](https://arxiv.org/abs/2312.05239):
```bibtex
@InProceedings{nguyen2024swiftbrush,
title={SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation},
author={Thuan Hoang Nguyen and Anh Tran},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2024}
}
```
**Please CITE** our paper whenever this repository is used to help produce published results or incorporated into other software.

## Environment Setup

Before running the scripts, install the training dependencies.

Navigate to the `swiftbrush` folder and create the conda environment:
```bash
cd swiftbrush
# `conda create` makes a new environment; `conda install -n` only works on an existing one
conda create -n swiftbrush python=3.10
```

Then activate the conda environment and install all dependencies
```bash
conda activate swiftbrush
pip install -r requirements.txt
```

(Optional) Install xformers by following the [installation guide](https://github.com/facebookresearch/xformers#installing-xformers).

And finally initialize an [🤗Accelerate](https://github.com/huggingface/accelerate/) environment with:

```bash
accelerate config
```

Or, for a default Accelerate configuration that skips the questions about your environment:

```bash
accelerate config default
```

## How to Run

### Training
First, prepare a `.txt` file containing all the training prompts, then pre-generate the text embeddings to save training time. Running the command below creates a text-embeddings folder with the same name as the `.txt` file:

```bash
python prepare.py \
--pretrained_model_name_or_path "stabilityai/stable-diffusion-2-1-base" \
--prompt_list "/path/to/txt_file" \
--batch_size 32 \
--num_processes 16
```
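
A minimal sketch of assembling the prompt file, assuming `prepare.py` reads one prompt per line (the README does not state the format explicitly). The helper name `write_prompt_file` and the output file name are hypothetical and not part of the repository.

```python
from pathlib import Path


def write_prompt_file(prompts, path):
    """Write unique, non-empty prompts to `path`, one per line (assumed format)."""
    seen = set()
    cleaned = []
    for prompt in prompts:
        # Collapse internal whitespace so each prompt stays on a single line.
        text = " ".join(prompt.split())
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return cleaned
```

The resulting path could then be passed to `--prompt_list`. Removing duplicates up front avoids computing the same text embedding twice.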

To train a SwiftBrush model, simply run:

```bash
accelerate launch train_swiftbrush.py \
--pretrained_model_name_or_path "stabilityai/stable-diffusion-2-1-base" \
--train_data_dir "/path/to/text_embeddings_folder" \
--resolution 512 \
--use_ema \
--validation_prompts "A racoon wearing formal clothes, wearing a tophat. Oil painting in the style of Rembrandt" "a zoomed out DSLR photo of a hippo biting through a watermelon" "a lanky tall alien on a romantic date at italian restaurant with a smiling woman, nice restaurant, photography, bokeh" \
--validation_steps 500 \
--train_batch_size 16 \
--gradient_accumulation_steps 1 \
--set_grads_to_none \
--guidance_scale 4.5 \
--learning_rate 1.e-06 \
--learning_rate_lora 1.e-03 \
--lr_scheduler "constant" --lr_warmup_steps 0 \
--lora_rank 64 --lora_alpha 108 \
--num_train_epochs 3 \
--checkpointing_steps 10000
```
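
Two quantities implied by the flags above can be worked out by hand. The sketch below assumes the usual data-parallel accounting (per-device batch × accumulation steps × number of processes) and the common LoRA convention of scaling the low-rank update by `alpha / rank`. Whether `train_swiftbrush.py` applies exactly this scaling is an assumption, and the process count of 4 is a hypothetical example.

```python
def effective_batch_size(train_batch_size, gradient_accumulation_steps, num_processes):
    """Samples contributing to one optimizer step across all processes."""
    return train_batch_size * gradient_accumulation_steps * num_processes


def lora_scale(lora_alpha, lora_rank):
    """Common LoRA convention: the low-rank update is multiplied by alpha / rank."""
    return lora_alpha / lora_rank


# Flag values from the command above; num_processes=4 is a hypothetical GPU count.
print(effective_batch_size(16, 1, 4))  # 64
print(lora_scale(108, 64))             # 1.6875
```

If you change the number of GPUs or `--gradient_accumulation_steps`, the effective batch size changes with it, so the learning rates above may need retuning.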

For low-memory GPUs, you can add the `--enable_xformers_memory_efficient_attention` (xformers must be installed) and/or `--gradient_checkpoint` arguments to the command above.

### Inference

To generate an image, simply run:

```bash
python infer.py \
--pretrained_model_name_or_path "thuanz123/swiftbrush" \
--prompt "A DSLR photo of a shiba on the beach" \
--seed 0
```
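
To generate several images, one option is to call `infer.py` once per prompt and seed. The sketch below builds those commands using only the three flags shown above. The helper names are hypothetical, and since the README does not say where `infer.py` writes its output, check whether successive runs overwrite the same file before using it.

```python
import subprocess
import sys


def build_infer_command(prompt, seed, model="thuanz123/swiftbrush"):
    """Return the argv list for one infer.py call, using only the flags shown above."""
    return [
        sys.executable, "infer.py",
        "--pretrained_model_name_or_path", model,
        "--prompt", prompt,
        "--seed", str(seed),
    ]


def run_batch(prompts, seeds, dry_run=True):
    """Build (and optionally run) one command per (prompt, seed) pair."""
    commands = [build_infer_command(p, s) for p in prompts for s in seeds]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if infer.py exits with an error.
            subprocess.run(cmd, check=True)
    return commands
```

With `dry_run=True` (the default) nothing is executed, so you can inspect the commands first. Each call reloads the model, so this is convenient rather than fast.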

## Acknowledgments
We thank Uy Dieu Tran for early discussions and for many helpful comments and suggestions throughout the project. Special thanks to Trung Tuan Dao for valuable feedback and support. Last but not least, we thank Zhengyi Wang, Cheng Lu, Yikai Wang, Fan Bao, Chongxuan Li, Hang Su and Jun Zhu for their work on ProlificDreamer, as well as the Hugging Face team for the diffusers framework.

## Note

We have also been developing an improved version, **SwiftBrush v2**; a brief introduction to it is available [here](https://swiftbrushv2.github.io/).

## Contacts
If you have any questions, please drop an email to _v.thuannh5@vinai.io_ or open an issue in this repository.
# swiftbrush
6 changes: 6 additions & 0 deletions assets/bibtex.txt
@@ -0,0 +1,6 @@
@inproceedings{thuan2024swiftbrush,
title={SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation},
author={Thuan Hoang Nguyen and Anh Tran},
year={2024},
booktitle={IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
}
Binary file added assets/brush.ico
Binary file not shown.
Binary file added assets/preview.png
25 changes: 25 additions & 0 deletions assets/prompts.txt
@@ -0,0 +1,25 @@
Fruit in a jar filled with liquid sitting on a wooden table.
A beautiful blue and pink sky overlooking the beach.
A minimalistic fisherman in geometric design with isometric mountains and forest in the background and flying fish and a moon on top.
Disney Concept Artists created a fugue with blunt borders following the rule of thirds.
A dresser in a room that is painted bright yellow.
A landscape with a building resembling the Iphone 4 front camera.
A close-up portrait of Rapunzel with a smile.
Concept art of a highly detailed landscape, centered and utilizing rule of thirds, with dynamic lighting for a cinematic effect.
A fox wearing a yellow dress.
There is traffic on a busy city street.
A monkey wearing a jacket.
Goku in a dynamic and cool pose on a manga page, drawn in the style of Hirohiko Araki.
A serene meadow with a tree, river, bridge, and mountains in the background under a slightly overcast sunrise sky.
Two cats sitting together in an empty bathtub.
A golden retriever representing god.
A painting of a girl wearing uniform in the city
A vase with a flower growing very well
A realistic anime painting of a cosmic woman wearing clothes made of universes with glowing red eyes.
The numbers and hands on the clock are gold.
A portrait of Rafael Nadal in Van Gogh's style.
A dog with a plate of food on the ground
A pencil sketch of an old man by Milt Kahl.
A cinematic portrait of Walt Whitman painted in oil on canvas or gouache with intricate details and desaturated colors.
A bicycle covered with greens and beans.
A wooden skate with a toy elephant on top of it
Binary file removed assets/teaser2-1.png
Binary file not shown.
7 changes: 7 additions & 0 deletions bootstrap.bundle.min.js

Large diffs are not rendered by default.

7 changes: 7 additions & 0 deletions bootstrap.min.css

Large diffs are not rendered by default.

13 changes: 13 additions & 0 deletions chart.min.js

Large diffs are not rendered by default.
