Merge pull request #14 from Stability-AI/stablewaheed
Added quantization example
sanwal-stability authored Jan 16, 2025
2 parents e65b914 + 52d3342 commit 21bbb57
Showing 17 changed files with 320 additions and 37 deletions.
26 changes: 26 additions & 0 deletions README.md
@@ -4,3 +4,29 @@ A collection of code samples for working with Stability AI's models. This repo w
![Image-to-Image](./images/screenshot_image_to_image.png)

![Inpainting](./images/screenshot_inpainting.png)

## Stable Diffusion 3.5 Inference Speeds
|Model|Inference Speed (seconds)|GPU|
|-----|-------------------------|---|
|SD3.5 M|4 s|NVIDIA H100 GPU with 80 GB of VRAM|
|[4-Bit Quantized SD3.5 L](/sd35-text-to-image-quantized-gradio/)|18 s|NVIDIA H100 GPU with 80 GB of VRAM|
|SD3.5 L|7 s|NVIDIA H100 GPU with 80 GB of VRAM|
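
The source does not say how these timings were measured. A minimal sketch of one way to collect comparable per-image numbers, using a hypothetical `time_inference` helper that times any zero-argument callable (in practice a lambda wrapping the pipeline call). The warm-up run and the use of the median are assumptions, not the method behind this table:

```python
import statistics
import time
from typing import Callable


def time_inference(generate: Callable[[], object], runs: int = 3, warmup: int = 1) -> float:
    """Return the median wall-clock time in seconds of `generate()`.

    Hypothetical helper: `generate` is any zero-argument callable, e.g.
    `lambda: pipe(prompt="...", guidance_scale=2.5)`. Warm-up calls are
    excluded because the first call often includes one-time setup cost.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        generate()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        generate()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without a GPU.
    print(f"{time_inference(lambda: time.sleep(0.05)):.2f} s")
```

Timing the same prompt, step count, and resolution for each model keeps the comparison fair; the `time.sleep` workload only stands in for a real pipeline call.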

## Stable Diffusion 3.5 Prompt Tuning Using Guidance Scale
The [guidance_scale](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline.__call__.guidance_scale) parameter has a significant impact on image generation with Stable Diffusion 3.5 models:
> A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality.

Image quality can vary drastically depending on the `guidance_scale` value. The screenshots below show recommended `guidance_scale` settings for three Stable Diffusion 3.5 models:
* [Stable Diffusion 3.5 Large](https://huggingface.co/stabilityai/stable-diffusion-3.5-large) (SD3.5 L)
  * [Sample code](./sd35-text-to-image-gradio/app.py)
* [4-Bit Quantized Stable Diffusion 3.5 Large](https://huggingface.co/stabilityai/stable-diffusion-3.5-large) (NF4 SD3.5 L)
  * NF4: [Normal Float 4](https://huggingface.co/docs/diffusers/v0.32.2/en/quantization/bitsandbytes#normal-float-4-nf4)
  * [Sample code](./sd35-text-to-image-quantized-gradio/app.py)
* [Stable Diffusion 3.5 Medium](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium) (SD3.5 M)

### Guidance Scale Examples
|Model|[guidance_scale](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline.__call__.guidance_scale) (float 1-10)|Example|
|-----|--------------|-------|
|SD3.5 L|`guidance_scale=2.5`|![sd3.5 L guidance_scale=2.5](./images/guidance-scale-examples/sd3.5%20L%20guidance_scale=2.5.png)|
|NF4 SD3.5 L|`guidance_scale=7.5`|![nf4 sd3.5 L guidance_scale=7.5](./images/guidance-scale-examples/nf4%20sd3.5%20L%20guidance_scale=7.5.png)|
|SD3.5 M|`guidance_scale=5.0`|![sd3.5 M guidance_scale=5](./images/guidance-scale-examples/sd3.5%20M%20guidance_scale=5.png)|
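
In the diffusers `StableDiffusion3Pipeline`, `guidance_scale` drives classifier-free guidance: at each denoising step, the prediction for the negative (or empty) prompt is pushed toward the prediction for the positive prompt by a factor of `guidance_scale`. A minimal sketch of that combination on plain Python lists standing in for tensors; `apply_guidance` is a hypothetical name, not a diffusers function:

```python
from typing import List


def apply_guidance(uncond: List[float], cond: List[float], guidance_scale: float) -> List[float]:
    """Classifier-free guidance: uncond + scale * (cond - uncond), element-wise.

    `uncond` stands in for the prediction conditioned on the negative (or empty)
    prompt, `cond` for the prediction conditioned on the positive prompt.
    """
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same shape")
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]


uncond = [0.0, 1.0]
cond = [1.0, 1.5]
for scale in (1.0, 2.5, 7.5):
    print(scale, apply_guidance(uncond, cond, scale))
```

With `guidance_scale=1.0` the result is just the prompt-conditioned prediction (the pipeline skips guidance entirely at 1 or below); larger values extrapolate further along the prompt direction, which is the source of the prompt-adherence versus image-quality trade-off quoted above.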
2 changes: 1 addition & 1 deletion sd35-image-to-image-flask/README.md
@@ -1,7 +1,7 @@
# Stable Diffusion 3.5 Image-to-Image Python Flask App
This repo folder contains a simple Stable Diffusion 3.5 Image-to-Image API built with Python Flask

-**Estimated Inference Speed:** 23 seconds for Stable Diffusion 3.5 Large on an NVIDIA H100 GPU
+**Estimated Inference Speed:** 7 seconds for Stable Diffusion 3.5 Large on an NVIDIA H100 GPU

**[Postman](https://www.postman.com/downloads/) Screenshot:**
![Postman Screenshot](./images/postman_screenshot.png)
22 changes: 13 additions & 9 deletions sd35-image-to-image-gradio/README.md
@@ -1,11 +1,11 @@
# Stable Diffusion 3.5 Image-to-Image in Gradio
Gradio demo of [image-to-image](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/img2img) using Stable Diffusion 3.5 Medium

-**Estimated Inference Speed:** 23 seconds for Stable Diffusion 3.5 Large on an NVIDIA H100 GPU
+**Estimated Inference Speed:** 7 seconds for Stable Diffusion 3.5 Large on an NVIDIA H100 GPU

Full documentation is available on Hugging Face: [Stable Diffusion Image-to-image](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/img2img)

-### Screen Shot
+### Screenshot
![Screenshot](./images/screenshot.png)

## Quick Start
@@ -66,28 +66,32 @@ init_image = init_image.resize((640, 1536))
```
#### 1536x1536

![1536x1536](./images/input-image-size-examples/1536x1536.png)

#### 640x640

![640x640](./images/input-image-size-examples/640x640.png)

#### 64x64

![64x64](./images/input-image-size-examples/64x64.png)

#### 20x20

![20x20](./images/input-image-size-examples/20x20.png)

#### 1x1536

**NOTE:** The error comes from the [Pillow](https://pypi.org/project/pillow/) [PIL.Image.resize()](https://github.com/Stability-AI/stability-ai-toolkit/blob/main/sd35-image-to-image-gradio/app.py#L56) method, which does not accept these resize dimensions. Developers should test separately whether SD3.5 image-to-image itself can tolerate these dimensions.

![1x1536](./images/input-image-size-examples/1x1536.png)

#### 5x12

**NOTE:** The error comes from the [Pillow](https://pypi.org/project/pillow/) [PIL.Image.resize()](https://github.com/Stability-AI/stability-ai-toolkit/blob/main/sd35-image-to-image-gradio/app.py#L56) method, which does not accept these resize dimensions. Developers should test separately whether SD3.5 image-to-image itself can tolerate these dimensions.

![5x12](./images/input-image-size-examples/5x12.png)

#### 640x1536

![640x1536](./images/input-image-size-examples/640x1536.png)
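
The size examples above suggest validating input dimensions before they reach `PIL.Image.resize()` and the pipeline. A minimal sketch, assuming SD3.5 expects sides that are multiples of 16 (the VAE's 8x downsampling times the transformer's 2x2 patch size). The helper name `snap_to_multiple` and the multiple of 16 are assumptions to verify against your diffusers version:

```python
from typing import Tuple


def snap_to_multiple(width: int, height: int, multiple: int = 16) -> Tuple[int, int]:
    """Round each side to the nearest multiple of `multiple`, never below one multiple.

    Hypothetical pre-processing helper; the default of 16 is an assumption
    (VAE scale factor 8 x transformer patch size 2).
    """
    if width <= 0 or height <= 0 or multiple <= 0:
        raise ValueError("width, height and multiple must be positive")

    def snap(side: int) -> int:
        return max(multiple, round(side / multiple) * multiple)

    return snap(width), snap(height)


for size in [(1536, 1536), (640, 640), (64, 64), (20, 20), (1, 1536), (5, 12), (640, 1536)]:
    print(size, "->", snap_to_multiple(*size))
```

In the app this could be applied as `init_image.resize(snap_to_multiple(*init_image.size))`, which keeps degenerate inputs such as 1x1536 from reaching the pipeline unchanged, although very small inputs carry little usable image content either way.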
6 changes: 3 additions & 3 deletions sd35-image-to-image-gradio/example_prompts.txt
@@ -2,18 +2,18 @@ positive prompt:
Replace the soldiers with elves holding bows and arrows

positive prompt:
-Replace the soldiers with elves holding crossbows, first-person-shooter screen shot, 4k
+Replace the soldiers with elves holding crossbows, first-person-shooter screenshot, 4k
The elves are wearing hoods


positive prompt:
-Replace the soldiers with elves holding bows and arrows, first-person-shooter screen shot, 4k
+Replace the soldiers with elves holding bows and arrows, first-person-shooter screenshot, 4k
The elves are wearing hoods
There is a dragon flying in the sky


positive prompt:
-Replace the soldiers with elves holding bows and arrows, video game screen shot, 4k
+Replace the soldiers with elves holding bows and arrows, video game screenshot, 4k
The elves are wearing hoods. There is one dragon flying in the sky

negative prompt:
4 changes: 2 additions & 2 deletions sd35-inpainting-gradio/README.md
@@ -1,9 +1,9 @@
# Stable Diffusion 3.5 Inpainting in Gradio
Gradio demo of inpainting using Stable Diffusion 3.5 Large

-**Estimated Inference Speed:** 23 seconds for Stable Diffusion 3.5 Large on an NVIDIA H100 GPU
+**Estimated Inference Speed:** 7 seconds for Stable Diffusion 3.5 Large on an NVIDIA H100 GPU

-### Screen Shot
+### Screenshot
![screenshot.png](./images/screenshot.png)

#### Input Image and Gradio ImageMask
4 changes: 2 additions & 2 deletions sd35-text-to-image-gradio/README.md
@@ -3,9 +3,9 @@ Gradio demo of [text-to-image](https://huggingface.co/docs/diffusers/api/pipelin

Full documentation is available on Hugging Face: [Stable Diffusion Text-to-image](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/text2img)

-**Estimated Inference Speed:** 23 seconds for Stable Diffusion 3.5 Large on an NVIDIA H100 GPU
+**Estimated Inference Speed:** 7 seconds for Stable Diffusion 3.5 Large on an NVIDIA H100 GPU

-### Screen Shot
+### Screenshot
![Screenshot](./images/screenshot.png)

## Quick Start
32 changes: 13 additions & 19 deletions sd35-text-to-image-gradio/app.py
@@ -20,7 +20,6 @@
import torch
import os

-from diffusers import BitsAndBytesConfig, SD3Transformer2DModel
from diffusers import StableDiffusion3Pipeline
from huggingface_hub import login

@@ -43,6 +42,16 @@ def login_to_hugging_face(self):
login()
print("\nWARNING: To avoid the Hugging Face login prompt in the future, please set the HF_TOKEN environment variable:\n\n export HF_TOKEN=<YOUR HUGGING FACE USER ACCESS TOKEN>\n")

+def _check_shader(self):
+if torch.backends.mps.is_available():
+device = "mps"
+elif torch.cuda.is_available():
+device = "cuda"
+else:
+device = "cpu"
+
+return device
+
def _predict(self, guidance_scale, prompt, negative_prompt, progress=gr.Progress(track_tqdm=True)):
images = self._pipe(
prompt=prompt,
@@ -65,26 +74,11 @@ def _start_gradio(self):
).launch(debug=True, share=True)

def start_text_to_image(self):
-model_id = "stabilityai/stable-diffusion-3.5-large"
-
-nf4_config = BitsAndBytesConfig(
-load_in_4bit=True,
-bnb_4bit_quant_type="nf4",
-bnb_4bit_compute_dtype=torch.bfloat16
-)
-model_nf4 = SD3Transformer2DModel.from_pretrained(
-model_id,
-subfolder="transformer",
-quantization_config=nf4_config,
-torch_dtype=torch.bfloat16
-)
-
self._pipe = StableDiffusion3Pipeline.from_pretrained(
-model_id,
-transformer=model_nf4,
-torch_dtype=torch.bfloat16
+"stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
)
-self._pipe.enable_model_cpu_offload()
+device = self._check_shader()
+self._pipe.to(device)

self._start_gradio()
return 0
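
The new `_check_shader()` method prefers Apple's MPS backend, then CUDA, then the CPU, and the pipeline is moved there with `self._pipe.to(device)` instead of calling `enable_model_cpu_offload()`. A minimal, torch-free sketch of the same priority order, with the availability checks passed in as callables so the logic can be unit-tested; in the app these would be `torch.backends.mps.is_available` and `torch.cuda.is_available`. The function name `select_device` is hypothetical:

```python
from typing import Callable


def select_device(
    mps_available: Callable[[], bool],
    cuda_available: Callable[[], bool],
) -> str:
    """Return "mps", "cuda" or "cpu", in the same order as _check_shader()."""
    if mps_available():
        return "mps"
    if cuda_available():
        return "cuda"
    return "cpu"


# In the app: select_device(torch.backends.mps.is_available, torch.cuda.is_available)
print(select_device(lambda: False, lambda: True))
```

Note that `.to(device)` keeps the whole pipeline on that device, whereas the removed `enable_model_cpu_offload()` call kept components on the CPU until they were needed; which approach fits depends on the available VRAM.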
9 changes: 8 additions & 1 deletion sd35-text-to-image-gradio/example_prompts.txt
@@ -2,4 +2,11 @@ positive prompt:
Children's birthday party

negative prompt:
No birthday cake


positive prompt:
A group of elves hunting a dragon, 4k cinema

negative prompt:
No green grass
Binary file modified sd35-text-to-image-gradio/images/screenshot.png
106 changes: 106 additions & 0 deletions sd35-text-to-image-quantized-gradio/README.md
@@ -0,0 +1,106 @@
# 4-Bit Quantized Stable Diffusion 3.5 Text-to-Image in Gradio
Gradio demo of [text-to-image](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/text2img) using 4-bit quantized Stable Diffusion 3.5 Large

Full documentation is available on Hugging Face: [Stable Diffusion Text-to-image](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/text2img)

**Estimated Inference Speed:** 18 seconds for quantized Stable Diffusion 3.5 Large on an NVIDIA H100 GPU

### Screenshot
![Screenshot](./images/screenshot.png)

## Quick Start
1. Open a web browser, log in to Hugging Face, and complete the registration form (name and email) to access
[stable-diffusion-3.5-large](https://huggingface.co/stabilityai/stable-diffusion-3.5-large)
2. Create a new Hugging Face [user access token](https://huggingface.co/docs/hub/en/security-tokens),
which is tied to your account and therefore to your completed registration
3. Clone this repo to your machine and change into the directory for this demo:
```
cd ./stability-ai-toolkit/sd35-text-to-image-quantized-gradio
```
4. Set up the app in a Python virtual environment:

```
python -m venv <your_environment_name>
source <your_environment_name>/bin/activate
```
5. Set your `HF_TOKEN` inside your virtual environment
```
export HF_TOKEN=<Hugging Face user access token>
```
6. Install dependencies
```
pip install -r requirements.txt
```

NOTE: Read [requirements.txt](./requirements.txt) for
[macOS PyTorch installation instructions](https://developer.apple.com/metal/pytorch/)

TL;DR:
```
# Inside your virtual environment
pip install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
```
7. Start the app
```
python app.py
```
8. Open UI in a web browser: [http://127.0.0.1:7861](http://127.0.0.1:7861)

## How to Quantize Stable Diffusion 3.5 Large
### [With Quantization](./app.py)
```python
import torch
from diffusers import BitsAndBytesConfig, SD3Transformer2DModel
from diffusers import StableDiffusion3Pipeline
...
model_id = "stabilityai/stable-diffusion-3.5-large"
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)
model_nf4 = SD3Transformer2DModel.from_pretrained(
    model_id,
    subfolder="transformer",
    quantization_config=nf4_config,
    torch_dtype=torch.bfloat16
)
pipe = StableDiffusion3Pipeline.from_pretrained(
    model_id, 
    transformer=model_nf4,
    torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()
```
### [Without Quantization](/sd35-text-to-image-gradio/app.py)
```python
import torch
from diffusers import StableDiffusion3Pipeline
...
model_id = "stabilityai/stable-diffusion-3.5-large"
pipe = StableDiffusion3Pipeline.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16
)
```
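
To illustrate what `load_in_4bit=True` does to the transformer's weights, the sketch below quantizes a block of weights to 4-bit integer codes using the block's absolute maximum as the scale, then dequantizes them. This is a simplified *uniform* 4-bit scheme chosen for illustration: NF4 (`bnb_4bit_quant_type="nf4"`) instead places its 16 levels at quantiles of a normal distribution, and the bitsandbytes kernels differ in detail. The function names are hypothetical:

```python
from typing import List, Tuple


def quantize_4bit(block: List[float]) -> Tuple[List[int], float]:
    """Uniform absmax quantization of one block of weights to signed 4-bit codes.

    Simplified stand-in for illustration only; NF4 uses non-uniform levels.
    """
    absmax = max((abs(w) for w in block), default=0.0)
    # One floating-point scale is stored per block; guard against all-zero blocks.
    scale = absmax / 7 if absmax > 0 else 1.0
    # With scale = absmax / 7 every code lands in [-7, 7] (15 of the 16 4-bit values).
    codes = [max(-7, min(7, round(w / scale))) for w in block]
    return codes, scale


def dequantize_4bit(codes: List[int], scale: float) -> List[float]:
    """Recover approximate weights from the 4-bit codes and the block scale."""
    return [c * scale for c in codes]


weights = [0.7, -0.3, 0.1, 0.0]
codes, scale = quantize_4bit(weights)
print(codes)  # [7, -3, 1, 0]
print(dequantize_4bit(codes, scale))
```

Each weight now needs 4 bits instead of 16 (plus one scale per block), and the rounding error per weight is at most half a quantization step. NF4's normal-quantile levels are meant to spend that limited precision where typical weight values are most dense.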

## Why Use Quantized Stable Diffusion 3.5 Large

**NOTE:** There is a **SIGNIFICANT IMPROVEMENT** in **NEGATIVE PROMPTING** accuracy when using 4-bit quantized Stable Diffusion 3.5 Large.

Many use cases for [Stable Diffusion 3.5 Large](https://huggingface.co/stabilityai/stable-diffusion-3.5-large) (SD3.5 L) need the model's capabilities without its large memory footprint:
* 4-bit quantization of SD3.5 L allows it to load onto GPUs with limited VRAM
* 4-bit quantization makes it easier to offload parts of model execution to the CPU, further reducing GPU memory usage
* The resulting decrease in generated image quality is often acceptable in exchange for the lower cost of running on GPUs with less VRAM
* Users running the model on their own computer with a retail GPU (or Apple Silicon with an integrated GPU) benefit from this trade-off
* [Stable Diffusion 3.5 Medium](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium) (SD3.5 M) is an alternative: it has fewer parameters than Large and infers even faster than quantized SD3.5 L
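
A back-of-the-envelope estimate shows where the VRAM savings come from. The sketch assumes roughly 8.1 billion parameters for SD3.5 Large (the figure Stability AI gives for the model) and counts weight storage only, ignoring the text encoders, the VAE, activations, and the per-block scales that 4-bit formats also store. The helper name `weight_memory_gb` is hypothetical:

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Weight storage in gigabytes (1 GB = 1e9 bytes) for a given precision."""
    return num_params * bits_per_param / 8 / 1e9


SD35_LARGE_PARAMS = 8.1e9  # assumption: approximate parameter count of SD3.5 Large

bf16_gb = weight_memory_gb(SD35_LARGE_PARAMS, 16)
nf4_gb = weight_memory_gb(SD35_LARGE_PARAMS, 4)
print(f"bf16: {bf16_gb:.2f} GB, 4-bit: {nf4_gb:.2f} GB, ratio: {bf16_gb / nf4_gb:.0f}x")
```

Going from 16-bit to 4-bit weights cuts weight storage by about a factor of four, which is what lets the quantized model fit on GPUs with limited VRAM.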

### Stable Diffusion 3.5 Inference Speeds
|Model|Inference Speed (seconds)|GPU|
|-----|-------------------------|---|
|SD3.5 M|4 s|NVIDIA H100 GPU with 80 GB of VRAM|
|[4-Bit Quantized SD3.5 L](/sd35-text-to-image-quantized-gradio/)|18 s|NVIDIA H100 GPU with 80 GB of VRAM|
|SD3.5 L|7 s|NVIDIA H100 GPU with 80 GB of VRAM|