SANA FOR WINDOWS


MANUAL INSTALLATION (using CUDA Toolkit 12.6; adjust the commands below for your CUDA version)

  1. git clone https://github.com/gjnave/Sana-for-Windows
  2. cd Sana-for-Windows
  3. conda create -n sana python=3.10.0 -y
  4. conda activate sana
  5. conda install nvidia/label/cuda-12.6.0::cuda-toolkit
  6. pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu126
  7. python.exe -m pip install -U pip
  8. pip install -U xformers --index-url https://download.pytorch.org/whl/cu126
  9. pip install https://github.com/woct0rdho/triton-windows/releases/download/v3.2.0-windows.post10/triton-3.2.0-cp310-cp310-win_amd64.whl
  10. pip install portalocker
  11. pip install -e .
  12. pip install huggingface-hub
  13. pip install huggingface-hub[cli]
  14. pip install gradio
  15. Double-click 'login-to-sana.bat'
  16. Copy your token from Hugging Face and right-click to paste it
  17. Press Enter and confirm
  18. Double-click 'run-sana-windows.bat' (the model download will begin)
  19. Choose the model you want to use (1600M or 600M)
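
After step 19, a quick sanity check can confirm the install worked. This is a minimal sketch (the script name check_env.py is just a suggestion), assuming the sana conda environment is active:

# check_env.py - verify that the environment installed above works
import torch

print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")

# these imports fail loudly if the wheels above did not install cleanly
import xformers
import triton
print("xformers:", xformers.__version__, "| triton:", triton.__version__)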

  • A free system checker can be downloaded at: https://www.patreon.com/posts/automated-system-117200313
  • A GetGoingFast Quick Installer can be obtained here: https://www.patreon.com/posts/nvidia-sana-for-122669029

Thanks to /u/Remarkable-Special86 (YT: foreropa) for input on the start .bat file.



⚡️Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer

ICLR 2025 Oral Presentation


💡 Introduction

We introduce Sana, a text-to-image framework that can efficiently generate images up to 4096 × 4096 resolution. Sana can synthesize high-resolution, high-quality images with strong text-image alignment at remarkably fast speed, and is deployable on a laptop GPU. Core designs include:

(1) DC-AE: unlike traditional AEs, which compress images only 8×, we trained an AE that can compress images 32×, effectively reducing the number of latent tokens.
(2) Linear DiT: we replace all vanilla attention in DiT with linear attention, which is more efficient at high resolutions without sacrificing quality.
(3) Decoder-only text encoder: we replaced T5 with a modern decoder-only small LLM as the text encoder and designed complex human instructions with in-context learning to enhance image-text alignment.
(4) Efficient training and sampling: we propose Flow-DPM-Solver to reduce sampling steps, with efficient caption labeling and selection to accelerate convergence.

As a result, Sana-0.6B is very competitive with modern giant diffusion models (e.g. Flux-12B), being 20 times smaller and 100+ times faster in measured throughput. Moreover, Sana-0.6B can be deployed on a 16GB laptop GPU, taking less than 1 second to generate a 1024 × 1024 resolution image. Sana enables content creation at low cost.
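
A back-of-the-envelope token count shows why the 32× autoencoder matters. The sketch below is a simplification that treats each latent pixel as one token (ignoring any extra patchification):

# latent token count for a 1024x1024 image, one token per latent pixel
def latent_tokens(image_size: int, compression: int) -> int:
    side = image_size // compression
    return side * side

print(latent_tokens(1024, 8))   # traditional 8x AE -> 128*128 = 16384 tokens
print(latent_tokens(1024, 32))  # DC-AE 32x         -> 32*32   =  1024 tokens, 16x fewer

Since vanilla attention scales quadratically with token count, the 16× token reduction alone is a large saving, and the linear attention in Linear DiT removes the quadratic term entirely.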


🔥🔥 News

  • (🔥 New) [2025/2/10] 🚀Sana + ControlNet is released. [Guidance] | [Model] | [Demo]
  • (🔥 New) [2025/1/30] CAME-8bit optimizer code is released, saving more GPU memory during training. [How to config]
  • (🔥 New) [2025/1/29] 🎉 🎉 🎉SANA 1.5 is out! Learn how to do efficient training & inference scaling! 🚀[Tech Report]
  • (🔥 New) [2025/1/24] 4bit-Sana is released, powered by SVDQuant and the Nunchaku inference engine. Now run Sana within 8GB of GPU VRAM. [Guidance] [Demo] [Model]
  • (🔥 New) [2025/1/24] DC-AE 1.1 is released with better reconstruction quality. [Model] [diffusers]
  • (🔥 New) [2025/1/23] Sana is accepted as an Oral presentation at ICLR 2025. 🎉🎉🎉

  • (🔥 New) [2025/1/12] DC-AE tiling lets Sana-4K generate 4096x4096px images within 22GB of GPU memory; with model offload and 8-bit/4-bit quantization, 4K Sana runs within 8GB of GPU VRAM. [Guidance]
  • (🔥 New) [2025/1/11] The Sana code-base license changed to Apache 2.0.
  • (🔥 New) [2025/1/10] Run Sana inference with 8-bit quantization. [Guidance]
  • (🔥 New) [2025/1/8] 4K-resolution Sana models are supported in Sana-ComfyUI, and a workflow is also prepared. [4K guidance]
  • (🔥 New) [2025/1/8] 1.6B 4K-resolution Sana models are released: [BF16 pth] or [BF16 diffusers]. 🚀 Get your 4096x4096 resolution images within 20 seconds! Find more samples on the Sana page. Thanks to SUPIR for their wonderful work and support.
  • (🔥 New) [2025/1/2] The bug in the diffusers pipeline is fixed. [Solved PR]
  • (🔥 New) [2025/1/2] 2K-resolution Sana models are supported in Sana-ComfyUI, and a workflow is also prepared.
  • ✅ [2024/12] 1.6B 2K-resolution Sana models are released: [BF16 pth] or [BF16 diffusers]. 🚀 Get your 2K-resolution images within 4 seconds! Find more samples on the Sana page. Thanks to SUPIR for their wonderful work and support.
  • ✅ [2024/12] diffusers supports Sana-LoRA fine-tuning! Sana-LoRA's training and convergence speed is super fast. [Guidance] or [diffusers docs].
  • ✅ [2024/12] diffusers has Sana! All Sana models are released as diffusers safetensors, and the SanaPipeline, SanaPAGPipeline, and DPMSolverMultistepScheduler (with FlowMatching) pipelines are all supported. We prepared a Model Card to help you choose.
  • ✅ [2024/12] 1.6B BF16 Sana model is released for stable fine-tuning.
  • ✅ [2024/12] We release the ComfyUI node for Sana. [Guidance]
  • ✅ [2024/11] All multilingual (Emoji & Chinese & English) SFT models are released: 1.6B-512px, 1.6B-1024px, 600M-512px, 600M-1024px. The metric performance is shown here.
  • ✅ [2024/11] The Sana Replicate API is launched at Sana-API.
  • ✅ [2024/11] 1.6B Sana models are released.
  • ✅ [2024/11] Training & Inference & Metrics code are released.
  • ✅ [2024/11] Working on diffusers.
  • [2024/10] Demo is released.
  • [2024/10] DC-AE Code and weights are released!
  • [2024/10] Paper is on Arxiv!

Performance

| Methods (1024x1024) | Throughput (samples/s) | Latency (s) | Params (B) | Speedup | FID 👇 | CLIP 👆 | GenEval 👆 | DPG 👆 |
|---|---|---|---|---|---|---|---|---|
| FLUX-dev | 0.04 | 23.0 | 12.0 | 1.0× | 10.15 | 27.47 | 0.67 | 84.0 |
| Sana-0.6B | 1.7 | 0.9 | 0.6 | 39.5× | 5.81 | 28.36 | 0.64 | 83.6 |
| Sana-0.6B-MultiLing | 1.7 | 0.9 | 0.6 | 39.5× | 5.61 | 28.80 | 0.68 | 84.2 |
| Sana-1.6B | 1.0 | 1.2 | 1.6 | 23.3× | 5.76 | 28.67 | 0.66 | 84.8 |
| Sana-1.6B-MultiLing | 1.0 | 1.2 | 1.6 | 23.3× | 5.92 | 28.94 | 0.69 | 84.5 |

Full comparison:

| Methods | Throughput (samples/s) | Latency (s) | Params (B) | Speedup | FID 👇 | CLIP 👆 | GenEval 👆 | DPG 👆 |
|---|---|---|---|---|---|---|---|---|
| 512 × 512 resolution |  |  |  |  |  |  |  |  |
| PixArt-α | 1.5 | 1.2 | 0.6 | 1.0× | 6.14 | 27.55 | 0.48 | 71.6 |
| PixArt-Σ | 1.5 | 1.2 | 0.6 | 1.0× | 6.34 | 27.62 | 0.52 | 79.5 |
| Sana-0.6B | 6.7 | 0.8 | 0.6 | 5.0× | 5.67 | 27.92 | 0.64 | 84.3 |
| Sana-1.6B | 3.8 | 0.6 | 1.6 | 2.5× | 5.16 | 28.19 | 0.66 | 85.5 |
| 1024 × 1024 resolution |  |  |  |  |  |  |  |  |
| LUMINA-Next | 0.12 | 9.1 | 2.0 | 2.8× | 7.58 | 26.84 | 0.46 | 74.6 |
| SDXL | 0.15 | 6.5 | 2.6 | 3.5× | 6.63 | 29.03 | 0.55 | 74.7 |
| PlayGroundv2.5 | 0.21 | 5.3 | 2.6 | 4.9× | 6.09 | 29.13 | 0.56 | 75.5 |
| Hunyuan-DiT | 0.05 | 18.2 | 1.5 | 1.2× | 6.54 | 28.19 | 0.63 | 78.9 |
| PixArt-Σ | 0.4 | 2.7 | 0.6 | 9.3× | 6.15 | 28.26 | 0.54 | 80.5 |
| DALLE3 | - | - | - | - | - | - | 0.67 | 83.5 |
| SD3-medium | 0.28 | 4.4 | 2.0 | 6.5× | 11.92 | 27.83 | 0.62 | 84.1 |
| FLUX-dev | 0.04 | 23.0 | 12.0 | 1.0× | 10.15 | 27.47 | 0.67 | 84.0 |
| FLUX-schnell | 0.5 | 2.1 | 12.0 | 11.6× | 7.94 | 28.14 | 0.71 | 84.8 |
| Sana-0.6B | 1.7 | 0.9 | 0.6 | 39.5× | 5.81 | 28.36 | 0.64 | 83.6 |
| Sana-1.6B | 1.0 | 1.2 | 1.6 | 23.3× | 5.76 | 28.67 | 0.66 | 84.8 |

Contents

🔧 1. Dependencies and Installation

(This section has been removed: the files modified for Windows above will no longer work on Linux.)

💻 2. How to Play with Sana (Inference)

💰Hardware requirement

  • 9GB of VRAM is required for the 0.6B model and 12GB for the 1.6B model. Our later quantized version will require less than 8GB for inference.
  • All tests were done on A100 GPUs; results may differ on other GPUs.

🔛 Choose your model: Model card

🔛 Quick start with Gradio

# official online demo
DEMO_PORT=15432 \
python app/app_sana.py \
    --share \
    --config=configs/sana_config/1024ms/Sana_1600M_img1024.yaml \
    --model_path=hf://Efficient-Large-Model/Sana_1600M_1024px/checkpoints/Sana_1600M_1024px.pth \
    --image_size=1024
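
Note: the DEMO_PORT=15432 prefix and the trailing backslashes above are bash syntax. In a Windows cmd shell, set the variable first with "set DEMO_PORT=15432" and then run the python command on a single line.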

1. How to use SanaPipeline with 🧨diffusers

Important

Upgrade diffusers to >=0.32.0.dev to make SanaPipeline and SanaPAGPipeline available!

pip install git+https://github.com/huggingface/diffusers

Make sure to load pipe.transformer with the default torch_dtype and variant given in the Model Card.

Set pipe.text_encoder to BF16 and pipe.vae to FP32 or BF16. For more info, see the docs here.

# run `pip install git+https://github.com/huggingface/diffusers` before using Sana in diffusers
import torch
from diffusers import SanaPipeline

pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers",
    variant="bf16",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

pipe.vae.to(torch.bfloat16)
pipe.text_encoder.to(torch.bfloat16)

prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    guidance_scale=4.5,
    num_inference_steps=20,
    generator=torch.Generator(device="cuda").manual_seed(42),
)[0]

image[0].save("sana.png")
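
If the full pipeline does not fit in your VRAM, the stock diffusers offloading hook also works here. A minimal sketch (enable_model_cpu_offload is a general diffusers API, not Sana-specific; do not combine it with pipe.to("cuda")):

import torch
from diffusers import SanaPipeline

pipe = SanaPipeline.from_pretrained(
    "Efficient-Large-Model/Sana_1600M_1024px_BF16_diffusers",
    variant="bf16",
    torch_dtype=torch.bfloat16,
)
# keeps submodules on the CPU and moves each one to the GPU only while it runs
pipe.enable_model_cpu_offload()

image = pipe(prompt='a cyberpunk cat with a neon sign that says "Sana"')[0]
image[0].save("sana_offload.png")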

2. How to use SanaPAGPipeline with 🧨diffusers

# run `pip install git+https://github.com/huggingface/diffusers` before using Sana in diffusers
import torch
from diffusers import SanaPAGPipeline

pipe = SanaPAGPipeline.from_pretrained(
  "Efficient-Large-Model/Sana_1600M_1024px_diffusers",
  variant="fp16",
  torch_dtype=torch.float16,
  pag_applied_layers="transformer_blocks.8",
)
pipe.to("cuda")

pipe.text_encoder.to(torch.bfloat16)
pipe.vae.to(torch.bfloat16)

prompt = 'a cyberpunk cat with a neon sign that says "Sana"'
image = pipe(
    prompt=prompt,
    guidance_scale=5.0,
    pag_scale=2.0,
    num_inference_steps=20,
    generator=torch.Generator(device="cuda").manual_seed(42),
)[0]
image[0].save('sana.png')

3. How to use Sana in this repo

import torch
from app.sana_pipeline import SanaPipeline
from torchvision.utils import save_image

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
generator = torch.Generator(device=device).manual_seed(42)

sana = SanaPipeline("configs/sana_config/1024ms/Sana_1600M_img1024.yaml")
sana.from_pretrained("hf://Efficient-Large-Model/Sana_1600M_1024px_BF16/checkpoints/Sana_1600M_1024px_BF16.pth")
prompt = 'a cyberpunk cat with a neon sign that says "Sana"'

image = sana(
    prompt=prompt,
    height=1024,
    width=1024,
    guidance_scale=5.0,
    pag_guidance_scale=2.0,
    num_inference_steps=18,
    generator=generator,
)
save_image(image, 'output/sana.png', nrow=1, normalize=True, value_range=(-1, 1))
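
Because the pipeline object is reusable, you can sweep seeds without reloading the weights. A small sketch built on the same calls as above:

# reuse the loaded pipeline to compare several seeds
for seed in (0, 42, 1234):
    generator = torch.Generator(device=device).manual_seed(seed)
    image = sana(
        prompt=prompt,
        height=1024,
        width=1024,
        guidance_scale=5.0,
        pag_guidance_scale=2.0,
        num_inference_steps=18,
        generator=generator,
    )
    save_image(image, f"output/sana_seed{seed}.png", nrow=1, normalize=True, value_range=(-1, 1))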

4. Run Sana (Inference) with Docker

# Pull related models
huggingface-cli download google/gemma-2b-it
huggingface-cli download google/shieldgemma-2b
huggingface-cli download mit-han-lab/dc-ae-f32c32-sana-1.0
huggingface-cli download Efficient-Large-Model/Sana_1600M_1024px

# Run with docker
docker build . -t sana
docker run --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 \
    -v ~/.cache:/root/.cache \
    sana

🔛 Run inference with TXT or JSON files

# Run samples in a txt file
python scripts/inference.py \
      --config=configs/sana_config/1024ms/Sana_1600M_img1024.yaml \
      --model_path=hf://Efficient-Large-Model/Sana_1600M_1024px/checkpoints/Sana_1600M_1024px.pth \
      --txt_file=asset/samples/samples_mini.txt

# Run samples in a json file
python scripts/inference.py \
      --config=configs/sana_config/1024ms/Sana_1600M_img1024.yaml \
      --model_path=hf://Efficient-Large-Model/Sana_1600M_1024px/checkpoints/Sana_1600M_1024px.pth \
      --json_file=asset/samples/samples_mini.json

where each line of asset/samples/samples_mini.txt contains a prompt to generate, as in the example below.
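
For example, a two-line samples file could look like this (the prompts are illustrative; for the JSON layout, follow the bundled asset/samples/samples_mini.json):

a cyberpunk cat with a neon sign that says "Sana"
a watercolor painting of a lighthouse at dawn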

🔥 3. How to Train Sana

💰Hardware requirement

  • 32GB of VRAM is required to train both the 0.6B and the 1.6B model.

1). Train with image-text pairs in directory

We provide a training example here; you can also select your desired config file from the config files directory based on your data structure.

To launch Sana training, you first need to prepare your data in the following format. Here is an example of the data structure for reference.

asset/example_data
├── AAA.txt
├── AAA.png
├── BCC.txt
├── BCC.png
├── ......
├── CCC.txt
└── CCC.png
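
If your captions live somewhere else (a CSV, a database), a small script can lay the files out in this shape. A sketch assuming one .txt caption per image, sharing the image's basename as the tree above shows:

# build the <name>.png / <name>.txt layout from (image_path, caption) pairs
import shutil
from pathlib import Path

def build_pair_dir(pairs, out_dir="asset/example_data"):
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for image_path, caption in pairs:
        src = Path(image_path)
        shutil.copy(src, out / src.name)                          # e.g. AAA.png
        (out / src.name).with_suffix(".txt").write_text(caption)  # e.g. AAA.txt

build_pair_dir([("photos/AAA.png", "a red bicycle leaning on a brick wall")])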

Then Sana's training can be launched via

# Example of training Sana 0.6B with 512x512 resolution from scratch
bash train_scripts/train.sh \
  configs/sana_config/512ms/Sana_600M_img512.yaml \
  --data.data_dir="[asset/example_data]" \
  --data.type=SanaImgDataset \
  --model.multi_scale=false \
  --train.train_batch_size=32

# Example of fine-tuning Sana 1.6B with 1024x1024 resolution
bash train_scripts/train.sh \
  configs/sana_config/1024ms/Sana_1600M_img1024.yaml \
  --data.data_dir="[asset/example_data]" \
  --data.type=SanaImgDataset \
  --model.load_from=hf://Efficient-Large-Model/Sana_1600M_1024px/checkpoints/Sana_1600M_1024px.pth \
  --model.multi_scale=false \
  --train.train_batch_size=8

2). Train with image-text pairs in WebDataset format

We also provide a conversion script to get your data into the required format; refer to the data conversion scripts for more details.

python tools/convert_ImgDataset_to_WebDatasetMS_format.py

Then Sana's training can be launched via

# Example of training Sana 0.6B with 512x512 resolution from scratch
bash train_scripts/train.sh \
  configs/sana_config/512ms/Sana_600M_img512.yaml \
  --data.data_dir="[asset/example_data_tar]" \
  --data.type=SanaWebDatasetMS \
  --model.multi_scale=true \
  --train.train_batch_size=32

💻 4. Metric toolkit

Refer to Toolkit Manual.

💪To-Do List

We will try our best to release:

  • [✅] Training code
  • [✅] Inference code
  • [✅] Model zoo
  • [✅] ComfyUI
  • [✅] DC-AE Diffusers
  • [✅] Sana merged in Diffusers(huggingface/diffusers#9982)
  • [✅] LoRA training by @paul (diffusers: huggingface/diffusers#10234)
  • [✅] 2K/4K resolution models (thanks to @SUPIR for providing a 4K super-resolution model)
  • [✅] 8bit / 4bit laptop development
  • [💻] ControlNet (train & inference & models)
  • [💻] Larger model size
  • [💻] Better-reconstruction F32/F64 VAEs
  • [💻] Sana1.5 (Focus on: Human body / Human face / Text rendering / Realism / Efficiency)

🤗Acknowledgements

Thanks to the following open-source codebases for their wonderful work!


📖BibTeX

@misc{xie2024sana,
      title={Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer},
      author={Enze Xie and Junsong Chen and Junyu Chen and Han Cai and Haotian Tang and Yujun Lin and Zhekai Zhang and Muyang Li and Ligeng Zhu and Yao Lu and Song Han},
      year={2024},
      eprint={2410.10629},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2410.10629},
}
