MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning [WACV 2025]
This repository contains the official PyTorch implementation of MegaFusion: https://arxiv.org/abs/2408.11001/
We are standardizing our code and will gradually open-source it in the near future, so please stay tuned.
Project Page
- [2024.10.29] MegaFusion has been accepted to WACV 2025.
- [2024.9.10] A new version of the paper has been uploaded. Please check out the latest version for further technical details, evaluations, and visualizations.
- [2024.8.20] Our pre-print paper has been released on arXiv; we are working on the code and will open-source it shortly.
- Python >= 3.8 (Anaconda or Miniconda recommended)
- PyTorch >= 1.12
- xformers == 0.0.13
- diffusers == 0.13.1
- accelerate == 0.17.1
- transformers == 4.27.4
A suitable conda environment named `megafusion` can be created and activated with:
conda env create -f environment.yaml
conda activate megafusion
Since MegaFusion is designed to extend existing diffusion-based text-to-image models towards higher-resolution generation, we provide official MegaFusion implementations on several representative models, including StableDiffusion, StableDiffusion-XL, DeepFloyd, ControlNet, and IP-Adapter.
First, please download the pre-trained StableDiffusion-1.5 from SDM-1.5. Then, place all pre-trained checkpoints into the corresponding location in the folder ./SDM-MegaFusion/ckpt/stable-diffusion-v1-5/.
Run the inference demo with:
CUDA_VISIBLE_DEVICES=0 accelerate launch inference.py
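Until the full pipeline is open-sourced, the sketch below shows only the generic diffusers loading flow that `inference.py` builds on; the prompt, target resolution, and sampling settings are placeholders, and the MegaFusion-specific refinement itself is not included here.

```python
# Minimal sketch (not the official MegaFusion pipeline): load SDM-1.5 from the
# local checkpoint folder described above and sample beyond the native resolution.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "./SDM-MegaFusion/ckpt/stable-diffusion-v1-5",  # folder layout from above
    torch_dtype=torch.float16,
).to("cuda")

# Placeholder prompt and resolution; the official inference.py applies the
# MegaFusion refinement on top of a pipeline like this one.
image = pipe(
    "a photo of an astronaut riding a horse",
    height=1024, width=1024,  # higher than the native 512x512
    num_inference_steps=50,
).images[0]
image.save("sample.png")
```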
Taking computational overhead into consideration, we only use SDXL-base and discard SDXL-refiner in our project. First, please download the pre-trained StableDiffusion-XL from SDXL-base. Then, place all pre-trained checkpoints into the corresponding location in the folder ./SDXL-MegaFusion/ckpt/.
To be updated soon...
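Until the SDXL code is released, here is a minimal, generic loading sketch under the folder layout above. Note that SDXL pipelines are not available in the diffusers version pinned in the requirements (they landed around diffusers 0.19), so a newer diffusers would be needed; the prompt and resolution are placeholders.

```python
# Minimal sketch (not the official MegaFusion pipeline): load SDXL-base from
# the local checkpoint folder. Requires diffusers >= 0.19 for SDXL support.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "./SDXL-MegaFusion/ckpt",  # folder layout from above
    torch_dtype=torch.float16,
).to("cuda")

# Placeholder prompt; 2048x2048 is above SDXL's native 1024x1024.
image = pipe(
    "a cinematic photo of a lighthouse at dusk",
    height=2048, width=2048,
).images[0]
image.save("sdxl_sample.png")
```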
Taking computational overhead into consideration, we only use the first two stages of DeepFloyd and discard the last stage in our project.
First, please download the pre-trained DeepFloyd models from SDM. Then, place all pre-trained checkpoints into the corresponding location in the folder ./DeepFloyd/ckpt/.
To be updated soon...
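Likewise, until the DeepFloyd inference code is released, the sketch below shows the generic two-stage DeepFloyd IF flow in diffusers (which also requires a newer diffusers than the pinned version); the checkpoint subfolder names under ./DeepFloyd/ckpt/ and the prompt are assumptions.

```python
# Minimal sketch (not the official MegaFusion pipeline): run the first two
# DeepFloyd IF stages, as described above (the third stage is discarded).
import torch
from diffusers import IFPipeline, IFSuperResolutionPipeline

# Assumed local subfolder names mirroring the official IF checkpoints.
stage1 = IFPipeline.from_pretrained(
    "./DeepFloyd/ckpt/IF-I-XL-v1.0", torch_dtype=torch.float16).to("cuda")
stage2 = IFSuperResolutionPipeline.from_pretrained(
    "./DeepFloyd/ckpt/IF-II-L-v1.0", torch_dtype=torch.float16).to("cuda")

prompt = "a watercolor painting of a fox in a snowy forest"  # placeholder
prompt_embeds, negative_embeds = stage1.encode_prompt(prompt)

# Stage I: base 64x64 generation, kept as tensors for stage II.
image = stage1(prompt_embeds=prompt_embeds,
               negative_prompt_embeds=negative_embeds,
               output_type="pt").images

# Stage II: super-resolve to 256x256.
image = stage2(image=image, prompt_embeds=prompt_embeds,
               negative_prompt_embeds=negative_embeds,
               output_type="pil").images[0]
image.save("if_sample.png")
```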
To be updated soon...
To be updated soon...
Our main experiments are conducted on the widely used MS-COCO dataset; you can download it from MS-COCO.
Taking SDM-MegaFusion as an example, you can load the captions and sample images conditioned on them via:
CUDA_VISIBLE_DEVICES=0 accelerate launch synthesize.py
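For reference, loading the MS-COCO captions themselves is straightforward; the sketch below assumes the standard COCO 2014 annotation layout on disk.

```python
# Minimal sketch: read MS-COCO captions to use as text conditions.
# The annotation path follows the standard COCO 2014 layout and is an
# assumption about your local setup.
import json

with open("annotations/captions_val2014.json") as f:
    coco = json.load(f)

# One caption per annotation entry; each image has ~5 captions.
captions = [ann["caption"] for ann in coco["annotations"]]
print(len(captions), captions[0])
```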
We use the widely adopted FID and KID as our main evaluation metrics. In addition, to quantitatively evaluate the semantic correctness of the synthesized results, we also adopt several language-based scores, including CLIP-T, CIDEr, METEOR, and ROUGE.
To be updated soon...
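Until the evaluation code is released, FID and KID can be reproduced with standard tooling; the sketch below uses torchmetrics (an assumption, not necessarily what our scripts use), with random tensors standing in for the real and generated image batches.

```python
# Minimal sketch (not the official evaluation code): FID/KID via torchmetrics.
# Requires torchmetrics with image extras (torch-fidelity) installed.
import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.kid import KernelInceptionDistance

fid = FrechetInceptionDistance(feature=2048)
kid = KernelInceptionDistance(subset_size=50)

# Stand-in batches: replace with real COCO images and synthesized samples,
# loaded as uint8 tensors of shape (N, 3, H, W).
real_batch = torch.randint(0, 255, (100, 3, 299, 299), dtype=torch.uint8)
fake_batch = torch.randint(0, 255, (100, 3, 299, 299), dtype=torch.uint8)

fid.update(real_batch, real=True)
fid.update(fake_batch, real=False)
kid.update(real_batch, real=True)
kid.update(fake_batch, real=False)

print("FID:", fid.compute().item())
kid_mean, kid_std = kid.compute()
print("KID:", kid_mean.item(), "+/-", kid_std.item())
```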
In our project, we use a state-of-the-art open-source VLM, MiniGPT-v2, to caption each synthesized image, further evaluating the semantic correctness of higher-resolution generation.
To be updated soon...
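Until the captioning code is released, the language-based scores above can be computed with the standard pycocoevalcap package once MiniGPT-v2 captions are available; the scoring flow sketched below is an assumption, and the texts are toy placeholders.

```python
# Minimal sketch (not the official evaluation code): score VLM-generated
# captions against the original prompts with CIDEr / METEOR / ROUGE-L,
# using the pycocoevalcap package.
from pycocoevalcap.cider.cider import Cider
from pycocoevalcap.meteor.meteor import Meteor  # METEOR requires Java
from pycocoevalcap.rouge.rouge import Rouge

# {image_id: [text]} — references are the prompts, hypotheses are the
# MiniGPT-v2 captions of the synthesized images (placeholders here).
refs = {"0": ["a red double-decker bus on a city street"]}
hyps = {"0": ["a red bus driving down a street in a city"]}

for name, scorer in [("CIDEr", Cider()), ("METEOR", Meteor()), ("ROUGE-L", Rouge())]:
    score, _ = scorer.compute_score(refs, hyps)
    print(name, score)
```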
- Release Paper
- Complete Bibtex
- Model Code of SDM-MegaFusion
- Inference Code of SDM-MegaFusion
- Image Caption Code of MiniGPT-v2
- Evaluation Code
- Model Code of SDXL-MegaFusion
- Inference Code of SDXL-MegaFusion
- Model Code of Floyd-MegaFusion
- Inference Code of Floyd-MegaFusion
If you use this code for your research or project, please cite:
@InProceedings{wu2024megafusion,
author = {Wu, Haoning and Shen, Shaocheng and Hu, Qiang and Zhang, Xiaoyun and Zhang, Ya and Wang, Yanfeng},
title = {MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning},
booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
year = {2025},
}
Many thanks to the code bases from diffusers, SimpleSDM, SimpleSDXL, and DeepFloyd.
If you have any questions, please feel free to contact haoningwu3639@gmail.com or shenshaocheng@sjtu.edu.cn.