
MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning [WACV 2025]

This repository contains the official PyTorch implementation of MegaFusion: https://arxiv.org/abs/2408.11001/

We are standardizing our code and will gradually open-source it in the near future, so please stay tuned.

Some Information

Project Page $\cdot$ Paper

News

  • [2024.10.29] MegaFusion has been accepted to WACV 2025.
  • [2024.9.10] A new version of the paper has been released. Please check out the latest version for further technical details, evaluations, and visualizations.
  • [2024.8.20] Our pre-print paper is released on arXiv; we are working on the code and will open-source it shortly.

Requirements

A suitable conda environment named megafusion can be created and activated with:

conda env create -f environment.yaml
conda activate megafusion

Inference

MegaFusion is designed to extend existing diffusion-based text-to-image models towards higher-resolution generation, so we provide official MegaFusion implementations for several representative models, including StableDiffusion, StableDiffusion-XL, DeepFloyd, ControlNet, and IP-Adapter.

Inference with SDM-MegaFusion

First, please download the pre-trained StableDiffusion-1.5 checkpoint from SDM-1.5. Then place all pre-trained checkpoints in the corresponding locations under ./SDM-MegaFusion/ckpt/stable-diffusion-v1-5/.

Run the inference demo with:

CUDA_VISIBLE_DEVICES=0 accelerate launch inference.py
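
While inference.py is the official entry point, the underlying coarse-to-fine relay can be illustrated with stock diffusers pipelines: sample at the model's native resolution, upsample the result, then re-noise and denoise only the tail of the trajectory at the target resolution. The sketch below is a rough approximation under assumed settings (the prompt, the 1024x1024 target, and strength=0.5 are illustrative choices), not the official MegaFusion implementation:

import torch
from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

ckpt = "./SDM-MegaFusion/ckpt/stable-diffusion-v1-5"  # layout from the step above

# Stage 1: sample at the model's native 512x512 resolution.
pipe = StableDiffusionPipeline.from_pretrained(ckpt, torch_dtype=torch.float16).to("cuda")
prompt = "a photo of an astronaut riding a horse"
low_res = pipe(prompt, height=512, width=512, num_inference_steps=50).images[0]

# Stage 2 (relay): upsample, then replay only the last part of the denoising
# schedule at the higher resolution. `strength` controls how much of the
# schedule is replayed; 0.5 is an illustrative value, not a tuned one.
pipe_hr = StableDiffusionImg2ImgPipeline(**pipe.components).to("cuda")
high_res = pipe_hr(prompt, image=low_res.resize((1024, 1024)),
                   strength=0.5, num_inference_steps=50).images[0]
high_res.save("relay_1024.png")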

Inference with SDXL-MegaFusion

To limit computational overhead, we use only SDXL-base and discard the SDXL-refiner in our project. First, please download the pre-trained StableDiffusion-XL checkpoint from SDXL-base. Then place all pre-trained checkpoints in the corresponding locations under ./SDXL-MegaFusion/ckpt/.

To be updated soon...
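
Until then, loading SDXL-base from the local checkpoint folder follows the usual diffusers pattern; the subfolder name under ./SDXL-MegaFusion/ckpt/ and the prompt are assumptions:

import torch
from diffusers import StableDiffusionXLPipeline

# Path assumed from the download step above; adjust to your local layout.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "./SDXL-MegaFusion/ckpt/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")

# SDXL-base alone (no refiner, as in this project) at its native 1024x1024.
image = pipe("a cinematic photo of a lighthouse at dusk",
             height=1024, width=1024).images[0]
image.save("sdxl_base_1024.png")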

Inference with Floyd-MegaFusion

To limit computational overhead, we use only the first two stages of DeepFloyd and discard the last stage in our project. First, please download the pre-trained DeepFloyd checkpoints from DeepFloyd. Then place all pre-trained checkpoints in the corresponding locations under ./DeepFloyd/ckpt/.

To be updated soon...
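
In the meantime, the two-stage setup (with stage III discarded) matches the standard DeepFloyd usage in diffusers; the local checkpoint paths under ./DeepFloyd/ckpt/ are assumptions:

import torch
from diffusers import DiffusionPipeline

# Stage I: 64x64 base model; paths assume checkpoints under ./DeepFloyd/ckpt/.
stage_1 = DiffusionPipeline.from_pretrained(
    "./DeepFloyd/ckpt/IF-I-XL-v1.0", torch_dtype=torch.float16).to("cuda")
# Stage II: 64 -> 256 super-resolution; reuses stage I's text embeddings,
# so its own text encoder can be dropped to save memory.
stage_2 = DiffusionPipeline.from_pretrained(
    "./DeepFloyd/ckpt/IF-II-L-v1.0", text_encoder=None,
    torch_dtype=torch.float16).to("cuda")

prompt_embeds, negative_embeds = stage_1.encode_prompt("a red panda reading a book")

image = stage_1(prompt_embeds=prompt_embeds,
                negative_prompt_embeds=negative_embeds,
                output_type="pt").images
image = stage_2(image=image, prompt_embeds=prompt_embeds,
                negative_prompt_embeds=negative_embeds).images[0]
image.save("deepfloyd_two_stage.png")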

Inference with ControlNet-MegaFusion

To be updated soon...

Inference with IP-Adapter-MegaFusion

To be updated soon...

Evaluation

Dataset

Our main experiments are conducted on the widely used MS-COCO dataset, which you can download from MS-COCO.

Sample images

Taking SDM-MegaFusion as an example, you can load the captions and sample images conditioned on them via:

CUDA_VISIBLE_DEVICES=0 accelerate launch synthesize.py
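
Conceptually, synthesize.py iterates over the COCO captions and conditions the pipeline on each one; a minimal stand-in looks like this (the annotation filename, sample count, and output layout are assumptions):

import json
import torch
from diffusers import StableDiffusionPipeline

# MS-COCO caption annotations; the exact path depends on your download.
with open("annotations/captions_val2014.json") as f:
    captions = json.load(f)["annotations"]

pipe = StableDiffusionPipeline.from_pretrained(
    "./SDM-MegaFusion/ckpt/stable-diffusion-v1-5",
    torch_dtype=torch.float16).to("cuda")

# One sample per caption; restricted to a handful here for illustration.
for ann in captions[:10]:
    pipe(ann["caption"]).images[0].save(f"samples/{ann['id']}.png")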

Metrics

We use the widely adopted FID and KID as the main evaluation metrics. In addition, to quantitatively evaluate the semantic correctness of the synthesized results, we also adopt several language-based scores, including CLIP-T, CIDEr, METEOR, and ROUGE.

To be updated soon...
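
Pending the official scripts, FID and KID can be computed with torchmetrics as a stand-in; the batch tensors below are placeholders for images loaded from MS-COCO and from the synthesized samples:

import torch
from torchmetrics.image.fid import FrechetInceptionDistance
from torchmetrics.image.kid import KernelInceptionDistance

fid = FrechetInceptionDistance(feature=2048)
kid = KernelInceptionDistance(subset_size=50)

def update(real_batch: torch.Tensor, fake_batch: torch.Tensor) -> None:
    # Both batches are uint8 image tensors of shape (N, 3, H, W).
    for metric in (fid, kid):
        metric.update(real_batch, real=True)
        metric.update(fake_batch, real=False)

# After all batches have been fed in:
# print(fid.compute())               # FID: lower is better
# kid_mean, kid_std = kid.compute()  # KID: mean and std over subsets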

Caption synthesized images with VLM

In our project, we use a state-of-the-art open-source VLM, MiniGPT-v2, to caption each synthesized image, which lets us further evaluate the semantic correctness of higher-resolution generation.

To be updated soon...
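
The released caption code will use MiniGPT-v2; as a rough stand-in, the same caption-then-compare loop can be illustrated with BLIP from transformers, a different and smaller captioner used here purely because it loads in a few lines:

from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# BLIP stands in for MiniGPT-v2 in this sketch; the project uses MiniGPT-v2.
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

image = Image.open("samples/example.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
caption = processor.decode(model.generate(**inputs)[0], skip_special_tokens=True)
print(caption)  # score against the source prompt with CIDEr / METEOR / ROUGE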

TODO

  • Release Paper
  • Complete Bibtex
  • Model Code of SDM-MegaFusion
  • Inference Code of SDM-MegaFusion
  • Image Caption Code of MiniGPT-v2
  • Evaluation Code
  • Model Code of SDXL-MegaFusion
  • Inference Code of SDXL-MegaFusion
  • Model Code of Floyd-MegaFusion
  • Inference Code of Floyd-MegaFusion

Citation

If you use this code for your research or project, please cite:

@InProceedings{wu2024megafusion,
    author    = {Wu, Haoning and Shen, Shaocheng and Hu, Qiang and Zhang, Xiaoyun and Zhang, Ya and Wang, Yanfeng},
    title     = {MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    year      = {2025},
}

Acknowledgements

Many thanks to the code bases from diffusers, SimpleSDM, SimpleSDXL, and DeepFloyd.

Contact

If you have any questions, please feel free to contact haoningwu3639@gmail.com or shenshaocheng@sjtu.edu.cn.
