Latent Diffusion Model (LDM) for Layout-to-image generation


This is an unofficial repository of the Latent Diffusion Model (LDM) for layout-to-image generation. The config and code for this task in the official LDM repo are currently incomplete, so this repo aims to reproduce LDM on the layout-to-image generation task. If you find it useful, please cite the original LDM paper.


Machine environment

  • Ubuntu version: 18.04.5 LTS
  • CUDA version: 11.6
  • Testing GPU: Nvidia Tesla V100

Requirements

A conda environment named ldm_layout can be created and activated with:

conda env create -f environment.yaml
conda activate ldm_layout

Datasets setup

We provide two approaches to set up the datasets:

Auto-download

To automatically download the datasets and save them to the default path (../), please use the following script:

bash tools/download_datasets.sh

Manual setup

Text-to-image generation

  • We use the COCO 2014 splits for the text-to-image task, which can be downloaded from the official COCO website.

  • Please create a folder named 2014 and organize the downloaded data and annotations as follows (a manual-download sketch is given after the tree).

    COCO 2014 file structure
    2014
    ├── annotations
    │   ├── captions_val2014.json
    │   └── ...
    └── val2014
        ├── COCO_val2014_000000000073.jpg
        └── ...
    
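    For reference, a minimal manual-download sketch using the standard COCO URLs (assumes wget and unzip are available; the ../datasets/coco path follows the default layout described below):

    mkdir -p ../datasets/coco/2014 && cd ../datasets/coco/2014
    # official COCO 2014 validation images and annotations
    wget http://images.cocodataset.org/zips/val2014.zip
    wget http://images.cocodataset.org/annotations/annotations_trainval2014.zip
    unzip val2014.zip && unzip annotations_trainval2014.zip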

Layout-to-image generation

  • We use the COCO 2017 splits to test LDM on the layout-to-image task, which can be downloaded from the official COCO website.

  • Please create a folder named 2017 and organize the downloaded data and annotations as follows (a manual-download sketch is given after the tree).

    COCO 2017 file structure
    2017
    ├── annotations
    │   ├── captions_val2017.json
    │   └── ...
    └── val2017
        ├── 000000000872.jpg
        └── ...
    
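    Similarly, a hedged manual-download sketch for the 2017 split (same assumptions as above):

    mkdir -p ../datasets/coco/2017 && cd ../datasets/coco/2017
    # official COCO 2017 validation images and annotations
    wget http://images.cocodataset.org/zips/val2017.zip
    wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip
    unzip val2017.zip && unzip annotations_trainval2017.zip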

File structure for dataset and code

Please make sure the file structure matches the following; otherwise, modify the config files to match the corresponding paths. (A quick sanity check is sketched after the tree.)

File structure
datasets
└── coco
    ├── 2014
    │   ├── annotations
    │   ├── val2014
    │   └── ...
    └── 2017
        ├── annotations
        ├── val2017
        └── ...
ldm_layout
├── configs
│   ├── ldm
│   └── ...
├── exp
│   └── ...
├── ldm
├── taming
├── scripts
├── tools
└── ...
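To sanity-check the layout before training, listing the expected folders from inside ldm_layout is a quick verification (paths assume the default locations above):

ls ../datasets/coco/2014/annotations ../datasets/coco/2014/val2014
ls ../datasets/coco/2017/annotations ../datasets/coco/2017/val2017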

VQGAN models setup

We provide a script to download the VQGAN-f8 model released in the official LDM repo.

To automatically download VQGAN-f8 and save it to the default path (exp/), please use the following script:

bash tools/download_models.sh
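If the script is unavailable, the checkpoint can also be fetched manually. A minimal sketch, assuming the URL used by the official LDM first-stage download scripts (verify the URL and the exact exp/ layout against tools/download_models.sh before relying on it):

# URL taken from the official LDM download scripts; the exp/vq-f8 target folder is illustrative
mkdir -p exp
wget https://ommer-lab.com/files/latent-diffusion/vq-f8.zip -P exp
unzip exp/vq-f8.zip -d exp/vq-f8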

Train LDM for layout-to-image generation

We provide scripts for training LDM on the text-to-image and layout-to-image tasks.

Once the datasets are properly set up, LDM can be trained with the following commands.

Text-to-image

bash tools/ldm/train_ldm_coco_T2I.sh
  • The default output folder is exp/ldm/T2I.

Layout-to-image

bash tools/ldm/train_ldm_coco_Layout2I.sh
  • The default output folder is exp/ldm/Layout2I.

Multi-GPU training

Change "--gpus" to specify the GPUs used for training.

For example, using 4 GPUs:

python main.py --base configs/ldm/coco_sg2im_ldm_Layout2I_vqgan_f8.yaml \
        -t True --gpus 0,1,2,3 -log_dir ./exp/ldm/Layout2I \
        -n coco_sg2im_ldm_Layout2I_vqgan_f8 --scale_lr False -tb True

Inference

Change "-t" to switch between the training (True) and testing (False) phases. (Note that multi-GPU testing is supported.)

For example, using 4 GPUs for testing:

python main.py --base configs/ldm/coco_sg2im_ldm_Layout2I_vqgan_f8.yaml \
        -t False --gpus 0,1,2,3 -log_dir ./exp/ldm/Layout2I \
        -n coco_sg2im_ldm_Layout2I_vqgan_f8 --scale_lr False -tb True
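For single-GPU testing, the same command applies with a single device id, e.g. --gpus 0, (the trailing comma follows PyTorch Lightning's comma-separated device-list syntax):

python main.py --base configs/ldm/coco_sg2im_ldm_Layout2I_vqgan_f8.yaml \
        -t False --gpus 0, -log_dir ./exp/ldm/Layout2I \
        -n coco_sg2im_ldm_Layout2I_vqgan_f8 --scale_lr False -tb True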

Acknowledgement

The LDM_layout codebase builds heavily on the codebases of the Latent Diffusion Model (LDM) and VQGAN. We sincerely thank the authors for open-sourcing their work!

Citation

If you find this code useful for your research, please consider citing:

@misc{rombach2021highresolution,
      title={High-Resolution Image Synthesis with Latent Diffusion Models}, 
      author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
      year={2021},
      eprint={2112.10752},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

@misc{blattmann2022retrieval,
      title={Retrieval-Augmented Diffusion Models},
      author={Andreas Blattmann and Robin Rombach and Kaan Oktay and Björn Ommer},
      year={2022},
      eprint={2204.11824},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
