# Continuous-Multiple Image Outpainting in One-Step via Positional Query and A Diffusion-based Approach (PQDiff)

If you like our project, please give us a star ⭐ on GitHub for the latest updates.


Shaofeng Zhang1, Jinfa Huang2, Qiang Zhou3, Zhibin Wang3, Fan Wang4, Jiebo Luo2, Junchi Yan1,*

1Shanghai Jiao Tong University, 2University of Rochester, 3INF Tech Co., Ltd., 4Alibaba Group

## 💡 Highlight

Our PQDiff can outpaint images at arbitrary, continuous multiples in one step by learning positional relationships and pixel information at the same time.

## 🚀 Quick Start

### Model Zoo

| Checkpoint | Google Cloud | Baidu Yun |
| --- | --- | --- |
| Scenery | Download | TBD |
| Building Facades | TBD | TBD |
| WikiArt | TBD | TBD |

### Dataset preparation

We use the Flickr, Buildings, and WikiArt datasets, which can be obtained at this link.

### Download the autoencoder

We use an autoencoder converted from Stable Diffusion; you can download it from this link.
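For orientation, the snippet below is a minimal sketch of loading and exercising a Stable-Diffusion-style VAE with the `diffusers` library. The checkpoint name is an assumption for illustration only; this repo's actual loading code and checkpoint format may differ.

```python
# Minimal sketch (assumption: the checkpoint is compatible with diffusers'
# AutoencoderKL; the actual autoencoder used by this repo may differ).
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")  # assumed checkpoint
vae.eval()

with torch.no_grad():
    x = torch.randn(1, 3, 192, 192)         # dummy RGB batch in [-1, 1]
    z = vae.encode(x).latent_dist.sample()  # encode to latent space
    x_rec = vae.decode(z).sample            # decode back to pixel space

print(z.shape, x_rec.shape)  # latents are 8x downsampled: (1, 4, 24, 24)
```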

### Training stage

```bash
accelerate launch --multi_gpu --num_processes 8 --mixed_precision fp16 train_ldm.py --config=configs/flickr192_large.py
```

You can train on your own dataset by modifying `dataset/dataset.py`; a sketch of a compatible dataset class follows.
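The sketch below is a minimal starting point, assuming the training code consumes RGB tensors normalized to [-1, 1]; the exact interface expected by `dataset/dataset.py` may differ, so check the existing classes there. The class name is hypothetical.

```python
# Hypothetical drop-in dataset for training on your own images.
import os
from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms

class MyOutpaintingDataset(Dataset):  # hypothetical name
    def __init__(self, root, size=192):
        self.paths = [os.path.join(root, f) for f in sorted(os.listdir(root))
                      if f.lower().endswith((".jpg", ".jpeg", ".png"))]
        self.transform = transforms.Compose([
            transforms.Resize(size),
            transforms.CenterCrop(size),
            transforms.ToTensor(),
            transforms.Normalize([0.5] * 3, [0.5] * 3),  # map [0, 1] -> [-1, 1]
        ])

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        img = Image.open(self.paths[idx]).convert("RGB")
        return self.transform(img)
```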

### Sampling stage

We provide 2.25x, 5x, and 11.7x outpainting settings (with the copy operation). Run:

```bash
python3 -m torch.distributed.launch --nproc_per_node=8 \
        --node_rank 0 \
        --master_addr=${MASTER_ADDR:-127.0.0.1} \
        --master_port=${MASTER_PORT:-46123} \
        evaluate.py --target_expansion 0.25 0.25 0.25 0.25 --eval_dir ./eval_dir/scenery/1x/ --size 128 \
                --config flickr192_large
```

You can outpaint images at arbitrary, continuous multiples by changing the `--target_expansion` parameters. The four values are the per-side expansion ratios, in the order (top, bottom, left, right).
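To see how the four ratios translate into an overall multiple: each value expands one side by that fraction of the input size, so equal per-side ratios of r scale the area by (1 + 2r)². A quick sanity check (the helper name is hypothetical, not part of this repo):

```python
def outpaint_multiple(top, bottom, left, right):
    """Output-to-input area ratio for the given per-side expansion ratios."""
    return (1 + top + bottom) * (1 + left + right)

print(outpaint_multiple(0.25, 0.25, 0.25, 0.25))      # 2.25 -> the 2.25x setting above
print(outpaint_multiple(0.618, 0.618, 0.618, 0.618))  # ~5.0 -> roughly the 5x setting
```

With `--size 128` and an expansion of 0.25 on every side, the output is 128 × 1.5 = 192 pixels per dimension, matching the `flickr192_large` config.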

### Evaluation stage

We provide scripts to evaluate Inception Score, FID, and Centered PSNR in `eval_dir`. Run:

```bash
python eval_dir/inception.py --path ./path1/
python -m pytorch_fid ./path1/ ./path2/
python eval_dir/psnr.py --original ./ori_dir/ --contrast ./gen_dir/
```
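For reference, Centered PSNR compares only the center region of the generated image, i.e. the part that should reproduce the input sub-image, against the original. A minimal sketch of the metric, assuming uint8 HxWx3 arrays and a square center crop; the actual `eval_dir/psnr.py` may differ in details:

```python
import numpy as np

def centered_psnr(original, generated, crop):
    """PSNR over the central crop x crop region of two uint8 HxWx3 images."""
    def center(img):
        h, w = img.shape[:2]
        top, left = (h - crop) // 2, (w - crop) // 2
        return img[top:top + crop, left:left + crop].astype(np.float64)

    mse = np.mean((center(original) - center(generated)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)
```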

Here are some generated samples:

## ⚡ Framework

Methodologically, PQDiff can outpaint at any multiple in a single step, which greatly broadens the applicability of image outpainting.

- For training, we randomly crop the image twice with different random crop ratios to obtain two views. Then we compute the relative positional embeddings of the anchor view (red box) and the target view (blue box); the sketch after this list illustrates the underlying geometry.
- For sampling, i.e., testing or generation, we first compute the target view (blue box) from the anchor view (red box) to form a mode that encodes a positional relation. With different modes, we can perform arbitrary and controllable image outpainting.
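Purely to illustrate the geometry (the positional-embedding construction in PQDiff itself is learned and may differ), here is a sketch of expressing a target box in the anchor box's normalized coordinate frame:

```python
import torch

def relative_box(anchor, target):
    """Express a target crop (x0, y0, x1, y1) in the anchor crop's
    normalized coordinates; values outside [0, 1] lie outside the anchor."""
    ax0, ay0, ax1, ay1 = anchor
    tx0, ty0, tx1, ty1 = target
    aw, ah = ax1 - ax0, ay1 - ay0
    return torch.tensor([(tx0 - ax0) / aw, (ty0 - ay0) / ah,
                         (tx1 - ax0) / aw, (ty1 - ay0) / ah])

# Outpainting by 0.25 on every side: the anchor sits inside the target.
print(relative_box(anchor=(32, 32, 160, 160), target=(0, 0, 192, 192)))
# tensor([-0.2500, -0.2500,  1.2500,  1.2500])
```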

## 👍 Acknowledgement

- QueryOTR. The codebase provides the image outpainting datasets and a strong baseline.
- PQCL. The codebase inspired the position query scheme in this work.

## 📑 Citation

Please consider citing 📑 our paper if our repository is helpful to your work. Thanks sincerely!

```bibtex
@misc{zhang2024continuousmultiple,
      title={Continuous-Multiple Image Outpainting in One-Step via Positional Query and A Diffusion-based Approach},
      author={Shaofeng Zhang and Jinfa Huang and Qiang Zhou and Zhibin Wang and Fan Wang and Jiebo Luo and Junchi Yan},
      year={2024},
      eprint={2401.15652},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```