This is a repo to track the latest autoregressive visual generation papers.

lxa9867/Awesome-Autoregressive-Visual-Generation

Awesome-Autoregressive-Visual-Generation

Image Tokenizers

  1. Neural Discrete Representation Learning Paper, NeurIPS 2017
  2. Generating Diverse High-Fidelity Images with VQ-VAE-2 Paper, NeurIPS 2019
  3. Taming Transformers for High-Resolution Image Synthesis Paper, CVPR 2021
  4. Autoregressive Image Generation using Residual Quantization Paper, CVPR 2022
  5. * BEIT V2: Masked Image Modeling with Vector-Quantized Visual Tokenizers (for understanding) Paper, Arxiv 2022
  6. Vector-quantized Image Modeling with Improved VQGAN Paper, ICLR 2022
  7. MoVQ: Modulating Quantized Vectors for High-Fidelity Image Generation Paper, NeurIPS 2022
  8. * PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers (for understanding) Paper, AAAI 2023
  9. * All in Tokens: Unifying Output Space of Visual Tasks via Soft Token (for understanding) Paper, CVPR 2023
  10. Regularized Vector Quantization for Tokenized Image Synthesis Paper, CVPR 2023
  11. Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization Paper, CVPR 2023
  12. Not All Image Regions Matter: Masked Vector Quantization for Autoregressive Image Generation Paper, CVPR 2023
  13. SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs Paper, NeurIPS 2023
  14. HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes Paper, TMLR 2024
  15. Finite Scalar Quantization: VQ-VAE Made Simple Paper, ICLR 2024
  16. Planting a SEED of Vision in Large Language Model Paper, ICLR 2024
  17. Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation Paper, ICLR 2024
  18. Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis Paper, CVPR 2024
  19. Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction Paper, Arxiv 2024
  20. An Image is Worth 32 Tokens for Reconstruction and Generation Paper, Arxiv 2024
  21. Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99% Paper, Arxiv 2024
  22. Quantised Global Autoencoder: A Holistic Approach to Representing Visual Data Paper, Arxiv 2024
  23. VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation Paper, Arxiv 2024
  24. Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation Paper, Arxiv 2024
  25. MaskBit: Embedding-free Image Generation via Bit Tokens Paper, Arxiv 2024

Autoregressive Image Generation

  1. Conditional Image Generation with PixelCNN Decoders Paper, NeurIPS 2016
  2. DiVAE: Photorealistic Images Synthesis with Denoising Diffusion Decoder Paper
  3. Vector Quantized Diffusion Model for Text-to-Image Synthesis Paper
  4. MaskGIT: Masked Generative Image Transformer Paper
  5. BEIT: BERT Pre-Training of Image Transformers Paper
  6. BEIT V2: Masked Image Modeling with Vector-Quantized Visual Tokenizers Paper
  7. MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis Paper
  8. Sequential Modeling Enables Scalable Learning for Large Vision Models Paper, Arxiv 2023
  9. 4M: Massively Multimodal Masked Modeling Paper, NeurIPS 2023
  10. Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Paper, Arxiv 2024
  11. ControlVAR: Exploring Controllable Visual Autoregressive Modeling Paper, Arxiv 2024
  12. Autoregressive Image Generation without Vector Quantization Paper, Arxiv 2024
  13. MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis Paper, Arxiv 2024
  14. ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation Paper, Arxiv 2024
  15. VAR-CLIP: Text-to-Image Generator with Visual Auto-Regressive Modeling Paper, Arxiv 2024
  16. Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining Paper, Arxiv 2024
  17. Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model Paper, Arxiv 2024
  18. Scalable Autoregressive Image Generation with Mamba Paper, Arxiv 2024
  19. Show-o: One Single Transformer to Unify Multimodal Understanding and Generation Paper, Arxiv 2024
