We maintain this repository to summarize papers and resources related to the text-to-video (T2V) generation task.
In `reference.bib`, we summarize and keep up to date the BibTeX references of recent T2V papers, as well as widely used datasets and toolkits.
If you have any suggestions about this repository, please feel free to open an issue or a pull request.
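Entries in `reference.bib` follow standard BibTeX conventions. The snippet below is a minimal sketch of what an arXiv entry in the file looks like; the citation key, authors, and arXiv identifier are placeholders, not an actual entry:

```bibtex
@article{placeholder2023t2v,
  title   = {Placeholder: An Example Text-to-video Paper Title},
  author  = {Placeholder, First and Placeholder, Second},
  journal = {arXiv preprint arXiv:0000.00000},
  year    = {2023},
  note    = {Placeholder entry illustrating the format, not a real reference}
}
```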
- Emu [website]
- Gen-2 [website]
- Midjourney [website]
- Morph Studio [website]
- Outfit Anyone [website]
- Pika [website]
- PixelDance [website]
- VideoPoet [website]
- [arXiv 2023] Animate Anyone: Consistent and Controllable Image-to-video Synthesis for Character Animation [paper] [code] [project]
- [arXiv 2023] AnimateDiff: Animate Your Personalized Text-to-image Diffusion Models without Specific Tuning [paper] [project]
- [arXiv 2023] Control-A-Video: Controllable Text-to-video Generation with Diffusion Models [paper] [code] [demo] [project]
- [arXiv 2023] ControlVideo: Training-free Controllable Text-to-video Generation [paper] [code]
- [arXiv 2023] I2VGen-XL: High-quality Image-to-video Synthesis via Cascaded Diffusion Models [paper] [code] [project]
- [arXiv 2022] Imagen Video: High Definition Video Generation with Diffusion Models [paper]
- [arXiv 2023] Latent-Shift: Latent Diffusion with Temporal Shift for Efficient Text-to-video Generation [paper] [project]
- [arXiv 2023] LaVie: High-quality Video Generation with Cascaded Latent Diffusion Models [paper] [code] [project]
- [arXiv 2023] Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-video Generation [paper] [code] [project]
- [arXiv 2023] SimDA: Simple Diffusion Adapter for Efficient Video Generation [paper] [code] [project]
- [arXiv 2023] Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets [paper] [code] [project]
- [arXiv 2023] Style-A-Video: Agile Diffusion for Arbitrary Text-based Video Style Transfer [paper]
- [arXiv 2023] VideoComposer: Compositional Video Synthesis with Motion Controllability [paper] [code] [project]
- [arXiv 2023] VideoFactory: Swap Attention in Spatiotemporal Diffusions for Text-to-video Generation [paper]
- [arXiv 2023] VideoGen: A Reference-guided Latent Diffusion Approach for High Definition Text-to-video Generation [paper] [code]
- [CVPR 2023] Align your Latents: High-resolution Video Synthesis with Latent Diffusion Models [paper] [project] [reproduced code]
- [CVPR 2023] Video Probabilistic Diffusion Models in Projected Latent Space [paper] [code]
- [NeurIPS 2022] Video Diffusion Models [paper] [project]
- [ICCV 2023] Preserve Your Own Correlation: A Noise Prior for Video Diffusion Models [paper] [project]
- [ICCV 2023] Structure and Content-guided Video Synthesis with Diffusion Models [paper] [project]
- [ICCV 2023] Text2Video-Zero: Text-to-image Diffusion Models are Zero-shot Video Generators [paper] [code] [demo] [project]
- [ICLR 2023] CogVideo: Large-scale Pretraining for Text-to-video Generation via Transformers [paper] [code] [demo]
- [ICLR 2023] Make-A-Video: Text-to-video Generation without Text-video Data [paper] [project] [reproduced code]
- [ICLR 2023] Phenaki: Variable Length Video Generation From Open Domain Textual Description [paper] [code]