v0.30.3: CogVideoX Image-to-Video and Video-to-Video
This patch release adds Diffusers support for the upcoming CogVideoX-5B-I2V release (an Image-to-Video generation model)! The model weights will be available by end of the week on the HF Hub at THUDM/CogVideoX-5b-I2V
(Link). Stay tuned for the release!
This release features two new pipelines:
- CogVideoXImageToVideoPipeline
- CogVideoXVideoToVideoPipeline
Additionally, we now have support for tiled encoding in the CogVideoX VAE. This can be enabled by calling the vae.enable_tiling()
method, and it is used in the new Video-to-Video pipeline to encode sample videos to latents in a memory-efficient manner.
CogVideoXImageToVideoPipeline
The code below demonstrates how to use the new image-to-video pipeline:
import torch
from diffusers import CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image
pipe = CogVideoXImageToVideoPipeline.from_pretrained("THUDM/CogVideoX-5b-I2V", torch_dtype=torch.bfloat16)
pipe.to("cuda")
# Optionally, enable memory optimizations.
# If enabling CPU offloading, remember to remove `pipe.to("cuda")` above
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()
prompt = "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot."
image = load_image(
"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg"
)
video = pipe(image, prompt, use_dynamic_cfg=True)
export_to_video(video.frames[0], "output.mp4", fps=8)
CogVideoXImageToVideoExample.mp4 |
CogVideoXVideoToVideoPipeline
The code below demonstrates how to use the new video-to-video pipeline:
import torch
from diffusers import CogVideoXDPMScheduler, CogVideoXVideoToVideoPipeline
from diffusers.utils import export_to_video, load_video
# Models: "THUDM/CogVideoX-2b" or "THUDM/CogVideoX-5b"
pipe = CogVideoXVideoToVideoPipeline.from_pretrained("THUDM/CogVideoX-5b-trial", torch_dtype=torch.bfloat16)
pipe.scheduler = CogVideoXDPMScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")
input_video = load_video(
"https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/hiker.mp4"
)
prompt = (
"An astronaut stands triumphantly at the peak of a towering mountain. Panorama of rugged peaks and "
"valleys. Very futuristic vibe and animated aesthetic. Highlights of purple and golden colors in "
"the scene. The sky is looks like an animated/cartoonish dream of galaxies, nebulae, stars, planets, "
"moons, but the remainder of the scene is mostly realistic."
)
video = pipe(
video=input_video, prompt=prompt, strength=0.8, guidance_scale=6, num_inference_steps=50
).frames[0]
export_to_video(video, "output.mp4", fps=8)
CogVideoXVideoToVideoExample.mp4 |
Shoutout to @tin2tin for the awesome demonstration!
Refer to our documentation to learn more about it.
All commits
- [core] Support VideoToVideo with CogVideoX by @a-r-r-o-w in #9333
- [core] CogVideoX memory optimizations in VAE encode by @a-r-r-o-w in #9340
- [CI] Quick fix for Cog Video Test by @DN6 in #9373
- [refactor] move positional embeddings to patch embed layer for CogVideoX by @a-r-r-o-w in #9263
- CogVideoX-5b-I2V support by @zRzRzRzRzRzRzR in #9418