Release v0.30.3: CogVideoX Image-to-Video and Video-to-Video · huggingface/diffusers

This patch release adds Diffusers support for the upcoming CogVideoX-5B-I2V release (an Image-to-Video generation model)! The model weights will be available by end of the week on the HF Hub at THUDM/CogVideoX-5b-I2V (Link). Stay tuned for the release!

This release features two new pipelines:

CogVideoXImageToVideoPipeline
CogVideoXVideoToVideoPipeline

Additionally, we now have support for tiled encoding in the CogVideoX VAE. This can be enabled by calling the vae.enable_tiling() method, and it is used in the new Video-to-Video pipeline to encode sample videos to latents in a memory-efficient manner.

CogVideoXImageToVideoPipeline

The code below demonstrates how to use the new image-to-video pipeline:

import torch
from diffusers import CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

pipe = CogVideoXImageToVideoPipeline.from_pretrained("THUDM/CogVideoX-5b-I2V", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Optionally, enable memory optimizations.
# If enabling CPU offloading, remember to remove `pipe.to("cuda")` above
pipe.enable_model_cpu_offload()
pipe.vae.enable_tiling()

prompt = "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot."
image = load_image(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg"
)
video = pipe(image, prompt, use_dynamic_cfg=True)
export_to_video(video.frames[0], "output.mp4", fps=8)

CogVideoXImageToVideoExample.mp4

CogVideoXVideoToVideoPipeline

The code below demonstrates how to use the new video-to-video pipeline:

import torch
from diffusers import CogVideoXDPMScheduler, CogVideoXVideoToVideoPipeline
from diffusers.utils import export_to_video, load_video

# Models: "THUDM/CogVideoX-2b" or "THUDM/CogVideoX-5b"
pipe = CogVideoXVideoToVideoPipeline.from_pretrained("THUDM/CogVideoX-5b-trial", torch_dtype=torch.bfloat16)
pipe.scheduler = CogVideoXDPMScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

input_video = load_video(
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/hiker.mp4"
)
prompt = (
    "An astronaut stands triumphantly at the peak of a towering mountain. Panorama of rugged peaks and "
    "valleys. Very futuristic vibe and animated aesthetic. Highlights of purple and golden colors in "
    "the scene. The sky is looks like an animated/cartoonish dream of galaxies, nebulae, stars, planets, "
    "moons, but the remainder of the scene is mostly realistic."
)

video = pipe(
    video=input_video, prompt=prompt, strength=0.8, guidance_scale=6, num_inference_steps=50
).frames[0]
export_to_video(video, "output.mp4", fps=8)

CogVideoXVideoToVideoExample.mp4

Shoutout to @tin2tin for the awesome demonstration!

Refer to our documentation to learn more about it.

All commits

[core] Support VideoToVideo with CogVideoX by @a-r-r-o-w in #9333
[core] CogVideoX memory optimizations in VAE encode by @a-r-r-o-w in #9340
[CI] Quick fix for Cog Video Test by @DN6 in #9373
[refactor] move positional embeddings to patch embed layer for CogVideoX by @a-r-r-o-w in #9263
CogVideoX-5b-I2V support by @zRzRzRzRzRzRzR in #9418

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v0.30.3: CogVideoX Image-to-Video and Video-to-Video

CogVideoXImageToVideoPipeline

CogVideoXVideoToVideoPipeline

All commits

Contributors