Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation
Causal Forcing significantly outperforms Self Forcing in both visual quality and motion dynamics, while keeping the same training budget and inference efficiency—enabling real-time, streaming video generation on a single RTX 4090.
Demo video: demo.mp4
The inference environment is identical to that of Self Forcing, so you can migrate directly using our configs and models.
conda create -n causal_forcing python=3.10 -y
conda activate causal_forcing
pip install -r requirements.txt
pip install git+https://github.com/openai/CLIP.git
pip install flash-attn --no-build-isolation
python setup.py develop
hf download Wan-AI/Wan2.1-T2V-1.3B --local-dir wan_models/Wan2.1-T2V-1.3B
hf download Wan-AI/Wan2.1-T2V-14B --local-dir wan_models/Wan2.1-T2V-14B
hf download zhuhz22/Causal-Forcing chunkwise/causal_forcing.pt --local-dir checkpoints
hf download zhuhz22/Causal-Forcing framewise/causal_forcing.pt --local-dir checkpoints
We open-source both the frame-wise and chunk-wise models; the former is a setting that Self Forcing has chosen not to release.
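Before running inference, you can sanity-check a downloaded checkpoint with plain PyTorch. This is a hypothetical snippet, not a repo utility, and the exact key layout inside the checkpoint may differ:
# Hypothetical checkpoint sanity check (not a repo script); adjust the path as needed.
import torch
# Depending on your PyTorch version, you may need weights_only=False here.
ckpt = torch.load("checkpoints/chunkwise/causal_forcing.pt", map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    for key in list(ckpt.keys())[:10]:
        val = ckpt[key]
        print(key, tuple(val.shape) if torch.is_tensor(val) else type(val).__name__)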
Frame-wise model (higher dynamic degree and more expressive):
python inference.py \
--config_path configs/causal_forcing_dmd_framewise.yaml \
--output_folder output/framewise \
--checkpoint_path checkpoints/framewise/causal_forcing.pt \
--data_path prompts/demos.txt \
--use_ema
# Note: this frame-wise config is not in Self Forcing; if you use its framework, migrate this config as well.
Chunk-wise model (more stable):
python inference.py \
--config_path configs/causal_forcing_dmd_chunkwise.yaml \
--output_folder output/chunkwise \
--checkpoint_path checkpoints/chunkwise/causal_forcing.pt \
--data_path prompts/demos.txt
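For intuition, the two settings differ in the granularity of the autoregressive rollout: the frame-wise model denoises one latent frame per loop iteration, while the chunk-wise model denoises a small block of frames at a time, conditioning on everything generated so far. Below is a toy sketch of the chunk-wise pattern; the denoiser, names, and shapes are stand-ins, not the repo's inference API.
# Toy sketch of a chunk-wise autoregressive rollout; NOT the repo's inference code.
import torch

def fake_denoiser(noisy_chunk, context):
    # Stand-in for the few-step distilled denoiser, conditioned on past latents
    # (in the real model this conditioning is a KV cache over previous frames).
    return torch.zeros_like(noisy_chunk)

num_frames, chunk_size = 21, 3        # chunk_size=1 would be the frame-wise setting
c, h, w = 16, 60, 104                 # made-up latent dimensions
context, latents = [], []
for start in range(0, num_frames, chunk_size):
    noise = torch.randn(chunk_size, c, h, w)
    clean = fake_denoiser(noise, context)
    context.append(clean)             # future chunks condition on this chunk
    latents.append(clean)             # each chunk can be VAE-decoded as it arrives
print(torch.cat(latents, dim=0).shape)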
Stage 1: Autoregressive Diffusion Training (can be skipped by using our pretrained checkpoints)
First download the dataset (we provide a 6K toy dataset here):
hf download zhuhz22/Causal-Forcing-data --local-dir dataset
python utils/merge_and_get_clean.py
If the download gets stuck, press Ctrl+C and rerun the command to resume.
Then train the AR-diffusion model:
- Frame-wise:
torchrun --nnodes=8 --nproc_per_node=8 --rdzv_id=5235 \
--rdzv_backend=c10d \
--rdzv_endpoint $MASTER_ADDR \
train.py \
--config_path configs/ar_diffusion_tf_framewise.yaml \
--logdir logs/ar_diffusion_framewise
- Chunk-wise:
torchrun --nnodes=8 --nproc_per_node=8 --rdzv_id=5235 \
--rdzv_backend=c10d \
--rdzv_endpoint $MASTER_ADDR \
train.py \
--config_path configs/ar_diffusion_tf_chunkwise.yaml \
--logdir logs/ar_diffusion_chunkwise
We recommend training for no less than 2K steps; more steps (e.g., 5~10K) lead to better performance.
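The two configs differ mainly in the causal granularity of attention during AR-diffusion training: frame-wise uses a strictly per-frame causal mask, while chunk-wise lets frames within the same chunk attend to each other (block-causal). A minimal sketch of such a mask, with illustrative sizes and our own helper name:
# Minimal sketch of a (block-)causal frame mask; sizes and names are illustrative.
import torch

def block_causal_mask(num_frames, chunk_size):
    # True = attention allowed: frame i may attend to frame j iff j's chunk
    # is not after i's chunk. chunk_size=1 recovers the frame-wise causal mask.
    chunk_id = torch.arange(num_frames) // chunk_size
    return chunk_id[None, :] <= chunk_id[:, None]

print(block_causal_mask(6, 1).int())  # frame-wise: lower-triangular
print(block_causal_mask(6, 3).int())  # chunk-wise: whole chunks unmasked at once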
Stage 2: Causal ODE Initialization (can be skipped by using our pretrained checkpoints)
If you have skipped Stage 1, you need to download the pretrained models:
hf download zhuhz22/Causal-Forcing framewise/ar_diffusion.pt --local-dir checkpoints
hf download zhuhz22/Causal-Forcing chunkwise/ar_diffusion.pt --local-dir checkpoints
In this stage, first generate ODE paired data:
# for the frame-wise model
torchrun --nproc_per_node=8 \
get_causal_ode_data_framewise.py \
--generator_ckpt checkpoints/framewise/ar_diffusion.pt \
--rawdata_path dataset/clean_data \
--output_folder dataset/ODE6KCausal_framewise_latents
python utils/create_lmdb_iterative.py \
--data_path dataset/ODE6KCausal_framewise_latents \
--lmdb_path dataset/ODE6KCausal_framewise
# for the chunk-wise model
torchrun --nproc_per_node=8 \
get_causal_ode_data_chunkwise.py \
--generator_ckpt checkpoints/chunkwise/ar_diffusion.pt \
--rawdata_path dataset/clean_data \
--output_folder dataset/ODE6KCausal_chunkwise_latents
python utils/create_lmdb_iterative.py \
--data_path dataset/ODE6KCausal_chunkwise_latents \
--lmdb_path dataset/ODE6KCausal_chunkwise
Alternatively, you can directly download our prepared dataset (~300 GB):
hf download zhuhz22/Causal-Forcing-data --local-dir dataset
python utils/merge_lmdb.py
If the download gets stuck, press Ctrl+C and rerun the command to resume.
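Conceptually, an "ODE pair" couples an initial noise sample with the latent that the teacher's probability-flow ODE maps it to; this stage then regresses the student onto these endpoints so it starts DMD training close to the teacher. A toy sketch of pair creation, with a stand-in velocity field rather than the real teacher:
# Toy sketch of ODE paired-data creation; the velocity field is a stand-in.
import torch

def fake_velocity(x, t):
    return -x  # placeholder for the teacher's predicted velocity at time t

def make_ode_pair(shape, steps=50):
    x = torch.randn(shape)                 # the "noise" half of the pair
    noise = x.clone()
    ts = torch.linspace(1.0, 0.0, steps + 1)
    for i in range(steps):                 # simple Euler integration of the PF-ODE
        x = x + (ts[i + 1] - ts[i]) * fake_velocity(x, ts[i])
    return noise, x                        # (noise, ODE endpoint) = one pair

noise, latent = make_ode_pair((3, 16, 8, 8))
# The next step trains the student to map `noise` to `latent` in a few steps.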
Then train the ODE initialization models:
- Frame-wise:
torchrun --nnodes=8 --nproc_per_node=8 --rdzv_id=5235 \
--rdzv_backend=c10d \
--rdzv_endpoint $MASTER_ADDR \
train.py \
--config_path configs/causal_ode_framewise.yaml \
--logdir logs/causal_ode_framewise
- Chunk-wise:
torchrun --nnodes=8 --nproc_per_node=8 --rdzv_id=5235 \
--rdzv_backend=c10d \
--rdzv_endpoint $MASTER_ADDR \
train.py \
--config_path configs/causal_ode_chunkwise.yaml \
--logdir logs/causal_ode_chunkwise
We recommend training for no less than 1K steps; more steps (e.g., 5~10K) lead to better performance.
🔥 NEW: Causal CD, a substitute for Stage 2 that requires no ODE paired data
Since creating ODE-paired data is very time-consuming, we also provide an alternative here that achieves the same effect as ODE distillation while requiring only ground-truth data.
Note: The current CD is still in an early stage, with many suboptimal implementations in both the algorithm and (especially) infra efficiency. We’ll continue iterating and improving it.
- Frame-wise:
torchrun --nnodes=8 --nproc_per_node=8 --rdzv_id=5235 \
--rdzv_backend=c10d \
--rdzv_endpoint $MASTER_ADDR \
train.py \
--config_path configs/causal_cd_framewise.yaml \
--logdir logs/causal_cd_framewise
- Chunk-wise:
torchrun --nnodes=8 --nproc_per_node=8 --rdzv_id=5235 \
--rdzv_backend=c10d \
--rdzv_endpoint $MASTER_ADDR \
train.py \
--config_path configs/causal_cd_chunkwise.yaml \
--logdir logs/causal_cd_chunkwise
We recommend training for no less than 1K steps; more steps (e.g., 3~5K) lead to better performance.
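Roughly, consistency distillation avoids precomputing whole ODE trajectories: starting from a noised ground-truth latent, it takes a single teacher ODE step and trains the student to make the same prediction from both ends of that step. The sketch below shows this structure with stand-in networks; the details (schedules, EMA, causal conditioning) are ours, not the repo's.
# Toy consistency-distillation step; every module here is a stand-in.
import torch
import torch.nn.functional as F

student = torch.nn.Linear(8, 8)             # stand-in few-step student
teacher_velocity = lambda x, t: -x          # stand-in frozen teacher ODE field

x0 = torch.randn(4, 8)                      # ground-truth latents (no ODE pairs)
t = torch.rand(4, 1) * 0.9 + 0.1
x_t = (1 - t) * x0 + t * torch.randn_like(x0)   # noised sample at time t
dt = 0.05
x_s = x_t - dt * teacher_velocity(x_t, t)   # one teacher ODE step toward the data

with torch.no_grad():
    target = student(x_s)                   # stop-grad (EMA in practice) branch
loss = F.mse_loss(student(x_t), target)     # enforce consistency along the ODE
loss.backward()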
Stage 3: DMD Training
This stage is compatible with Self Forcing training, so you can migrate seamlessly using our configs and checkpoints.
Set your wandb configs before training.
First download the dataset:
hf download gdhe17/Self-Forcing vidprom_filtered_extended.txt --local-dir prompts
If you have skipped Stage 2, you need to download the pretrained checkpoints:
hf download zhuhz22/Causal-Forcing framewise/causal_ode.pt --local-dir checkpoints
hf download zhuhz22/Causal-Forcing chunkwise/causal_ode.pt --local-dir checkpoints
Then train the DMD models:
- Frame-wise model:
torchrun --nnodes=8 --nproc_per_node=8 --rdzv_id=5235 \
--rdzv_backend=c10d \
--rdzv_endpoint $MASTER_ADDR \
train.py \
--config_path configs/causal_forcing_dmd_framewise.yaml \
--logdir logs/causal_forcing_dmd_framewise
We recommend training for about 500 steps; more than 1K steps will reduce the dynamic degree.
- Chunk-wise model:
torchrun --nnodes=8 --nproc_per_node=8 --rdzv_id=5235 \
--rdzv_backend=c10d \
--rdzv_endpoint $MASTER_ADDR \
train.py \
--config_path configs/causal_forcing_dmd_chunkwise.yaml \
--logdir logs/causal_forcing_dmd_chunkwise
We recommend training for 100~200 steps; more than 1K steps will reduce the dynamic degree.
These are the final models used to generate videos.
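For reference, DMD (distribution matching distillation) updates the generator using the difference between two score estimates evaluated at its own samples: a frozen "real" score derived from the teacher and a "fake" score trained online on generator outputs. A toy sketch of the generator update, with all networks as stand-ins:
# Toy DMD generator update; all three networks are stand-ins, not Wan2.1 modules.
import torch

generator = torch.nn.Linear(8, 8)     # few-step causal student
real_score = lambda x: -x             # frozen teacher score (stand-in)
fake_score = lambda x: -0.5 * x       # score of the generator distribution (stand-in)

x = generator(torch.randn(4, 8))      # generator samples
with torch.no_grad():
    grad = fake_score(x) - real_score(x)   # gradient of KL(p_fake || p_real) wrt x
x.backward(gradient=grad)             # descending this moves samples toward the data
# In practice the fake score is trained in parallel on generator outputs, and long
# schedules over-sharpen, hence the short 100~500-step recommendations above.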
See here (currently in Chinese).
This codebase is built on top of the open-source implementations of CausVid and Self Forcing, and the Wan2.1 repo.
If you find the method useful, please cite:
@misc{zhu2026causalforcingautoregressivediffusion,
title={Causal Forcing: Autoregressive Diffusion Distillation Done Right for High-Quality Real-Time Interactive Video Generation},
author={Hongzhou Zhu and Min Zhao and Guande He and Hang Su and Chongxuan Li and Jun Zhu},
year={2026},
eprint={2602.02214},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2602.02214},
}