Enables compatibility between diffusers CPU offloading and xFuser parallelism #147
base: main
Conversation
LGTM
Running:

produces the following video: sample_video_pr_147.mp4

@feifeibear any suggestions on what is going on? Did you get correct output @BBuf? Thank you! 🙏
You can try turning off offloading. If the problem still persists, it indicates an issue with the model itself: it cannot properly generate videos at the 624x832 resolution.
@BBuf thanks for the quick reply, I see. I was just following the "Supported Parallel Configurations" listed here, which indicate that this resolution is supported. Without CPU offloading, it produces an OOM error:
@BBuf what parameters did you try (assuming you got a sensible output video)?
I tried the following command on an A800 node:

```bash
torchrun --nproc_per_node=8 sample_video.py \
    --video-size 720 1280 \
    --video-length 129 \
    --infer-steps 30 \
    --prompt "A cute rabbit family eating dinner in their burrow." \
    --use-cpu-offload \
    --flow-reverse \
    --save-path ./results \
    --ring-degree 4 \
    --ulysses-degree 2 \
    --seed 42
```

And the result is normal:
Same thing with
The previous incompatibility was caused by diffusers not being aware of the local rank in distributed environments, which made it always assume it was rank 0. This led to the `model.to(device)` call at line 1174 in `pipeline_utils.py` repeatedly copying the DiT model from the other ranks onto rank 0, causing OOM errors. The bug was fixed by passing the device corresponding to the local rank to `pipeline.enable_sequential_cpu_offload`. As a result, diffusers' CPU offloading and xFuser parallelization can now be used together.
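
For illustration, here is a minimal sketch of that fix, assuming a `torchrun` launch that sets `LOCAL_RANK` for each process; the model ID and pipeline loading call are placeholders, not this repo's actual setup:

```python
import os

import torch
from diffusers import DiffusionPipeline

# torchrun exports LOCAL_RANK for every process in the distributed job.
local_rank = int(os.environ.get("LOCAL_RANK", "0"))
device = torch.device(f"cuda:{local_rank}")

# Placeholder pipeline; the real HunyuanVideo setup differs.
pipe = DiffusionPipeline.from_pretrained(
    "hypothetical/model-id", torch_dtype=torch.float16
)

# Before the fix: enable_sequential_cpu_offload() was called without a
# device, defaulting to "cuda" (i.e. cuda:0), so every rank shuttled its
# weights through GPU 0 and exhausted its memory.
# After the fix: each rank offloads to and reloads from its own device.
pipe.enable_sequential_cpu_offload(device=device)
```

Passing `device` explicitly keeps the offload hooks that diffusers installs (via accelerate) pinned to each rank's own GPU instead of all ranks defaulting to `cuda:0`.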