dequantization offload accounting (fixes Flux2 OOMs - incl TEs) #11171
Conversation
Confirmed that it resolves #10891 (comment).
comfy/sd.py (outdated):

```python
self.tokenizer = tokenizer(embedding_directory=embedding_directory, tokenizer_data=tokenizer_data)
self.patcher = comfy.model_patcher.ModelPatcher(self.cond_stage_model, load_device=load_device, offload_device=offload_device)
self.patcher.set_model_compute_dtype(dtype)
```
This is slightly wrong because the text encoder is always upcast to fp32.
Fixed to match:

```diff
--- a/comfy/sd.py
+++ b/comfy/sd.py
@@ -127,7 +127,8 @@ class CLIP:
         self.tokenizer = tokenizer(embedding_directory=embedding_directory, tokenizer_data=tokenizer_data)
         self.patcher = comfy.model_patcher.ModelPatcher(self.cond_stage_model, load_device=load_device, offload_device=offload_device)
-        self.patcher.set_model_compute_dtype(dtype)
+        # Match the hardcoded torch.float32 upcast in the TE implementation
+        self.patcher.set_model_compute_dtype(torch.float32)
         self.patcher.hook_mode = comfy.hooks.EnumHookMode.MinVram
         self.patcher.is_clip = True
         self.apply_hooks_to_conds = None
```
Handle the case where the attribute doesn't exist by returning a static sentinel (distinct from None). If the sentinel is passed in as the set value, delete the attribute (a sketch follows).
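A minimal sketch of that sentinel pattern in plain Python; `_SENTINEL`, `get_attr_or_sentinel`, and `set_attr_or_sentinel` are illustrative names, not the helpers this PR actually adds:

```python
# Illustrative sketch only; names are hypothetical, not ComfyUI's real helpers.
_SENTINEL = object()  # static sentinel, distinct from None

def get_attr_or_sentinel(obj, name):
    # Return the attribute if present, otherwise the static sentinel.
    return getattr(obj, name, _SENTINEL)

def set_attr_or_sentinel(obj, name, value):
    # Passing the sentinel back means "restore the missing state": delete the attr.
    if value is _SENTINEL:
        if hasattr(obj, name):
            delattr(obj, name)
    else:
        setattr(obj, name, value)
```

Because the sentinel is a unique object distinct from None, a legitimate attribute value of None round-trips correctly through save and restore.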
When measuring the cost of offload, identify weights that need a type change or dequantization and add the size of the conversion result to the offload cost. This is mutually exclusive with lowvram patches, which already carry a large conservative estimate that won't overlap the dequant cost, so don't double count (see the sketch below).
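A hedged sketch of that accounting rule, assuming fp32 as the conversion target; the helper and argument names are made up for illustration, not ComfyUI's real API:

```python
import torch

COMPUTE_DTYPE = torch.float32  # assumption: conversions land in fp32

def converted_nbytes(w, dtype=COMPUTE_DTYPE):
    # Storage of the tensor a weight casts/dequantizes into.
    return w.numel() * torch.empty((), dtype=dtype).element_size()

def offload_cost(weights, lowvram_patched=frozenset()):
    # `weights` maps name -> tensor; `lowvram_patched` names the weights
    # that carry lowvram patches. Both names are illustrative.
    cost = 0
    for name, w in weights.items():
        cost += w.numel() * w.element_size()  # the weight's own storage
        if name in lowvram_patched:
            # Lowvram patches already include a large conservative estimate,
            # so the conversion cost must not be counted twice.
            continue
        if w.dtype != COMPUTE_DTYPE:
            # Weight needs a cast or dequantization on load: budget the
            # size of the conversion result as well.
            cost += converted_nbytes(w)
    return cost
```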
So that the loader can know the size of weights for dequant accounting.
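For illustration only, a quantized weight wrapper might report its post-dequantization size like this (a toy class with a hypothetical method name, not the PR's actual interface):

```python
import math
import torch

class QuantizedWeight:
    """Toy stand-in for a quantized parameter; the real class differs."""
    def __init__(self, packed: torch.Tensor, shape: tuple, dequant_dtype=torch.float32):
        self.packed = packed                # packed/quantized storage
        self.shape = shape                  # logical shape after dequantization
        self.dequant_dtype = dequant_dtype  # dtype the weight dequantizes into

    def dequantized_nbytes(self) -> int:
        # Bytes the loader must budget for once this weight is dequantized.
        numel = math.prod(self.shape)
        return numel * torch.empty((), dtype=self.dequant_dtype).element_size()
```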
Force-pushed from a03d98e to 9663183.
I get this when running the basic HiDream dev workflow from https://comfyanonymous.github.io/ComfyUI_examples/hidream/ with simulated 16GB VRAM.

Never mind, I think it was my fault: #11201
This is hopefully the full root-cause fix for #10891.
Primary commit message:
Example test case: RTX 3060, Flux2 workflow with the ModelComputeDtype node set to fp32.
Before:
After: