Fix Z-Image VAE operations not reserving working memory for OOM prevention #8
Merged
lstein merged 7 commits into lstein-master on Jan 5, 2026
Conversation
Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix out of memory errors with larger models → Fix Z-Image VAE operations not reserving working memory for OOM prevention on Jan 4, 2026
Owner
@copilot This error is occurring in the unit test:
…utoEncoder Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
Author
Fixed in commit e13401b. The issue was that …
Owner
@copilot Still failures on the python tests:
…fields Co-authored-by: lstein <111189+lstein@users.noreply.github.com>
Fix VRAM cache calculation to properly account for `device_working_mem_gb`
Problem Analysis
When generating with larger models (like Q8 Z-Image Turbo), users get OOM errors during VAE decoding.
Root Cause:
The `ZImageLatentsToImageInvocation` and `ZImageImageToLatentsInvocation` do not request additional working memory for VAE operations, unlike the standard SD1.5/SDXL/SD3/CogView4 invocations. This means the model cache doesn't offload enough models from VRAM before VAE operations run, leaving no room for the operation's intermediate tensors.

Comparison:

- `LatentsToImageInvocation` (SD1.5/SDXL): calls `estimate_vae_working_memory_sd15_sdxl()` and passes `working_mem_bytes` to `model_on_device()`
- `SD3LatentsToImageInvocation`: calls `estimate_vae_working_memory_sd3()` and passes `working_mem_bytes`
- `CogView4LatentsToImageInvocation`: calls `estimate_vae_working_memory_cogview4()` and passes `working_mem_bytes`
- `ZImageLatentsToImageInvocation`: didn't estimate or request working memory (NOW FIXED ✅)
- `ZImageImageToLatentsInvocation`: didn't estimate or request working memory (NOW FIXED ✅)

Changes Made
- Updated `z_image_latents_to_image.py` and `z_image_image_to_latents.py` to estimate VAE working memory and pass it via `model_on_device(working_mem_bytes=...)`
- Updated the tests to use `model_construct()` to bypass Pydantic validation for mock objects

Technical Details
The fix adds working memory estimation to both Z-Image VAE invocations:
- Detects whether the VAE is a FLUX VAE (`FluxAutoEncoder`) or a Diffusers VAE (`AutoencoderKL`)
- Uses `estimate_vae_working_memory_flux()` for the FLUX VAE
- Uses `estimate_vae_working_memory_sd3()` for `AutoencoderKL`
- Passes the estimate to `model_on_device(working_mem_bytes=...)`

This ensures the model cache properly offloads models to make room for VAE operations before they run, preventing OOM errors.
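The dispatch-and-estimate pattern described above can be sketched roughly as follows. The helper names mirror those mentioned in the PR, but the byte arithmetic is an illustrative assumption, not InvokeAI's actual estimator:

```python
# Illustrative sketch of the working-memory pattern described above.
# The function names mirror the PR; the size arithmetic is a simplified
# assumption, not InvokeAI's actual formula.

def estimate_vae_working_memory_flux(latent_h: int, latent_w: int,
                                     bytes_per_element: int = 4) -> int:
    # The VAE upscales latents 8x spatially into a 3-channel image;
    # the 4x multiplier is an assumed margin for intermediate activations.
    out_elements = (latent_h * 8) * (latent_w * 8) * 3
    return out_elements * bytes_per_element * 4


def estimate_vae_working_memory_sd3(latent_h: int, latent_w: int,
                                    bytes_per_element: int = 4) -> int:
    out_elements = (latent_h * 8) * (latent_w * 8) * 3
    return out_elements * bytes_per_element * 3  # assumed smaller margin


def working_mem_for_vae(vae: object, latent_h: int, latent_w: int) -> int:
    # Dispatch on the VAE type, as the fix does for FluxAutoEncoder
    # (FLUX VAE) vs. AutoencoderKL (Diffusers).
    if type(vae).__name__ == "FluxAutoEncoder":
        return estimate_vae_working_memory_flux(latent_h, latent_w)
    return estimate_vae_working_memory_sd3(latent_h, latent_w)
```

The resulting byte count is what gets passed as `model_on_device(working_mem_bytes=...)`, so the cache can evict enough models before the decode or encode runs.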
Test Fixes
- A `config` attribute was being set on the `FluxAutoEncoder` mock, which doesn't have this attribute. The test now only sets `config` attributes for `AutoencoderKL` VAEs.
- The tests use `model_construct()` instead of the regular constructor to create invocation instances with mock fields, bypassing validation while still testing the core logic.

Files Modified
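The `model_construct()` technique mentioned in the test fixes is Pydantic v2's documented way to build a model instance without running field validation. A minimal sketch, where `MockVae` and `DecodeInvocation` are hypothetical stand-ins rather than InvokeAI's real classes:

```python
# Minimal sketch of using model_construct() to inject mock field values.
# MockVae and DecodeInvocation are hypothetical stand-ins, not InvokeAI classes.
from pydantic import BaseModel, ValidationError


class MockVae:
    """Stand-in for a unittest.mock-style object that fails validation."""


class DecodeInvocation(BaseModel):
    steps: int = 1  # illustrative field; a MockVae is not an int


# The regular constructor validates field types and rejects the mock:
try:
    DecodeInvocation(steps=MockVae())  # type: ignore[arg-type]
    raise AssertionError("validation unexpectedly passed")
except ValidationError:
    pass

# model_construct() skips validation, so the mock is stored as-is,
# letting a test exercise the invocation's logic with fake fields.
inv = DecodeInvocation.model_construct(steps=MockVae())
```

This is the same trade-off the PR's tests make: validation is bypassed deliberately, so the test remains responsible for supplying sensible field values.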
- `invokeai/app/invocations/z_image_latents_to_image.py`: added working memory estimation for decode
- `invokeai/app/invocations/z_image_image_to_latents.py`: added working memory estimation for encode
- `tests/app/invocations/test_z_image_working_memory.py`: added tests to verify working memory estimation

Expected Impact
Users will no longer need to manually set `max_cache_vram_gb` to work around OOM errors. The `device_working_mem_gb` setting (default 3 GB) will now work correctly for Z-Image models, as the VAE operations will request appropriate working memory and the model cache will offload models accordingly.

Original prompt
This section details the original issue to resolve.
<issue_title>[bug]: Out of Memory errors with larger models</issue_title>
<issue_description>### Is there an existing issue for this problem?
Install method
Invoke's Launcher
Operating system
Linux
GPU vendor
Nvidia (CUDA)
GPU model
RTX 4070
GPU VRAM
12GB
Version number
6.10.0rc2
Browser
No response
System Information
No response
What happened
When generating with the Q8 Z-Image Turbo model, I am getting out of memory errors during the VAE decoding phase. I can avoid the errors by setting `max_cache_vram_gb` to 4 GB, at which point I see VRAM memory use rise to ~4 GB. However, it doesn't seem intuitive to me that adjusting the VRAM cache should be the way to fix the error.

I also tried setting `device_working_mem_gb: 4` in my config file, but without the VRAM cache setting, I again get OOM.

Here is the log from a successful generation with the VRAM cache limited to 4 GB: