I'm trying to run Qwen Image on a 9070 XT (16 GB VRAM total, 15 GB actually available). This setup requires 20 GB total, and it seems that the Vulkan implementation (release b314d80) tries to allocate the whole 20 GB in GPU VRAM and fails when running:

```
.\sd-cli --diffusion-model "D:\Downloads\qwen-image-Q5_K_M.gguf" --vae "D:\Downloads\qwen_image_vae.safetensors" --llm "D:\Downloads\Qwen2.5-VL-7B-Instruct-UD-Q4_K_XL.gguf" -o "stablediffusioncpp/cat.png" -p "a lovely cat" --offload-to-cpu
```

The suspicious part is

Another question I have: would it be possible to use a second GPU for the LLM part? I have a dual setup with an older 1070 with 8 GB VRAM and
Replies: 1 comment
Not on sd.cpp mainline, but there is a PR that implements it: #1184
Note that 2179989504 bytes is 2079 MiB. Maybe this error was triggered because the VRAM filled up, not because of a single allocation. The inference also needs a working buffer: if you have a ~14 GB model and ~15 GB of available VRAM, the ~1 GB left may not be enough for it. I suggest trying `--diffusion-fa`, and/or a smaller quant.

That message doesn't really reflect `--offload-to-cpu` behavior: it shows the total memory for all models during inference. RAM is 0 in this case because you didn't use `--vae-on-cpu` nor `--clip-on-cpu`. It's basically "what you would need without `--offload-to-cpu`". `--offload-to-cpu` keeps the model we…
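To make the budgeting argument concrete, here is a rough back-of-the-envelope check. The ~14 GB weights and ~15 GB usable VRAM figures are the approximations from this thread, not measured values; only the 2179989504-byte failed allocation comes from the actual error:

```python
# Rough VRAM budget check using the numbers discussed above.
# Assumptions: ~14 GiB of model weights resident in VRAM, ~15 GiB usable
# on the 16 GiB card. The failed allocation size is taken from the log.
weights_bytes = 14 * 1024**3          # approximate resident model weights
usable_vram_bytes = 15 * 1024**3      # approximate usable VRAM
failed_alloc_bytes = 2_179_989_504    # the allocation that failed (working buffer)

free_after_weights = usable_vram_bytes - weights_bytes
print(f"free after weights: {free_after_weights / 1024**2:.0f} MiB")   # 1024 MiB
print(f"working buffer:     {failed_alloc_bytes / 1024**2:.0f} MiB")   # 2079 MiB
print("fits:", failed_alloc_bytes <= free_after_weights)               # False
```

Under these assumptions the ~2 GiB working buffer doesn't fit in the ~1 GiB left after the weights, which is consistent with the "VRAM filled up" explanation and with why `--diffusion-fa` (a smaller attention working buffer) or a smaller quant (smaller weights) could help.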