
Unload the LLM from VRAM after each call? #3

Open
Pdonor opened this issue Aug 11, 2024 · 2 comments
Labels
good first issue Good for newcomers

Comments


Pdonor commented Aug 11, 2024

Hi! With the new version of Forge and FLUX, this extension could be really practical for the millions of low-VRAM laptops that can now run FLUX. The only problem is that it doesn't unload the LLM from VRAM when using Ollama, so generation is far too slow.

According to ollama/ollama#1600, that can be accomplished with:

```shell
curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": 0}'
```
Can that be put in your code?
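In Python (the extension is a Python sd-webui script), the same unload request from the curl command above might look like the following sketch. The function names are hypothetical, not part of the extension; the endpoint and payload come from the curl command, and sending keep_alive: 0 tells Ollama to evict the model from VRAM right after the call.

```python
import json
import urllib.request

def build_unload_payload(model: str) -> bytes:
    # keep_alive: 0 asks Ollama to evict this model from VRAM immediately
    return json.dumps({"model": model, "keep_alive": 0}).encode("utf-8")

def unload_ollama_model(model: str, host: str = "http://localhost:11434") -> None:
    # POST to /api/generate with an empty prompt and keep_alive=0; the call
    # returns at once and the server drops the model weights from VRAM
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=build_unload_payload(model),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=30).close()
```

Calling unload_ollama_model("llama2") right after the LLM answer would free the VRAM before sd-webui starts rendering.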

Also, could it be set to store a different system prompt and ollama settings? I found that giving it an example in the system prompt works well.
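A stored system prompt could ride along on the same request: Ollama's /api/generate accepts a "system" field and an "options" object. Below is a minimal sketch of what such stored settings might look like; the settings dict, its example wording, and build_request are all hypothetical illustrations, not the extension's actual config.

```python
import json

# Hypothetical per-user settings; "system", "options", and "keep_alive" are
# real fields on Ollama's /api/generate request body.
settings = {
    "model": "llama2",
    "system": ("You expand short image ideas into detailed Stable Diffusion "
               "prompts. Example: 'a cat' -> 'a fluffy tabby cat, golden hour, "
               "85mm photo, shallow depth of field'."),
    "options": {"temperature": 0.8},
    "keep_alive": 0,  # also unload the model right after answering
}

def build_request(user_prompt: str) -> bytes:
    # Merge the stored settings with the current prompt into one request body
    return json.dumps({**settings, "prompt": user_prompt, "stream": False}).encode("utf-8")
```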

Basically, it seems you are a few lines of code away from the best 'magic prompt' software in the world, surpassing the ones on Dalle-3 and Ideogram, which are censored. Thank you!


kmdtukl commented Aug 15, 2024

Set the environment variable OLLAMA_KEEP_ALIVE to 0.
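As a sketch, the variable has to be set in the environment of the Ollama server process itself (not the sd-webui process), e.g.:

```shell
# Set before starting the Ollama server so every model is evicted
# from VRAM immediately after each request
export OLLAMA_KEEP_ALIVE=0
ollama serve
```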

xlinx (Owner) commented Aug 16, 2024

Okay, let me try the unload. Are these the actions you want:

  1. [Generate forever] is active
  2. call the LLM
  3. LLM answers
  4. unload the LLM to save VRAM, via http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": 0}'
  5. sd-webui works
  6. SD finishes
  7. go back to step 1

(I use a 4060 Ti with 16 GB VRAM, so loading a 7B LLM alongside SDXL is usually fine for me.)

  • Is the action like this?
  • One more additional call after each LLM call?
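The seven steps above can be sketched as a plain loop. This is only an illustration of the control flow, not the extension's code: magic_prompt_loop, llm_expand_prompt, and sd_generate are hypothetical placeholders for the real hooks, and the LLM call is assumed to carry keep_alive=0 so step 4 (unload) happens inside step 2-3.

```python
def magic_prompt_loop(seed_prompt, llm_expand_prompt, sd_generate, rounds=3):
    """Sketch of the generate-forever loop.

    llm_expand_prompt: calls the LLM (steps 2-3) and, with keep_alive=0 on
    the request, also unloads it from VRAM (step 4).
    sd_generate: runs sd-webui on the expanded prompt (steps 5-6).
    """
    results = []
    for _ in range(rounds):                    # step 1/7: loop "forever" (bounded here)
        prompt = llm_expand_prompt(seed_prompt)  # steps 2-4: LLM call, answer, unload
        results.append(sd_generate(prompt))      # steps 5-6: SD renders and finishes
    return results
```

The point of the ordering is that VRAM is free during steps 5-6, so SD never competes with the LLM for memory.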

(Screenshot: 2024-08-17 040151)

BTW, if you run web-ui with [Generate forever] active, you can consider using another extension, which can send your fantastic LLM/SD results to an IM app so you can review them like a comic book on your mobile phone. It's fun:
https://github.com/xlinx/sd-webui-decadetw-auto-messaging-realtime

@xlinx xlinx added the good first issue Good for newcomers label Sep 4, 2024