
Unload the LLM from VRAM after each call? #3

Open
Pdonor opened this issue Aug 11, 2024 · 2 comments
Labels
good first issue Good for newcomers

Comments


Pdonor commented Aug 11, 2024

Hi! With the new version of Forge and FLUX, this extension could be really practical for the millions of low-VRAM laptops that can now run FLUX. The only problem is that it doesn't unload the LLM from VRAM when using Ollama, so generation is far too slow.

According to ollama/ollama#1600, that can be accomplished with:

```shell
curl http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": 0}'
```
Can that be put in your code?
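In Python (the extension is a Python sd-webui script), the same unload request from the curl command above might look like the following sketch. The function names are hypothetical, not part of the extension; the endpoint and payload come from the curl command, and sending keep_alive: 0 tells Ollama to evict the model from VRAM right after the call.

```python
import json
import urllib.request

def build_unload_payload(model: str) -> bytes:
    # keep_alive: 0 asks Ollama to evict this model from VRAM immediately
    return json.dumps({"model": model, "keep_alive": 0}).encode("utf-8")

def unload_ollama_model(model: str, host: str = "http://localhost:11434") -> None:
    # POST to /api/generate with an empty prompt and keep_alive=0; the call
    # returns at once and the server drops the model weights from VRAM
    req = urllib.request.Request(
        f"{host}/api/generate",
        data=build_unload_payload(model),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req, timeout=30).close()
```

Calling unload_ollama_model("llama2") right after the LLM answer would free the VRAM before sd-webui starts rendering.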

Also, could it be set to store a different system prompt and ollama settings? I found that giving it an example in the system prompt works well.
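A stored system prompt could ride along on the same request: Ollama's /api/generate accepts a "system" field and an "options" object. Below is a minimal sketch of what such stored settings might look like; the settings dict, its example wording, and build_request are all hypothetical illustrations, not the extension's actual config.

```python
import json

# Hypothetical per-user settings; "system", "options", and "keep_alive" are
# real fields on Ollama's /api/generate request body.
settings = {
    "model": "llama2",
    "system": ("You expand short image ideas into detailed Stable Diffusion "
               "prompts. Example: 'a cat' -> 'a fluffy tabby cat, golden hour, "
               "85mm photo, shallow depth of field'."),
    "options": {"temperature": 0.8},
    "keep_alive": 0,  # also unload the model right after answering
}

def build_request(user_prompt: str) -> bytes:
    # Merge the stored settings with the current prompt into one request body
    return json.dumps({**settings, "prompt": user_prompt, "stream": False}).encode("utf-8")
```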

Basically, it seems you are a few lines of code away from the best 'magic prompt' software in the world, surpassing the ones on Dalle-3 and Ideogram, which are censored. Thank you!


kmdtukl commented Aug 15, 2024

Set the environment variable OLLAMA_KEEP_ALIVE to 0.
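As a sketch, the variable has to be set in the environment of the Ollama server process itself (not the sd-webui process), e.g.:

```shell
# Set before starting the Ollama server so every model is evicted
# from VRAM immediately after each request
export OLLAMA_KEEP_ALIVE=0
ollama serve
```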

xlinx (Owner) commented Aug 16, 2024

Okay, let me try the unload. Are these the actions you want:

  1. [Generate forever] is active
  2. call the LLM
  3. LLM answers
  4. unload the LLM to save VRAM, via http://localhost:11434/api/generate -d '{"model": "llama2", "keep_alive": 0}'
  5. sd-webui works
  6. SD finishes
  7. go back to step 1

(I use a 4060 Ti with 16 GB VRAM, so loading a 7B LLM alongside SDXL is usually fine for me.)

  • Is the action like this?
  • One more additional call after each LLM call?
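The seven steps above can be sketched as a plain loop. This is only an illustration of the control flow, not the extension's code: magic_prompt_loop, llm_expand_prompt, and sd_generate are hypothetical placeholders for the real hooks, and the LLM call is assumed to carry keep_alive=0 so step 4 (unload) happens inside step 2-3.

```python
def magic_prompt_loop(seed_prompt, llm_expand_prompt, sd_generate, rounds=3):
    """Sketch of the generate-forever loop.

    llm_expand_prompt: calls the LLM (steps 2-3) and, with keep_alive=0 on
    the request, also unloads it from VRAM (step 4).
    sd_generate: runs sd-webui on the expanded prompt (steps 5-6).
    """
    results = []
    for _ in range(rounds):                    # step 1/7: loop "forever" (bounded here)
        prompt = llm_expand_prompt(seed_prompt)  # steps 2-4: LLM call, answer, unload
        results.append(sd_generate(prompt))      # steps 5-6: SD renders and finishes
    return results
```

The point of the ordering is that VRAM is free during steps 5-6, so SD never competes with the LLM for memory.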

(Screenshot: 2024-08-17 040151)

BTW, if you run web-ui with [Generate forever] active, you can consider using another extension, which can send your fantastic LLM/SD results to an IM app so you can review them like a comic book on your mobile phone. It's fun:
https://github.com/xlinx/sd-webui-decadetw-auto-messaging-realtime

@xlinx xlinx added the good first issue Good for newcomers label Sep 4, 2024