
After loading the LLM model Qwen1.5-14B-Chat-GPTQ-int4, the m3e-base embedding model can no longer be loaded #1848

Closed
worm128 opened this issue Jul 11, 2024 · 2 comments

worm128 commented Jul 11, 2024

Error screenshot: [screenshot attached in the original issue]

Error:
2024-07-11 21:53:09 RuntimeError: [address=0.0.0.0:54846, pid=45] User specified GPU index 0 has been occupied with a vLLM model: Qwen1.5-14B-Chat-GPTQ-int4-1-0, therefore cannot allocate GPU memory for a new model.

Steps to reproduce: after loading the LLM model Qwen1.5-14B-Chat-GPTQ-int4, the m3e-base embedding model can no longer be loaded.

Everyone is welcome to join QQ group 27831318 to discuss.

XprobeBot added the gpu label and removed the feature label Jul 11, 2024
XprobeBot added this to the v0.13.1 milestone Jul 11, 2024
ChengjieLi28 (Contributor) commented:

Please don't open duplicate issues. By default, vLLM pre-allocates nearly all of the GPU memory, so there is no memory left to allocate for another model.
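For context, a minimal sketch of the behavior being described, using the vLLM Python API (the model path is illustrative, not from the issue): the engine claims a fixed fraction of total VRAM up front for weights and KV cache, and `gpu_memory_utilization` defaults to 0.9, which is why a second model cannot obtain memory on the same card.

```python
# Minimal sketch (illustrative, not the issue author's code): creating a
# vLLM engine immediately reserves gpu_memory_utilization * total VRAM.
# The default is 0.9, so ~90% of the card is claimed up front, leaving
# too little free memory for a second model on the same GPU.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen1.5-14B-Chat-GPTQ-Int4",  # illustrative model path
    quantization="gptq",
    # gpu_memory_utilization=0.9,  # the default: nearly the whole GPU
)
```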

ChengjieLi28 closed this as not planned Jul 12, 2024
firrice commented Feb 10, 2025

> Please don't open duplicate issues. By default, vLLM pre-allocates nearly all of the GPU memory, so there is no memory left to allocate for another model.

But that can be capped via gpu_memory_utilization, right? And isn't it wasteful that the spare idle VRAM on a single card can't be used to deploy an embedding model?
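A hedged sketch of what this comment suggests, via the Xinference Python client. The endpoint and exact launch parameters are assumptions (in particular, that extra keyword arguments such as gpu_memory_utilization are forwarded to the vLLM engine), not something confirmed in this thread:

```python
# Hedged sketch: cap the vLLM model's VRAM share so the leftover memory
# on the same GPU can host the embedding model. Assumes the Xinference
# client forwards extra kwargs (gpu_memory_utilization) to vLLM.
from xinference.client import Client

client = Client("http://localhost:9997")  # illustrative endpoint

# Launch the LLM with a reduced VRAM cap instead of vLLM's ~90% default.
llm_uid = client.launch_model(
    model_name="qwen1.5-chat",
    model_engine="vllm",
    model_format="gptq",
    quantization="Int4",
    model_size_in_billions=14,
    gpu_memory_utilization=0.7,  # assumption: passed through to vLLM
)

# The remaining ~30% of VRAM can then hold the embedding model.
emb_uid = client.launch_model(
    model_name="m3e-base",
    model_type="embedding",
)
```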
