
After loading the LLM model Qwen1.5-14B-Chat-GPTQ-int4, the m3e-base embedding model can no longer be loaded #1848

Closed
worm128 opened this issue Jul 11, 2024 · 2 comments

worm128 commented Jul 11, 2024

Error screenshot: [screenshot attached in the original issue]

Error:
2024-07-11 21:53:09 RuntimeError: [address=0.0.0.0:54846, pid=45] User specified GPU index 0 has been occupied with a vLLM model: Qwen1.5-14B-Chat-GPTQ-int4-1-0, therefore cannot allocate GPU memory for a new model.

Steps to reproduce: after loading the LLM model Qwen1.5-14B-Chat-GPTQ-int4, the m3e-base embedding model can no longer be loaded.

Everyone is welcome to join QQ group 27831318 to discuss.

XprobeBot added the gpu label and removed the feature label Jul 11, 2024
XprobeBot added this to the v0.13.1 milestone Jul 11, 2024
ChengjieLi28 (Contributor) commented:

Please don't open duplicate issues. By default, vLLM pre-allocates nearly all of the GPU memory, so there is no memory left to allocate for another model.
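For context, a minimal sketch of the behavior being described, using the vLLM Python API (the model path is illustrative, not from the issue): the engine claims a fixed fraction of total VRAM up front for weights and KV cache, and `gpu_memory_utilization` defaults to 0.9, which is why a second model cannot obtain memory on the same card.

```python
# Minimal sketch (illustrative, not the issue author's code): creating a
# vLLM engine immediately reserves gpu_memory_utilization * total VRAM.
# The default is 0.9, so ~90% of the card is claimed up front, leaving
# too little free memory for a second model on the same GPU.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen1.5-14B-Chat-GPTQ-Int4",  # illustrative model path
    quantization="gptq",
    # gpu_memory_utilization=0.9,  # the default: nearly the whole GPU
)
```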

ChengjieLi28 closed this as not planned Jul 12, 2024
firrice commented Feb 10, 2025

> Please don't open duplicate issues. By default, vLLM pre-allocates nearly all of the GPU memory, so there is no memory left to allocate for another model.

But that can be capped via gpu_memory_utilization, right? And isn't it wasteful that the spare idle VRAM on a single card can't be used to deploy an embedding model?
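A hedged sketch of what this comment suggests, via the Xinference Python client. The endpoint and exact launch parameters are assumptions (in particular, that extra keyword arguments such as gpu_memory_utilization are forwarded to the vLLM engine), not something confirmed in this thread:

```python
# Hedged sketch: cap the vLLM model's VRAM share so the leftover memory
# on the same GPU can host the embedding model. Assumes the Xinference
# client forwards extra kwargs (gpu_memory_utilization) to vLLM.
from xinference.client import Client

client = Client("http://localhost:9997")  # illustrative endpoint

# Launch the LLM with a reduced VRAM cap instead of vLLM's ~90% default.
llm_uid = client.launch_model(
    model_name="qwen1.5-chat",
    model_engine="vllm",
    model_format="gptq",
    quantization="Int4",
    model_size_in_billions=14,
    gpu_memory_utilization=0.7,  # assumption: passed through to vLLM
)

# The remaining ~30% of VRAM can then hold the embedding model.
emb_uid = client.launch_model(
    model_name="m3e-base",
    model_type="embedding",
)
```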
