Error screenshot: (image omitted)
Model: Qwen1.5-14B-Chat-GPTQ-int4
Engine: vllm
Error message:
torch.cuda.OutOfMemoryError: [address=0.0.0.0:43411, pid=101] CUDA out of memory. Tried to allocate 70.00 MiB. GPU
2024-07-11 21:46:49 INFO 07-11 13:46:49 model_runner.py:854] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
2024-07-11 21:46:49 INFO 07-11 13:46:49 model_runner.py:858] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing gpu_memory_utilization or enforcing eager mode. You can also reduce the max_num_seqs as needed to decrease memory usage.
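For reference, a minimal sketch of the mitigations the log itself suggests, assuming the model is loaded directly through vLLM's Python API (the concrete values `0.70`, `64`, and `4096` are placeholders to tune, not recommendations from the log):

```python
# Sketch: apply the OOM mitigations named in the vLLM log above.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen1.5-14B-Chat-GPTQ-Int4",
    quantization="gptq",          # GPTQ int4 checkpoint
    gpu_memory_utilization=0.70,  # lower than the 0.90 default to leave headroom
    enforce_eager=True,           # skip CUDA graph capture (saves the extra 1~3 GiB)
    max_num_seqs=64,              # fewer concurrent sequences -> smaller KV cache
    max_model_len=4096,           # cap the context length to shrink the KV cache
)
```

When the engine is launched through a serving framework rather than this API, the equivalent knobs are usually forwarded as engine arguments (e.g. `--enforce-eager` on the CLI, as the log notes).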
Description:
My GPU has 24 GB of VRAM, and it reports out-of-memory while 10 GB is still free. I don't know whether this is memory fragmentation or something else, but the problem shows up frequently in the new version 0.12.3, especially after loading the m3e-base embedding model: loading an LLM model afterwards easily triggers this error. The old version 0.8.5 loaded models much more reliably and never hit these errors.
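If fragmentation is indeed the culprit (a small 70 MiB allocation failing despite ~10 GB free is consistent with that), PyTorch's caching allocator can be tuned. A hedged sketch, assuming you control the process environment before CUDA is initialized (which may not be possible when the engine runs inside a managed server process):

```python
# Sketch: relax allocator fragmentation before torch touches CUDA.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

import torch
# (free_bytes, total_bytes) -- verify how much VRAM is actually available.
print(torch.cuda.mem_get_info())
```

Whether this resolves the 0.12.3 regression is an open question; it only addresses the fragmentation hypothesis, not a genuine increase in memory use between versions.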
Everyone is welcome to join the QQ group to discuss: 27831318