Error during multi-threaded inference #1044

Open
Daniel-He-KX opened this issue Mar 6, 2025 · 4 comments

Comments

@Daniel-He-KX

Exception in thread Thread-19 (llm_job):
Traceback (most recent call last):
  File "/root/miniconda3/envs/cosyvoice/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/root/miniconda3/envs/cosyvoice/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/root/CosyVoice/cosyvoice/cli/model.py", line 113, in llm_job
    for i in self.llm.inference(text=text.to(self.device),
  File "/root/miniconda3/envs/cosyvoice/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 56, in generator_context
    response = gen.send(request)
  File "/root/CosyVoice/cosyvoice/llm/llm.py", line 326, in inference
    top_ids = self.sampling_ids(logp.squeeze(dim=0), out_tokens, sampling, ignore_eos=True if i < min_len else False).item()
  File "/root/CosyVoice/cosyvoice/llm/llm.py", line 150, in sampling_ids
    top_ids = self.sampling(weighted_scores, decoded_tokens, sampling)
  File "/root/CosyVoice/cosyvoice/utils/common.py", line 110, in ras_sampling
    top_ids = nucleus_sampling(weighted_scores, top_p=top_p, top_k=top_k)
  File "/root/CosyVoice/cosyvoice/utils/common.py", line 131, in nucleus_sampling
    top_ids = indices[prob.multinomial(1, replacement=True)]
RuntimeError: probability tensor contains either inf, nan or element < 0
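
The last frame shows `prob.multinomial` rejecting its input. As a debugging aid, here is a minimal sketch of the same validity check in plain Python, which could be used to log what is wrong with the distribution before sampling. `find_invalid_probs` is a hypothetical helper, not part of CosyVoice; on a real tensor the equivalent checks would be `torch.isnan`, `torch.isinf` and a `< 0` comparison.

```python
import math

def find_invalid_probs(probs):
    """Return (index, value, reason) for each entry a multinomial sampler would reject."""
    problems = []
    for i, p in enumerate(probs):
        if math.isnan(p):
            problems.append((i, p, "nan"))
        elif math.isinf(p):
            problems.append((i, p, "inf"))
        elif p < 0:
            problems.append((i, p, "negative"))
    return problems

good = [0.7, 0.2, 0.1]
bad = [0.5, float("nan"), -0.1, float("inf")]

print(find_invalid_probs(good))  # nothing to report for a valid distribution
print(find_invalid_probs(bad))   # one entry per offending value
```

Knowing whether the bad entries are NaN, infinite or negative helps narrow down where the values went wrong upstream.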

@Daniel-He-KX
Author

I'm using the following configuration:

self.cosyvoice = CosyVoice2(cosy_voice_path + 'pretrained_models/CosyVoice2-0.5B', load_jit=False, load_trt=True, fp16=True)

With load_trt=True, the error occurs under multi-threading. (load_trt loads the TensorRT-optimized model.)
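
Since the failure is reported only under multi-threading, one thing worth ruling out is concurrent access to shared inference state. The following is a minimal sketch, assuming (this is not confirmed anywhere in this thread) that the model object is not safe to call from several threads at once. `FakeEngine`, `infer_lock` and `worker` are hypothetical stand-ins, not CosyVoice APIs.

```python
import threading

class FakeEngine:
    """Hypothetical stand-in for a model whose inference call is not thread-safe."""

    def __init__(self):
        self.active = 0          # calls currently inside infer()
        self.max_active = 0      # highest concurrency ever observed
        self._count_lock = threading.Lock()  # protects only the counters above

    def infer(self, text):
        with self._count_lock:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
        result = text.upper()    # placeholder for the real inference work
        with self._count_lock:
            self.active -= 1
        return result

engine = FakeEngine()
infer_lock = threading.Lock()    # serializes every call into the shared engine
results = {}

def worker(i):
    # Only one thread at a time may run inference on the shared engine.
    with infer_lock:
        results[i] = engine.infer(f"utterance {i}")

threads = [threading.Thread(target=worker, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(engine.max_active)  # stays at 1 because all calls go through infer_lock
```

Serializing calls gives up parallelism inside the process, so this is mainly a way to test whether the error is a concurrency problem. If the error disappears with the lock in place, separate model instances or separate processes may be a better long-term setup.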

@Daniel-He-KX
Author

One more detail: I'm using an A10 GPU.

@aluminumbox
Collaborator

The model is probably not very robust here. Try fp16=False.
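
To illustrate why half precision can lead to this kind of error: fp16 cannot represent values much above 65504, so a large intermediate value can overflow to infinity, and a softmax over such values produces NaN. This is a plain-Python sketch of that mechanism only; whether this is what actually happens inside CosyVoice is an assumption. `to_fp16` and `softmax` are hypothetical helpers written for this example.

```python
import math
import struct

def to_fp16(x):
    """Round a Python float to IEEE half precision, saturating to +/-inf on overflow."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses values beyond the fp16 range; hardware would produce inf.
        return math.copysign(math.inf, x)

def softmax(logits):
    """Numerically stabilized softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

logits_fp32 = [70000.0, 1.0]
logits_fp16 = [to_fp16(v) for v in logits_fp32]

print(softmax(logits_fp32))  # finite probabilities
print(logits_fp16)           # the first logit has overflowed to inf
print(softmax(logits_fp16))  # inf - inf yields nan, which poisons every entry
```

A sampler that receives the last result would fail with exactly the kind of "inf, nan or element < 0" complaint shown in the traceback.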

@Daniel-He-KX
Author

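OK, I'll try that. I'm running two Python processes on the same machine. GPU utilization is already at 100%, but GPU memory usage is only 10021MiB / 23028MiB, so roughly half is still free. How can I make use of the remaining half? Also, all 8 CPU cores are nearly saturated; does that mean I can't add any more processes? See the two outputs below.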

Every 1.0s: nvidia-smi iZbp14v9nxa3mr9n1tu44pZ: Thu Mar 6 15:37:56 2025

Thu Mar 6 15:37:56 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.127.08             Driver Version: 550.127.08     CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA A10                     On  |   00000000:00:07.0 Off |                    0 |
|  0%   70C    P0            129W /  150W |   10021MiB /  23028MiB |    100%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A    241414      C   python                                       5020MiB |
|    0   N/A  N/A    241433      C   python                                       4992MiB |
+-----------------------------------------------------------------------------------------+

top - 15:40:25 up 15 days, 5:02, 6 users, load average: 7.45, 7.24, 5.97
Tasks: 199 total, 1 running, 198 sleeping, 0 stopped, 0 zombie
%Cpu0 : 88.8 us, 3.4 sy, 0.0 ni, 7.8 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 92.3 us, 2.7 sy, 0.0 ni, 5.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu2 : 96.6 us, 1.3 sy, 0.0 ni, 2.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 93.6 us, 1.3 sy, 0.0 ni, 5.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu4 : 97.3 us, 1.0 sy, 0.0 ni, 1.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu5 : 93.3 us, 2.0 sy, 0.0 ni, 4.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu6 : 89.6 us, 2.0 sy, 0.0 ni, 8.4 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu7 : 88.7 us, 3.8 sy, 0.0 ni, 7.5 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 29692.4 total, 5066.4 free, 15432.9 used, 9193.1 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 13776.4 avail Mem
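
The numbers in the two outputs above can be turned into a rough headroom estimate. This is only a back-of-the-envelope sketch: it assumes each additional process would need about as much GPU memory as the existing ones, and all variable names are made up for the example.

```python
# Figures copied from the nvidia-smi and top output above.
gpu_mem_total_mib = 23028
proc_mem_mib = [5020, 4992]
gpu_util_pct = 100
cpu_idle_pct = [7.8, 5.1, 2.0, 5.1, 1.7, 4.7, 8.4, 7.5]

# How many processes of the current size would fit in GPU memory alone.
per_proc_mib = max(proc_mem_mib)
fit_by_memory = gpu_mem_total_mib // per_proc_mib
extra_by_memory = fit_by_memory - len(proc_mem_mib)

# How much CPU time is left on average across the 8 cores.
avg_cpu_idle = sum(cpu_idle_pct) / len(cpu_idle_pct)

print(f"processes that fit by memory: {fit_by_memory} ({extra_by_memory} more)")
print(f"GPU utilization: {gpu_util_pct}%")
print(f"average CPU idle: {avg_cpu_idle:.1f}%")
```

By memory alone, two more processes would fit. With GPU utilization already at 100% and only about 5% CPU idle on average, though, the machine looks compute-bound rather than memory-bound, so extra processes would most likely compete for the same compute instead of raising throughput.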
