
Deployed deepseek-coder-6.7b-instruct behind a oneapi proxy; calling it with the openai client fails #1845

Closed
xxch opened this issue Jul 11, 2024 · 2 comments


xxch commented Jul 11, 2024

System Info

ubuntu: 20.04
python: 3.10
transformers: 4.42.3
llama-cpp-python: 0.2.78

Running Xinference with Docker?

  • docker
  • pip install
  • installation from source

Version info

Xinference: 0.12.3

The command used to start Xinference

Started from the command line:
XINFERENCE_TRANSFORMERS_ENABLE_BATCHING=1 nohup xinference-local --host 0.0.0.0 --port 9997 > xinference.log &

Reproduction

Launching deepseek-coder-6.7b-instruct in Xinference works fine, and it can also be used normally in the built-in chat dialog. However, once it is put behind the oneapi proxy and accessed with the openai client, the request fails. Qwen2-57B, proxied through oneapi in the same way, can be called without problems.
Error log:
2024-07-11 16:12:39,705 xinference.model.llm.pytorch.utils 1165578 ERROR Internal error for batch inference: <xinference.core.scheduler.InferenceRequest object at 0x748422df09d0>.
Traceback (most recent call last):
File "/app/miniconda/envs/inference/lib/python3.11/site-packages/xinference/model/llm/pytorch/utils.py", line 770, in batch_inference_one_step
_batch_inference_one_step_internal(
File "/app/miniconda/envs/inference/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/app/miniconda/envs/inference/lib/python3.11/site-packages/xinference/model/llm/pytorch/utils.py", line 733, in _batch_inference_one_step_internal
invalid_token_num = decode_round - stop_token_mapping[r]
~~~~~~~~~~~~~~~~~~^^^
KeyError: <xinference.core.scheduler.InferenceRequest object at 0x748422df09d0>
2024-07-11 16:12:39,710 xinference.api.restful_api 172742 ERROR [address=0.0.0.0:38867, pid=1165578] <xinference.core.scheduler.InferenceRequest object at 0x748422df09d0>
Traceback (most recent call last):
File "/app/miniconda/envs/inference/lib/python3.11/site-packages/xinference/api/restful_api.py", line 1566, in create_chat_completion
data = await model.chat(
^^^^^^^^^^^^^^^^^
File "/app/miniconda/envs/inference/lib/python3.11/site-packages/xoscar/backends/context.py", line 227, in send
return self._process_result_message(result)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/miniconda/envs/inference/lib/python3.11/site-packages/xoscar/backends/context.py", line 102, in _process_result_message
raise message.as_instanceof_cause()
File "/app/miniconda/envs/inference/lib/python3.11/site-packages/xoscar/backends/pool.py", line 659, in send
result = await self._run_coro(message.message_id, coro)
^^^^^^^^^^^^^^^^^
File "/app/miniconda/envs/inference/lib/python3.11/site-packages/xoscar/backends/pool.py", line 370, in _run_coro
return await coro
File "/app/miniconda/envs/inference/lib/python3.11/site-packages/xoscar/api.py", line 384, in on_receive
return await super().on_receive(message) # type: ignore
^^^^^^^^^^^^^^^^^
File "xoscar/core.pyx", line 558, in on_receive
raise ex
File "xoscar/core.pyx", line 520, in xoscar.core._BaseActor.on_receive
async with self._lock:
^^^^^^^^^^^^^^^^^
File "xoscar/core.pyx", line 521, in xoscar.core._BaseActor.on_receive
with debug_async_timeout('actor_lock_timeout',
^^^^^^^^^^^^^^^^^
File "xoscar/core.pyx", line 526, in xoscar.core._BaseActor.on_receive
result = await result
^^^^^^^^^^^^^^^^^
File "/app/miniconda/envs/inference/lib/python3.11/site-packages/xinference/core/utils.py", line 45, in wrapped
ret = await func(*args, **kwargs)
^^^^^^^^^^^^^^^^^
File "/app/miniconda/envs/inference/lib/python3.11/site-packages/xinference/core/model.py", line 87, in wrapped_func
ret = await fn(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^
File "/app/miniconda/envs/inference/lib/python3.11/site-packages/xoscar/api.py", line 462, in _wrapper
r = await func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^
File "/app/miniconda/envs/inference/lib/python3.11/site-packages/xinference/core/model.py", line 482, in chat
return await self.handle_batching_request(
^^^^^^^^^^^^^^^^^
File "/app/miniconda/envs/inference/lib/python3.11/site-packages/xinference/core/model.py", line 467, in handle_batching_request
result = await fut
^^^^^^^^^^^^^^^^^
ValueError: [address=0.0.0.0:38867, pid=1165578] <xinference.core.scheduler.InferenceRequest object at 0x748422df09d0>
(The same traceback repeats for the subsequent requests at 16:12:41 and 16:12:43: each time a KeyError on stop_token_mapping[r] in _batch_inference_one_step_internal, followed by the corresponding ValueError from the REST API.)

Client-side calling code (the calling/test method itself should be fine); see the screenshot and the sketch below:
[screenshot: 微信截图_20240711162936]
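For reference, a minimal sketch of what such a call typically looks like when it goes through an OpenAI-compatible proxy like oneapi. The base URL, API key, and prompt below are placeholder assumptions, not values taken from the screenshot:

```python
from openai import OpenAI

# Placeholder endpoint and key for the oneapi proxy sitting in front of Xinference
# (hypothetical values, not from the original report).
client = OpenAI(
    base_url="http://<oneapi-host>:<port>/v1",
    api_key="sk-xxx",
)

# Chat completion against the model served by Xinference.
response = client.chat.completions.create(
    model="deepseek-coder-6.7b-instruct",
    messages=[{"role": "user", "content": "Write a quicksort function in Python."}],
)
print(response.choices[0].message.content)
```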

Expected behavior

Calling the model from code should work normally.

XprobeBot added this to the v0.13.1 milestone on Jul 11, 2024
ChengjieLi28 (Contributor) commented:

@xxch The XINFERENCE_TRANSFORMERS_ENABLE_BATCHING=1 option only becomes officially available in 0.13.0; in 0.12.3 it is still under development and no guarantees are made.

xxch (Author) commented Jul 11, 2024

I upgraded to 0.13.0 and removed the XINFERENCE_TRANSFORMERS_ENABLE_BATCHING=1 parameter, and it now works. With 0.13.0, adding the XINFERENCE_TRANSFORMERS_ENABLE_BATCHING=1 parameter still does not work.

xxch closed this as completed on Jul 11, 2024