You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Jan 24, 2024. It is now read-only.
When I call multiple streaming completions at the same time I get the error below.
start listening on 127.0.0.1:8888
ERROR:waitress:Exception while serving /v1/completions
Traceback (most recent call last):
File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/waitress/channel.py", line 428, in service
task.service()
File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/waitress/task.py", line 168, in service
self.execute()
File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/waitress/task.py", line 456, in execute
for chunk in app_iter:
File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/werkzeug/wsgi.py", line 500, in __next__
return self._next()
File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/werkzeug/wrappers/response.py", line 50, in _iter_encoded
for item in iterable:
File "/home/chang/AI/llm/basaran/basaran/__main__.py", line 168, in stream
for choice in stream_model(**options):
File "/home/chang/AI/llm/basaran/basaran/model.py", line 73, in __call__
for (
File "/home/chang/AI/llm/basaran/basaran/model.py", line 237, in generate
outputs = self.model(
File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 662, in forward
outputs = self.gpt_neox(
File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 553, in forward
outputs = layer(
File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 335, in forward
mlp_output = self.mlp(self.post_attention_layernorm(hidden_states))
File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/transformers/models/gpt_neox/modeling_gpt_neox.py", line 297, in forward
hidden_states = self.dense_4h_to_h(hidden_states)
File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/accelerate/hooks.py", line 165, in new_forward
output = old_forward(*args, **kwargs)
File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/bitsandbytes/nn/modules.py", line 320, in forward
out = bnb.matmul(x, self.weight, bias=self.bias, state=self.state)
File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 500, in matmul
return MatMul8bitLt.apply(A, B, out, bias, state)
File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/torch/autograd/function.py", line 506, in apply
return super().apply(*args, **kwargs) # type: ignore[misc]
File "/home/chang/anaconda3/envs/hf38/lib/python3.8/site-packages/bitsandbytes/autograd/_functions.py", line 417, in forward
output += torch.matmul(subA, state.subB)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (238x13 and 29x5120)
ERROR:waitress:Exception while serving /v1/completions
The text was updated successfully, but these errors were encountered:
When I call multiple streaming completions at the same time I get the error below.
The text was updated successfully, but these errors were encountered: