Description
Name and Version
$ ./llama-server --version
ggml_cuda_init: found 8 CUDA devices:
Device 0: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
Device 1: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
Device 2: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
Device 3: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
Device 4: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
Device 5: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
Device 6: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
Device 7: NVIDIA GeForce RTX 3090, compute capability 8.6, VMM: yes
version: 7743 (6a023e8)
built with GNU 13.3.0 for Linux x86_64
Operating systems
Linux
GGML backends
CUDA
Hardware
8 x 3090
Models
GLM-4.7-UD-Q3_K_XL
Problem description & steps to reproduce
Tool-call parsing frequently fails for GLM-4.7. Partway through a streamed response, the server returns: {"error":{"code":500,"message":"Failed to parse input at pos 184: ... }}
Curl command reproduction:
curl 'http://127.0.0.1:8082/v1/chat/completions' \
--data-raw '{"stream":true,"return_progress":true,"temperature":0.7,"min_p":0,"cache_prompt":true,"model":"GLM-4.7-UD-Q3_K_XL","top_k":40,"top_p":1,"messages":[{"role":"user","content":"list files in current directory"}],"tools":[{"type":"function","function":{"name":"bash","description":"Execute a bash command and return the output","parameters":{"type":"object","properties":{"explanation":{"type":"string","description":"One sentence explanation as to why this tool is being used, and how it contributes to the goal."},"command":{"type":"string","description":"A bash command to execute"}},"required":["command","explanation"]}}}]}'
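For anyone who prefers to script the reproduction, the same request can be sketched with Python's standard library. This is only an illustration: `build_request` and `stream_lines` are hypothetical helper names, and the explicit `Content-Type` header is an assumption that is not part of the curl command above. The payload fields are copied from that command.

```python
import json
import urllib.request

# Payload copied from the curl command in this report.
PAYLOAD = {
    "stream": True,
    "return_progress": True,
    "temperature": 0.7,
    "min_p": 0,
    "cache_prompt": True,
    "model": "GLM-4.7-UD-Q3_K_XL",
    "top_k": 40,
    "top_p": 1,
    "messages": [{"role": "user", "content": "list files in current directory"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "bash",
            "description": "Execute a bash command and return the output",
            "parameters": {
                "type": "object",
                "properties": {
                    "explanation": {
                        "type": "string",
                        "description": "One sentence explanation as to why this tool "
                                       "is being used, and how it contributes to the goal.",
                    },
                    "command": {
                        "type": "string",
                        "description": "A bash command to execute",
                    },
                },
                "required": ["command", "explanation"],
            },
        },
    }],
}


def build_request(base_url="http://127.0.0.1:8082"):
    """Build the POST request without sending it (hypothetical helper)."""
    body = json.dumps(PAYLOAD).encode("utf-8")
    return urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=body,
        # Assumption: sent explicitly here; the curl command does not set it.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def stream_lines(request):
    """Send the request and yield each non-empty line of the streamed body."""
    with urllib.request.urlopen(request) as response:
        for raw in response:
            line = raw.decode("utf-8").strip()
            if line:
                yield line


if __name__ == "__main__":
    for line in stream_lines(build_request()):
        print(line)
```

Running the script against the server started with the command in this report should print the same `data: ...` lines that curl shows.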
Output:
// [...]
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":" requested"}}]}}],"created":1767932874,"id":"chatcmpl-WSkobex5fe5dO1IJtQqrZlpzEcJZD3Xm","model":"GLM-4.7-UD-Q3_K_XL-00001-of-00004.gguf","system_fingerprint":"b7743-6a023e898","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":" by"}}]}}],"created":1767932874,"id":"chatcmpl-WSkobex5fe5dO1IJtQqrZlpzEcJZD3Xm","model":"GLM-4.7-UD-Q3_K_XL-00001-of-00004.gguf","system_fingerprint":"b7743-6a023e898","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":" the"}}]}}],"created":1767932874,"id":"chatcmpl-WSkobex5fe5dO1IJtQqrZlpzEcJZD3Xm","model":"GLM-4.7-UD-Q3_K_XL-00001-of-00004.gguf","system_fingerprint":"b7743-6a023e898","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":" user"}}]}}],"created":1767932874,"id":"chatcmpl-WSkobex5fe5dO1IJtQqrZlpzEcJZD3Xm","model":"GLM-4.7-UD-Q3_K_XL-00001-of-00004.gguf","system_fingerprint":"b7743-6a023e898","object":"chat.completion.chunk"}
data: {"choices":[{"finish_reason":null,"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\"}"}}]}}],"created":1767932874,"id":"chatcmpl-WSkobex5fe5dO1IJtQqrZlpzEcJZD3Xm","model":"GLM-4.7-UD-Q3_K_XL-00001-of-00004.gguf","system_fingerprint":"b7743-6a023e898","object":"chat.completion.chunk"}
data: {"error":{"code":500,"message":"Failed to parse input at pos 184: <tool_call>bash</tool_call>{\"command\": \"ls\", \"explanation\": \"List files in the current directory as requested by the user\"} >","type":"server_error"}}
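To make the streamed output easier to inspect, the argument fragments can be joined back together on the client side. The following is a minimal sketch, assuming each event is a single `data: ` line shaped like the chunks above. `collect_tool_arguments` is a hypothetical helper, and the `[DONE]` check is an assumption about how a successful stream ends (this failing stream ends with the error event instead).

```python
import json


def collect_tool_arguments(sse_lines):
    """Join streamed tool-call argument fragments, grouped by tool-call index.

    Returns (arguments_by_index, error), where error is the decoded "error"
    object if the stream contained one, otherwise None.
    """
    arguments = {}
    error = None
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank lines and anything that is not an event
        payload = line[len("data: "):]
        if payload == "[DONE]":  # assumed end-of-stream marker
            break
        event = json.loads(payload)
        if "error" in event:
            error = event["error"]
            break
        for choice in event.get("choices") or []:
            delta = choice.get("delta") or {}
            for call in delta.get("tool_calls") or []:
                fragment = (call.get("function") or {}).get("arguments") or ""
                index = call["index"]
                arguments[index] = arguments.get(index, "") + fragment
    return arguments, error
```

Fed the chunks shown above, this returns the concatenated argument string for tool call 0 together with the error object, which shows how much of the arguments had already been streamed to the client before the parser gave up.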
llama.cpp server command:
CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" ./llama-server -m /modele/GLM-4.7-UD-Q3_K_XL-00001-of-00004.gguf -ngl 9999 --threads 4 -c 110000 -fa on --host localhost --port 8082 --jinja --verbose -ub 128 -b 128 -ctk q8_0 -ctv q8_0
Tail of the server log:
// [...]
que start_loop: processing new tasks
que start_loop: processing task, id = 1438
que start_loop: update slots
srv update_slots: all slots are idle
que start_loop: waiting for new tasks
srv update_chat_: Parsing chat message: The user wants to list files in the current directory. This is a straightforward request that can be done with the `ls` command. I'll use the bash tool to execute this command.</think><tool_call>bash</tool_call>{"command": "ls", "explanation": "List files in the current directory as requested by the user"} >
Parsing PEG input with format peg-native: The user wants to list files in the current directory. This is a straightforward request that can be done with the `ls` command. I'll use the bash tool to execute this command.</think><tool_call>bash</tool_call>{"command": "ls", "explanation": "List files in the current directory as requested by the user"} >
srv update_chat_: Parsing chat message: The user wants to list files in the current directory. This is a straightforward request that can be done with the `ls` command. I'll use the bash tool to execute this command.</think><tool_call>bash</tool_call>{"command": "ls", "explanation": "List files in the current directory as requested by the user"} >
Parsing PEG input with format peg-native: The user wants to list files in the current directory. This is a straightforward request that can be done with the `ls` command. I'll use the bash tool to execute this command.</think><tool_call>bash</tool_call>{"command": "ls", "explanation": "List files in the current directory as requested by the user"} >
srv operator(): http: streamed chunk: data: {"error":{"code":500,"message":"Failed to parse input at pos 184: <tool_call>bash</tool_call>{\"command\": \"ls\", \"explanation\": \"List files in the current directory as requested by the user\"} >","type":"server_error"}}
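To see where the parser stopped, the logged model output can be split at the reported position. This is a small sketch that assumes `pos 184` is a 0-based character offset into the string logged after "Parsing chat message:" (the text is plain ASCII, so character and byte offsets coincide). `split_at` is a hypothetical helper.

```python
# Model output as it appears in the server log above.
LOGGED_OUTPUT = (
    "The user wants to list files in the current directory. "
    "This is a straightforward request that can be done with the `ls` command. "
    "I'll use the bash tool to execute this command.</think>"
    '<tool_call>bash</tool_call>{"command": "ls", "explanation": '
    '"List files in the current directory as requested by the user"} >'
)


def split_at(text, pos):
    """Return the part consumed before pos and the part left unparsed."""
    return text[:pos], text[pos:]


parsed, remainder = split_at(LOGGED_OUTPUT, 184)
print(repr(parsed[-8:]))      # end of the successfully parsed prefix
print(repr(remainder[:27]))   # start of the text the parser rejected
```

Under that assumption, the offset falls immediately after `</think>`, so the rejected remainder starts at `<tool_call>`, which is the same text the error message quotes. The reasoning block appears to parse, and the failure begins at the tool call itself.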
First Bad Commit
No response