Reasoning parser #3859

Open
wants to merge 18 commits into
base: main


ShaoZhang0115

Motivation

Rewrite #3202

Modifications

  1. Add --enable-reasoning and --reasoning-parser options for DeepSeek R1 series models.
  2. Return reasoning_content as in the official API (ref: https://api-docs.deepseek.com/zh-cn/guides/reasoning_model) in both streaming and non-streaming chat completions.

Example, launching the server:

python -m sglang.launch_server --host 0.0.0.0 \
--model-path deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B \
--tp 1 --enable-reasoning --reasoning-parser deepseek-r1

Then send a request:

curl --location --request POST 'http://localhost:30000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--data '{
    "model": "default",
    "messages": [
        {
            "role": "user",
            "content": "Calculate 1 + 3"
        }
    ],
    "stream": false
}'

The server responds with:

{
    "id": "53de20f7f1244195826e7b52011c37a4",
    "object": "chat.completion",
    "created": 1740507802,
    "model": "default",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "\n\n**Solution:**\n\nTo calculate \\(1 + 3\\), follow these easy steps:\n\n1. **Identify the numbers to add:**  \n   You have the number **1** and the number **3**.\n\n2. **Add the numbers together:**  \n   \\[\n   1 + 3 = 4\n   \\]\n\n3. **Final Answer:**  \n   \\[\n   \\boxed{4}\n   \\]",
                "reasoning_content": "To calculate the sum of 1 and 3, I will begin by identifying the two numbers involved in the addition. The first number is 1, and the second number is 3.\n\nNext, I will add these two numbers together. Adding 1 and 3 gives me a total of 4.\n\nTherefore, the result of 1 plus 3 is 4.\n",
                "tool_calls": null
            },
            "logprobs": null,
            "finish_reason": "stop",
            "matched_stop": 151643
        }
    ],
    "usage": {
        "prompt_tokens": 11,
        "total_tokens": 179,
        "completion_tokens": 168,
        "prompt_tokens_details": null
    }
}
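
On the client side, the two fields in the response above can be read separately. A minimal sketch, assuming the response body has already been decoded into a Python dict; `split_message` is a hypothetical helper and not part of this PR, and the sample dict is a trimmed stand-in for the full response.

```python
from typing import Optional, Tuple


def split_message(response: dict, choice: int = 0) -> Tuple[Optional[str], Optional[str]]:
    """Return (reasoning_content, content) for one choice of a chat completion."""
    message = response["choices"][choice]["message"]
    # Assumption: reasoning_content may be missing when the server was started
    # without a reasoning parser, so .get() is used instead of indexing.
    return message.get("reasoning_content"), message.get("content")


# Trimmed stand-in for the response shown above (illustrative values only).
sample = {
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "\n\n**Solution:** 4",
                "reasoning_content": "Adding 1 and 3 gives me a total of 4.\n",
            },
        }
    ]
}

reasoning, answer = split_message(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

Keeping the two fields apart lets a client show or hide the reasoning without re-parsing the answer text.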

Docs will be updated as soon as possible.

Checklist

Comment on lines 32 to 33
self.think_start_token = "<think>"
self.think_end_token = "</think>"
Collaborator

Can we extend this to all reasoning models? Not just dpsk R1. There might be different thinking tokens.

I think different reasoning models need different parsers, and I have added docs for it.
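
One way to accommodate models with different thinking tokens is a small registry keyed by parser name. The following is only a sketch under assumptions: `ThinkTokens`, `PARSERS`, `split_reasoning`, and the `example-model` entry are hypothetical names for illustration and are not taken from this PR. Only the `<think>` / `</think>` pair comes from the diff above. Treating a missing end token as truncated reasoning is also an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ThinkTokens:
    start: str
    end: str


# Hypothetical registry: one entry per value accepted by --reasoning-parser.
PARSERS = {
    "deepseek-r1": ThinkTokens("<think>", "</think>"),
    # Made-up entry, only to show a model with a different token pair.
    "example-model": ThinkTokens("[REASON]", "[/REASON]"),
}


def split_reasoning(text: str, parser: str) -> Tuple[Optional[str], str]:
    """Split full model output into (reasoning, content) for the given parser."""
    tokens = PARSERS[parser]
    started = text.startswith(tokens.start)
    body = text[len(tokens.start):] if started else text
    reasoning, found, content = body.partition(tokens.end)
    if found:
        return reasoning, content
    if started:
        # Assumption: start token without end token means reasoning was cut off.
        return body, ""
    # Neither token present: treat the whole output as the answer.
    return None, text


print(split_reasoning("<think>1 + 3 = 4</think>4", "deepseek-r1"))
print(split_reasoning("[REASON]1 + 3 = 4[/REASON]4", "example-model"))
```

With this layout, supporting another model means adding one registry entry, while models whose format is not a simple token pair would still need their own parser.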

@xihuai18

  • Add Docs
  • Test with streaming and non-streaming cases, with truncated or non-truncated max-tokens for reasoning.
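
The streaming and truncated cases listed above can be illustrated with a small incremental splitter. This is a sketch, not the parser or the tests from this PR: `StreamingSplitter` and `run_stream` are hypothetical names, and the chunk boundaries are made up to exercise a thinking token that is split across two chunks.

```python
from typing import Iterable, Tuple


class StreamingSplitter:
    """Hypothetical incremental splitter; not the parser from this PR."""

    def __init__(self, start: str = "<think>", end: str = "</think>") -> None:
        self.start, self.end = start, end
        self.buffer = ""
        self.state = "before"  # "before" -> "reasoning" -> "content"

    def _held_back(self) -> int:
        # Length of the longest buffer suffix that could begin the end token.
        for n in range(min(len(self.buffer), len(self.end) - 1), 0, -1):
            if self.end.startswith(self.buffer[-n:]):
                return n
        return 0

    def feed(self, chunk: str) -> Tuple[str, str]:
        """Return (reasoning_delta, content_delta) for one streamed chunk."""
        self.buffer += chunk
        reasoning, content = "", ""
        while self.buffer:
            if self.state == "before":
                if self.buffer.startswith(self.start):
                    self.buffer = self.buffer[len(self.start):]
                    self.state = "reasoning"
                elif self.start.startswith(self.buffer):
                    break  # the start token may still be arriving
                else:
                    self.state = "content"  # no thinking block at all
            elif self.state == "reasoning":
                head, found, tail = self.buffer.partition(self.end)
                if found:
                    reasoning += head
                    self.buffer = tail
                    self.state = "content"
                else:
                    # Emit everything except a possible partial end token.
                    cut = len(self.buffer) - self._held_back()
                    reasoning += self.buffer[:cut]
                    self.buffer = self.buffer[cut:]
                    break
            else:
                content += self.buffer
                self.buffer = ""
        return reasoning, content

    def flush(self) -> Tuple[str, str]:
        """Emit whatever is still buffered when the stream ends (e.g. max-tokens hit)."""
        rest, self.buffer = self.buffer, ""
        return (rest, "") if self.state == "reasoning" else ("", rest)


def run_stream(chunks: Iterable[str]) -> Tuple[str, str]:
    splitter = StreamingSplitter()
    parts = [splitter.feed(c) for c in chunks] + [splitter.flush()]
    return "".join(r for r, _ in parts), "".join(c for _, c in parts)


# Non-truncated stream: both tokens are split across chunk boundaries.
full = run_stream(["<thi", "nk>1 plus ", "3 is 4</th", "ink>\n\nAnswer: 4"])
# Truncated stream: generation stops before the end token appears.
truncated = run_stream(["<think>still thin", "king"])
```

Comparing the joined streaming output against the non-streaming result for the same text is one way to check that both paths agree.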

@xihuai18

However, I cannot get my tests to pass with --enable-torch-compile, which is confusing.

3 participants