@juncaipeng
Collaborator

@juncaipeng juncaipeng commented Oct 30, 2025

Motivation

  • Support the router in multi-node splitwise deployment
  • Refine PD (prefill/decode) communication
  • Add an example for splitwise deployment

Modifications

Usage or Command

Refer to the splitwise deployment examples added in this PR.

Accuracy Tests

Benchmark of v1 splitwise:

benchmark_duration: 16.98276040190831 s
============ Serving Benchmark Result ============
Successful requests:                     997       
Benchmark duration (s):                  16.98     
Total input tokens:                      1275938   
Total generated tokens:                  1994      
Request throughput (req/s):              58.707    
Output token throughput (tok/s):         117.41    
Total Token throughput (tok/s):          75248.78  
---------------Decode Speed (tok/s)---------------
Mean Decode:                             12.51     
Median Decode:                           12.01     
P80 Decode:                              14.32     
P95 Decode:                              17.76     
P99 Decode:                              29.91     
P99.9 Decode:                            48.12     
P99.95 Decode:                           48.38     
P99.99 Decode:                           48.58     
---------------Time to First Token----------------
Mean TTFT (ms):                          3036.59   
Median TTFT (ms):                        3092.77   
P80 TTFT (ms):                           3325.40   
P95 TTFT (ms):                           3671.44   
P99 TTFT (ms):                           4170.79   
P99.9 TTFT (ms):                         4279.39   
P99.95 TTFT (ms):                        4279.76   
P99.99 TTFT (ms):                        4280.05   
------------Infer Time to First Token-------------
Mean S_TTFT (ms):                        77.57     
Median S_TTFT (ms):                      78.24     
P80 S_TTFT (ms):                         103.77    
P95 S_TTFT (ms):                         128.60    
P99 S_TTFT (ms):                         147.21    
P99.9 S_TTFT (ms):                       158.14    
P99.95 S_TTFT (ms):                      160.26    
P99.99 S_TTFT (ms):                      161.96 
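
As a sanity check on the v1 numbers above, the throughput lines can be re-derived from the raw counts and the unrounded benchmark_duration. This is a minimal sketch, not the benchmark tool's code: it assumes each throughput is simply a count divided by the duration, and that total token throughput counts input plus generated tokens. The variable and function names are illustrative.

```python
# Raw figures copied from the v1 benchmark output above.
successful_requests = 997
total_input_tokens = 1_275_938
total_generated_tokens = 1_994
benchmark_duration_s = 16.98276040190831


def throughput(count: int, duration_s: float) -> float:
    """Return events per second over the benchmark duration (assumed formula)."""
    return count / duration_s


request_tput = throughput(successful_requests, benchmark_duration_s)
output_tput = throughput(total_generated_tokens, benchmark_duration_s)
# Assumption: "total" means input tokens plus generated tokens.
total_tput = throughput(
    total_input_tokens + total_generated_tokens, benchmark_duration_s
)

# Per-request averages help interpret the workload shape.
generated_per_request = total_generated_tokens / successful_requests
input_per_request = total_input_tokens / successful_requests

print(f"Request throughput (req/s):      {request_tput:.3f}")
print(f"Output token throughput (tok/s): {output_tput:.2f}")
print(f"Total token throughput (tok/s):  {total_tput:.2f}")
print(f"Generated tokens per request:    {generated_per_request:.2f}")
print(f"Input tokens per request:        {input_per_request:.2f}")
```

Under these assumptions the derived values agree with the reported throughput lines to rounding. The per-request averages also show that each request generates only two tokens from a prompt of roughly 1,280 tokens, so this workload mostly exercises prefill and the decode statistics rest on very few tokens per request.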

Benchmark of v2 splitwise (using the router):

benchmark_duration: 16.793025318998843 s
============ Serving Benchmark Result ============
Successful requests:                     997       
Benchmark duration (s):                  16.79     
Total input tokens:                      1275938   
Total generated tokens:                  1994      
Request throughput (req/s):              59.370    
Output token throughput (tok/s):         118.74    
Total Token throughput (tok/s):          76098.97  
---------------Decode Speed (tok/s)---------------
Mean Decode:                             13.51     
Median Decode:                           12.60     
P80 Decode:                              14.95     
P95 Decode:                              20.80     
P99 Decode:                              43.47     
P99.9 Decode:                            54.75     
P99.95 Decode:                           69.16     
P99.99 Decode:                           80.69     
---------------Time to First Token----------------
Mean TTFT (ms):                          3026.10   
Median TTFT (ms):                        3079.82   
P80 TTFT (ms):                           3209.71   
P95 TTFT (ms):                           4362.91   
P99 TTFT (ms):                           4838.31   
P99.9 TTFT (ms):                         4930.10   
P99.95 TTFT (ms):                        4931.17   
P99.99 TTFT (ms):                        4932.04   
------------Infer Time to First Token-------------
Mean S_TTFT (ms):                        72.28     
Median S_TTFT (ms):                      70.63     
P80 S_TTFT (ms):                         93.56     
P95 S_TTFT (ms):                         119.08    
P99 S_TTFT (ms):                         147.24    
P99.9 S_TTFT (ms):                       273.10    
P99.95 S_TTFT (ms):                      277.48    
P99.99 S_TTFT (ms):                      280.98 
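
To make the two runs easier to compare, the sketch below computes the relative change from v1 to v2 for a handful of metrics copied from the outputs above. The choice of metrics and the helper name relative_change are illustrative assumptions; the figures come from a single run of each configuration, so small differences may be noise.

```python
# Selected metrics copied from the two benchmark outputs above.
v1 = {
    "Request throughput (req/s)": 58.707,
    "Mean TTFT (ms)": 3036.59,
    "P99 TTFT (ms)": 4170.79,
    "Mean S_TTFT (ms)": 77.57,
    "P99.9 S_TTFT (ms)": 158.14,
}
v2 = {
    "Request throughput (req/s)": 59.370,
    "Mean TTFT (ms)": 3026.10,
    "P99 TTFT (ms)": 4838.31,
    "Mean S_TTFT (ms)": 72.28,
    "P99.9 S_TTFT (ms)": 273.10,
}


def relative_change(old: float, new: float) -> float:
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100.0


# Positive values mean the v2 figure is larger than the v1 figure.
changes = {name: relative_change(v1[name], v2[name]) for name in v1}

for name, pct in changes.items():
    print(f"{name:<28} v1={v1[name]:>9.2f}  v2={v2[name]:>9.2f}  {pct:+.1f}%")
```

On these figures, request throughput and mean TTFT differ by about 1% or less between the two runs, mean S_TTFT is a few percent lower in v2, and the tail latencies (P99 TTFT, P99.9 S_TTFT) are noticeably higher in v2.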

Checklist

  • Add at least one tag to the PR title.
    • Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
    • You can add new tags based on the PR content, but their meaning must be clear.
  • Format your code and run pre-commit before committing.
  • Add unit tests. If there are none, explain why in this PR.
  • Provide accuracy results.
  • If this PR targets the release branch, make sure it has first been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


root does not appear to be a GitHub user. You need a GitHub account to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.

@paddle-bot

paddle-bot bot commented Oct 30, 2025

Thanks for your contribution!


export CUDA_VISIBLE_DEVICES=0
export FD_DEBUG=1
export ENABLE_V1_KVCACHE_SCHEDULER=0
Collaborator

This switch should already be deprecated. Also, this turns on DEBUG logging.

self.ips = self.ips.split(",")

self.host_ip = get_host_ip()
self.port = port
Collaborator

Is the service-layer port number being pushed down into the config here? This should only be a configuration parameter of the APIServer layer.
