[Feature] [PD] splitwise deployment on multi node supports router #4709

juncaipeng · 2025-10-30T12:57:58Z

Motivation

Splitwise deployment on multiple nodes now supports a router.
Refine PD communication.
Add an example for splitwise deployment.

Modifications

Usage or Command

Refer to examples.

Accuracy Tests

Benchmark of v1 splitwise:

benchmark_duration: 16.98276040190831 s
============ Serving Benchmark Result ============
Successful requests:                     997       
Benchmark duration (s):                  16.98     
Total input tokens:                      1275938   
Total generated tokens:                  1994      
Request throughput (req/s):              58.707    
Output token throughput (tok/s):         117.41    
Total Token throughput (tok/s):          75248.78  
---------------Decode Speed (tok/s)---------------
Mean Decode:                             12.51     
Median Decode:                           12.01     
P80 Decode:                              14.32     
P95 Decode:                              17.76     
P99 Decode:                              29.91     
P99.9 Decode:                            48.12     
P99.95 Decode:                           48.38     
P99.99 Decode:                           48.58     
---------------Time to First Token----------------
Mean TTFT (ms):                          3036.59   
Median TTFT (ms):                        3092.77   
P80 TTFT (ms):                           3325.40   
P95 TTFT (ms):                           3671.44   
P99 TTFT (ms):                           4170.79   
P99.9 TTFT (ms):                         4279.39   
P99.95 TTFT (ms):                        4279.76   
P99.99 TTFT (ms):                        4280.05   
------------Infer Time to First Token-------------
Mean S_TTFT (ms):                        77.57     
Median S_TTFT (ms):                      78.24     
P80 S_TTFT (ms):                         103.77    
P95 S_TTFT (ms):                         128.60    
P99 S_TTFT (ms):                         147.21    
P99.9 S_TTFT (ms):                       158.14    
P99.95 S_TTFT (ms):                      160.26    
P99.99 S_TTFT (ms):                      161.96

Benchmark of v2 splitwise (using router):

benchmark_duration: 16.793025318998843 s
============ Serving Benchmark Result ============
Successful requests:                     997       
Benchmark duration (s):                  16.79     
Total input tokens:                      1275938   
Total generated tokens:                  1994      
Request throughput (req/s):              59.370    
Output token throughput (tok/s):         118.74    
Total Token throughput (tok/s):          76098.97  
---------------Decode Speed (tok/s)---------------
Mean Decode:                             13.51     
Median Decode:                           12.60     
P80 Decode:                              14.95     
P95 Decode:                              20.80     
P99 Decode:                              43.47     
P99.9 Decode:                            54.75     
P99.95 Decode:                           69.16     
P99.99 Decode:                           80.69     
---------------Time to First Token----------------
Mean TTFT (ms):                          3026.10   
Median TTFT (ms):                        3079.82   
P80 TTFT (ms):                           3209.71   
P95 TTFT (ms):                           4362.91   
P99 TTFT (ms):                           4838.31   
P99.9 TTFT (ms):                         4930.10   
P99.95 TTFT (ms):                        4931.17   
P99.99 TTFT (ms):                        4932.04   
------------Infer Time to First Token-------------
Mean S_TTFT (ms):                        72.28     
Median S_TTFT (ms):                      70.63     
P80 S_TTFT (ms):                         93.56     
P95 S_TTFT (ms):                         119.08    
P99 S_TTFT (ms):                         147.24    
P99.9 S_TTFT (ms):                       273.10    
P99.95 S_TTFT (ms):                      277.48    
P99.99 S_TTFT (ms):                      280.98
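
To make the two reports easier to compare, the following is a minimal sketch that computes the relative change from v1 to v2 for a handful of metrics. The numbers are copied by hand from the reports above; the `metrics` dictionary and the `relative_change` helper are illustrative names, not part of FastDeploy or its benchmark tooling. Each report is a single run, so small differences may be within run-to-run noise.

```python
# Values copied from the two benchmark reports above, as (v1, v2) pairs.
metrics = {
    "Request throughput (req/s)": (58.707, 59.370),
    "Mean TTFT (ms)": (3036.59, 3026.10),
    "P99 TTFT (ms)": (4170.79, 4838.31),
    "Mean S_TTFT (ms)": (77.57, 72.28),
    "P99.9 S_TTFT (ms)": (158.14, 273.10),
}


def relative_change(v1: float, v2: float) -> float:
    """Return the change from v1 to v2 as a percentage of v1."""
    return (v2 - v1) / v1 * 100.0


# Positive values mean the v2 number is larger than the v1 number.
changes = {name: relative_change(v1, v2) for name, (v1, v2) in metrics.items()}

for name, pct in changes.items():
    print(f"{name}: {pct:+.2f}%")
```

For throughput a positive change is an improvement, while for the latency metrics (TTFT, S_TTFT) a positive change means v2 was slower at that percentile.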

Checklist

Add at least one tag to the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. If there are no unit tests, please state the reason in this PR.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

CLAassistant · 2025-10-30T12:58:05Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.

root seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

paddle-bot · 2025-10-30T12:58:06Z

Thanks for your contribution!

Jiang-Jia-Jun · 2025-10-31T08:46:53Z

examples/splitwise/start_v0.sh

+
+export CUDA_VISIBLE_DEVICES=0
+export FD_DEBUG=1
+export ENABLE_V1_KVCACHE_SCHEDULER=0


This switch should already be deprecated. Also, this script turns on DEBUG logging.

Jiang-Jia-Jun · 2025-10-31T08:48:23Z

fastdeploy/config.py

            self.ips = self.ips.split(",")

        self.host_ip = get_host_ip()
+        self.port = port


Is the service-layer port number being pushed down into the config here? It should only be a configuration parameter of the APIServer layer.

root added 2 commits October 30, 2025 12:58

splitwise deployment on multi node supports router

86ad578

refine

22e70a9

juncaipeng force-pushed the pd branch from a2ef22a to 22e70a9 on October 30, 2025 12:59

root added 2 commits October 30, 2025 13:10

fix

fd9f594

add test of tp2

b6bbcb3

juncaipeng requested a review from Jiang-Jia-Jun October 31, 2025 02:02

Jiang-Jia-Jun reviewed Oct 31, 2025
