feat: Add Expert Affinity Aware EPLB algorithm. #2
Motivation
The native EPLB algorithm focuses primarily on balancing computational load across GPUs and machines, but it does not adequately account for inter-expert communication (e.g., cross-node communication). In large-scale expert-parallelism scenarios, excessive cross-node communication is likely to degrade computational efficiency.
Modifications
Building on expert load tracking, we additionally record the top-k expert groups activated in each iteration and use them to compute an expert affinity matrix (i.e., the probability that two experts are co-activated). After intra-GPU load balancing via EPLB, we adjust expert placement based on the affinity between the highest-load expert on one GPU and the experts on other GPUs, thereby reducing subsequent cross-node communication. This approach yields an additional ~5% performance improvement over standard EPLB.
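As a rough illustration of the affinity-matrix step described above, here is a minimal sketch of how a co-activation probability matrix could be estimated from recorded top-k routing decisions. The function name and array layout are assumptions for illustration, not the PR's actual implementation:

```python
import numpy as np

def expert_affinity(topk_ids: np.ndarray, num_experts: int) -> np.ndarray:
    """Estimate the co-activation matrix A, where A[i, j] is the fraction
    of tokens that activate both expert i and expert j.

    topk_ids: (num_tokens, k) array of expert indices chosen per token.
    """
    num_tokens = topk_ids.shape[0]
    # One-hot activation mask per token: (num_tokens, num_experts).
    mask = np.zeros((num_tokens, num_experts), dtype=np.float64)
    np.put_along_axis(mask, topk_ids, 1.0, axis=1)
    # Co-activation counts; the diagonal holds per-expert activation counts.
    counts = mask.T @ mask
    return counts / num_tokens

# Toy example: 4 experts, top-2 routing, 3 tokens.
ids = np.array([[0, 1], [0, 2], [1, 2]])
A = expert_affinity(ids, num_experts=4)
# Experts 0 and 1 were co-activated by 1 of 3 tokens, so A[0, 1] == 1/3.
```

A placement pass could then prefer hosting high-affinity expert pairs on the same node, so that their frequent co-activation stays on intra-node links.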
Accuracy Tests
Benchmarking and Profiling
Checklist