
feat(connector): support multi-PegaServer for cross-node TP#118

Draft
xiaguan wants to merge 1 commit into master from feat/multi-server-connector

Conversation


@xiaguan xiaguan commented Feb 28, 2026

Summary

  • Support vLLM TP16 across 2 nodes where each node runs its own PegaServer (each managing 8 GPUs)
  • Add PEGAFLOW_ENDPOINTS env var / pegaflow.endpoints extra_config for multi-server endpoint resolution
  • Connector maps global ranks to local (server-scoped) tp_rank/tp_size/world_size — each PegaServer sees itself as TP8 transparently
  • Scheduler queries all servers, takes min(hit_blocks), unpins excess blocks on over-hitting servers
  • Single-server (num_servers=1) behavior is fully backward compatible, with zero added overhead
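The min-reduce step in the summary above can be sketched as follows. This is a hypothetical illustration, not the actual scheduler code: the client method names (`lookup_and_pin`, `unpin`) are assumptions standing in for whatever the PegaServer client exposes.

```python
# Hypothetical sketch of the scheduler's cross-server cache-hit alignment:
# query every PegaServer, agree on the minimum hit count, and unpin the
# blocks that over-hitting servers pinned beyond the agreed prefix.
# Client method names are illustrative, not the real PegaFlow API.

def query_servers(clients, block_hashes):
    """Return the number of prefix blocks every server can serve."""
    # Each server reports how many leading blocks it holds (and pins them).
    hits = [c.lookup_and_pin(block_hashes) for c in clients]
    agreed = min(hits)
    # Release pins beyond the agreed prefix so no blocks stay pinned
    # that will never be loaded.
    for client, hit in zip(clients, hits):
        if hit > agreed:
            client.unpin(block_hashes[agreed:hit])
    return agreed
```

The same unpin pass would also run in the error paths listed in the test plan, so a failed or partially loaded request leaves no pins behind.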

Changed files

File                                   | Change
python/pegaflow/connector/common.py    | Add engine_clients field to ConnectorContext
python/pegaflow/connector/__init__.py  | Multi-endpoint resolution + local rank mapping
python/pegaflow/connector/scheduler.py | Multi-server query with min-reduce + unpin excess

Design

  • No Rust changes, no vLLM changes — purely Python connector layer
  • Namespace uses global tp_size for consistent hashing across all ranks
  • Worker auto-connects to its local server: server_index = global_rank // local_tp_size
  • Scheduler coordinates queries across all servers with proper pin cleanup in all error paths
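The rank mapping from the design notes above can be written out concretely. This is a sketch under the PR's stated formula (`server_index = global_rank // local_tp_size`); the function name and return shape are illustrative, not the connector's actual interface.

```python
# Sketch of the global-to-local TP rank mapping: a worker's global rank
# determines both which PegaServer it connects to and the rank it reports
# to that server, so each server transparently sees itself as TP8.

def map_rank(global_rank: int, global_tp_size: int, num_servers: int):
    """Map a global TP rank to (server_index, local_tp_rank, local_tp_size)."""
    assert global_tp_size % num_servers == 0, "TP size must divide evenly across servers"
    local_tp_size = global_tp_size // num_servers
    server_index = global_rank // local_tp_size  # which PegaServer this worker uses
    local_rank = global_rank % local_tp_size     # rank as seen by that server
    return server_index, local_rank, local_tp_size
```

For TP16 across 2 servers this gives ranks 0-7 → server 0 and ranks 8-15 → server 1, each with a local tp_size of 8, matching the test plan below.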

Test plan

  • Single-server config: verify identical behavior (regression)
  • TP16 / 2-server: verify correct rank mapping (rank 0-7 → server 0, rank 8-15 → server 1)
  • Multi-server cache hit alignment: verify min-reduce and excess unpin
  • Error paths: verify no pin leaks on service error / business error / loading state

🤖 Generated with Claude Code

When vLLM runs TP16 across 2 nodes, each node has its own PegaServer
managing 8 GPUs. The connector now resolves multiple endpoints and maps
global ranks to local (server-scoped) tp_rank/tp_size/world_size so
each PegaServer sees itself as TP8 transparently.

- Add PEGAFLOW_ENDPOINTS env / pegaflow.endpoints config for multi-server
- Compute local rank mapping: server_index = global_rank // local_tp_size
- Scheduler queries all servers, takes min(hit_blocks), unpins excess
- Worker auto-connects to its local server via rank mapping
- Single-server (num_servers=1) behavior is fully backward compatible
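A minimal sketch of the endpoint resolution described above, assuming a comma-separated `PEGAFLOW_ENDPOINTS` value and a single-endpoint default; the parsing format, the `extra_config` dict shape, and the default address are assumptions for illustration, not the PR's exact implementation.

```python
import os

# Hypothetical endpoint resolution: prefer the pegaflow.endpoints
# extra_config key, fall back to the PEGAFLOW_ENDPOINTS env var, and
# finally to a single default endpoint so num_servers=1 keeps the old
# single-server behavior. The comma-separated format is an assumption.

def resolve_endpoints(extra_config=None, default="localhost:9000"):
    """Return the ordered list of PegaServer endpoints."""
    raw = (extra_config or {}).get("pegaflow.endpoints") \
        or os.environ.get("PEGAFLOW_ENDPOINTS", "")
    endpoints = [e.strip() for e in raw.split(",") if e.strip()]
    return endpoints or [default]  # single entry => unchanged behavior
```

With one resolved endpoint the connector behaves exactly as before; with N endpoints, the rank mapping picks `endpoints[server_index]` for each worker.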

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@xiaguan xiaguan marked this pull request as draft March 2, 2026 06:33
