
feat(connector): support multi-PegaServer for cross-node TP#118

Draft
xiaguan wants to merge 1 commit into master from feat/multi-server-connector

Conversation


@xiaguan xiaguan commented Feb 28, 2026

Summary

  • Support vLLM TP16 across 2 nodes where each node runs its own PegaServer (each managing 8 GPUs)
  • Add PEGAFLOW_ENDPOINTS env var / pegaflow.endpoints extra_config for multi-server endpoint resolution
  • Connector maps global ranks to local (server-scoped) tp_rank/tp_size/world_size — each PegaServer sees itself as TP8 transparently
  • Scheduler queries all servers, takes min(hit_blocks), unpins excess blocks on over-hitting servers
  • Single-server (num_servers=1) behavior is fully backward compatible, with zero added overhead
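The min-reduce step in the summary above can be sketched as follows. This is a hypothetical illustration, not the actual scheduler code: the client method names (`lookup_and_pin`, `unpin`) are assumptions standing in for whatever the PegaServer client exposes.

```python
# Hypothetical sketch of the scheduler's cross-server cache-hit alignment:
# query every PegaServer, agree on the minimum hit count, and unpin the
# blocks that over-hitting servers pinned beyond the agreed prefix.
# Client method names are illustrative, not the real PegaFlow API.

def query_servers(clients, block_hashes):
    """Return the number of prefix blocks every server can serve."""
    # Each server reports how many leading blocks it holds (and pins them).
    hits = [c.lookup_and_pin(block_hashes) for c in clients]
    agreed = min(hits)
    # Release pins beyond the agreed prefix so no blocks stay pinned
    # that will never be loaded.
    for client, hit in zip(clients, hits):
        if hit > agreed:
            client.unpin(block_hashes[agreed:hit])
    return agreed
```

The same unpin pass would also run in the error paths listed in the test plan, so a failed or partially loaded request leaves no pins behind.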

Changed files

File                                   | Change
python/pegaflow/connector/common.py    | Add engine_clients field to ConnectorContext
python/pegaflow/connector/__init__.py  | Multi-endpoint resolution + local rank mapping
python/pegaflow/connector/scheduler.py | Multi-server query with min-reduce + unpin excess

Design

  • No Rust changes, no vLLM changes — purely Python connector layer
  • Namespace uses global tp_size for consistent hashing across all ranks
  • Worker auto-connects to its local server: server_index = global_rank // local_tp_size
  • Scheduler coordinates queries across all servers with proper pin cleanup in all error paths
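The rank mapping from the design notes above can be written out concretely. This is a sketch under the PR's stated formula (`server_index = global_rank // local_tp_size`); the function name and return shape are illustrative, not the connector's actual interface.

```python
# Sketch of the global-to-local TP rank mapping: a worker's global rank
# determines both which PegaServer it connects to and the rank it reports
# to that server, so each server transparently sees itself as TP8.

def map_rank(global_rank: int, global_tp_size: int, num_servers: int):
    """Map a global TP rank to (server_index, local_tp_rank, local_tp_size)."""
    assert global_tp_size % num_servers == 0, "TP size must divide evenly across servers"
    local_tp_size = global_tp_size // num_servers
    server_index = global_rank // local_tp_size  # which PegaServer this worker uses
    local_rank = global_rank % local_tp_size     # rank as seen by that server
    return server_index, local_rank, local_tp_size
```

For TP16 across 2 servers this gives ranks 0-7 → server 0 and ranks 8-15 → server 1, each with a local tp_size of 8, matching the test plan below.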

Test plan

  • Single-server config: verify identical behavior (regression)
  • TP16 / 2-server: verify correct rank mapping (rank 0-7 → server 0, rank 8-15 → server 1)
  • Multi-server cache hit alignment: verify min-reduce and excess unpin
  • Error paths: verify no pin leaks on service error / business error / loading state

🤖 Generated with Claude Code

When vLLM runs TP16 across 2 nodes, each node has its own PegaServer
managing 8 GPUs. The connector now resolves multiple endpoints and maps
global ranks to local (server-scoped) tp_rank/tp_size/world_size so
each PegaServer sees itself as TP8 transparently.

- Add PEGAFLOW_ENDPOINTS env / pegaflow.endpoints config for multi-server
- Compute local rank mapping: server_index = global_rank // local_tp_size
- Scheduler queries all servers, takes min(hit_blocks), unpins excess
- Worker auto-connects to its local server via rank mapping
- Single-server (num_servers=1) behavior is fully backward compatible
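A minimal sketch of the endpoint resolution described above, assuming a comma-separated `PEGAFLOW_ENDPOINTS` value and a single-endpoint default; the parsing format, the `extra_config` dict shape, and the default address are assumptions for illustration, not the PR's exact implementation.

```python
import os

# Hypothetical endpoint resolution: prefer the pegaflow.endpoints
# extra_config key, fall back to the PEGAFLOW_ENDPOINTS env var, and
# finally to a single default endpoint so num_servers=1 keeps the old
# single-server behavior. The comma-separated format is an assumption.

def resolve_endpoints(extra_config=None, default="localhost:9000"):
    """Return the ordered list of PegaServer endpoints."""
    raw = (extra_config or {}).get("pegaflow.endpoints") \
        or os.environ.get("PEGAFLOW_ENDPOINTS", "")
    endpoints = [e.strip() for e in raw.split(",") if e.strip()]
    return endpoints or [default]  # single entry => unchanged behavior
```

With one resolved endpoint the connector behaves exactly as before; with N endpoints, the rank mapping picks `endpoints[server_index]` for each worker.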

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@xiaguan xiaguan marked this pull request as draft March 2, 2026 06:33
