[serve] Static Placement Group RFC Implementation #59912

nrghosh · 2026-01-07T03:23:15Z

Summary

Implements the Static Placement Group RFC (#59857) to enable external placement groups with explicit replica-to-bundle mapping for Ray Serve deployments.

Key features:

New StaticPlacementConfig dataclass for external placement group configuration
_placement_info parameter on @serve.deployment decorator
bundle_indices exposed via serve.get_replica_context()
Recovery support: replicas restart on identical bundles

Use case: GPU colocation between Serve deployments and other Ray components (e.g., RL training workflows requiring zero-copy weight sync via CUDA IPC).

Example Usage

import ray
from ray import serve
from ray.serve.config import StaticPlacementConfig
from ray.util.placement_group import placement_group

# Create an external placement group with four 1-GPU/1-CPU bundles
pg = placement_group([{"GPU": 1, "CPU": 1}] * 4)
ray.get(pg.ready())  # Block until all bundles are reserved

@serve.deployment(
    _placement_info=StaticPlacementConfig(
        placement_group=pg,
        replica_bundle_mapping={
            0: [0, 1],  # Replica 0 uses bundles 0 and 1
            1: [2, 3],  # Replica 1 uses bundles 2 and 3
        },
    ),
)
class MyLLMServer:
    def __init__(self):
        ctx = serve.get_replica_context()
        print(f"Replica {ctx.rank} using bundles: {ctx.bundle_indices}")

Test plan

Unit tests for StaticPlacementConfig validation
Integration tests with actual placement groups
Recovery test: controller restart with live replicas
Verify mutual exclusivity validation with autoscaling
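
The validation items above could be exercised along the following lines. This is a minimal sketch, not the PR's code: validate_replica_bundle_mapping and is_valid are hypothetical standalone helpers, and the rules they enforce (a non-empty mapping, non-empty bundle lists, indices within the placement group, no bundle shared by two replicas) are assumptions about what StaticPlacementConfig validation might check.

```python
from typing import Dict, List


def validate_replica_bundle_mapping(
    mapping: Dict[int, List[int]], num_bundles: int
) -> None:
    """Hypothetical check; the rules here are assumptions, not Ray Serve's."""
    if not mapping:
        raise ValueError("replica_bundle_mapping must not be empty")
    claimed: Dict[int, int] = {}  # bundle index -> replica that claimed it
    for replica, bundles in mapping.items():
        if not bundles:
            raise ValueError(f"replica {replica} has no bundles")
        for bundle in bundles:
            if not 0 <= bundle < num_bundles:
                raise ValueError(
                    f"bundle {bundle} is out of range for replica {replica}"
                )
            if bundle in claimed:
                raise ValueError(
                    f"bundle {bundle} is assigned to replicas "
                    f"{claimed[bundle]} and {replica}"
                )
            claimed[bundle] = replica


def is_valid(mapping: Dict[int, List[int]], num_bundles: int) -> bool:
    """Convenience wrapper that turns the exception into a boolean."""
    try:
        validate_replica_bundle_mapping(mapping, num_bundles)
        return True
    except ValueError:
        return False
```

Under these assumed rules, the mapping from the example usage ({0: [0, 1], 1: [2, 3]} over four bundles) passes, while a mapping that reuses a bundle or references bundle 4 is rejected.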

🤖 Generated with Claude Code

gemini-code-assist

Code Review

This pull request introduces a significant new feature: static placement groups for Ray Serve. The implementation is comprehensive, touching the necessary components from the public API down to the deployment scheduler and state management. The new StaticPlacementConfig dataclass is well-designed with robust validation. The logic for handling static placement during scheduling and controller recovery is also well-thought-out.

I've identified one critical issue concerning rank management that could lead to a resource leak, and I've provided a detailed suggestion for a fix. I also pointed out a minor redundancy for code cleanup. Overall, this is a solid contribution that adds valuable functionality to Ray Serve. Addressing the critical issue is essential before merging.

gemini-code-assist · 2026-01-07T03:25:57Z

python/ray/serve/_private/deployment_state.py

+            # For static placement, node_id may be None at rank assignment time
+            # since the node is determined by the placement group bundle.
+            # In this case, we skip local rank assignment and use placeholder values.
+            if node_id is None:
+                # Static placement: node_rank and local_rank are not meaningful
+                # since placement is determined by bundle indices, not node affinity
+                return ReplicaRank(rank=rank, node_rank=-1, local_rank=-1)
+
+            # Track the replica-to-node mapping
+            self._replica_to_node[replica_id] = node_id


There's a potential bug here for static placement replicas. When node_id is None, self._replica_to_node is not populated for the replica_id. This will cause self.has_replica_rank(replica_id) to return False later, because it checks for replica_id in self._replica_to_node.

As a result, when a static placement replica is stopped, self._rank_manager.release_rank(replica_id) will not be called, leading to a rank leak.

To fix this, self._replica_to_node[replica_id] = node_id should be set even when node_id is None. This will require follow-up changes in has_replica_rank, release_rank, and recover_rank to correctly handle cases where node_id is None from self._replica_to_node.

I've suggested a change for this block below. You'll also need to update release_rank to handle node_id being None.

Suggested change

-            # For static placement, node_id may be None at rank assignment time
-            # since the node is determined by the placement group bundle.
-            # In this case, we skip local rank assignment and use placeholder values.
-            if node_id is None:
-                # Static placement: node_rank and local_rank are not meaningful
-                # since placement is determined by bundle indices, not node affinity
-                return ReplicaRank(rank=rank, node_rank=-1, local_rank=-1)
-
-            # Track the replica-to-node mapping
-            self._replica_to_node[replica_id] = node_id
+            # Track the replica-to-node mapping. For static placement, node_id will be
+            # None initially.
+            self._replica_to_node[replica_id] = node_id
+
+            # For static placement, node_id may be None at rank assignment time
+            # since the node is determined by the placement group bundle.
+            # In this case, we skip local rank assignment and use placeholder values.
+            if node_id is None:
+                # Static placement: node_rank and local_rank are not meaningful
+                # since placement is determined by bundle indices, not node affinity
+                return ReplicaRank(rank=rank, node_rank=-1, local_rank=-1)
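
The rank leak described in this review comment can be reproduced with a toy model. The sketch below is not Ray Serve's rank manager: ToyRankManager, stop_replica, and rank_after_restart are hypothetical names, and the bookkeeping is simplified to the two pieces the comment mentions (a replica-to-node map that has_replica_rank consults, and a release step guarded by that check). The record_none_node flag switches between the PR's behavior (early return before recording) and the suggested ordering (record first, even when node_id is None).

```python
from typing import Dict, Optional, Set


class ToyRankManager:
    """Toy model of rank bookkeeping; not Ray Serve's actual class."""

    def __init__(self, record_none_node: bool) -> None:
        # True models the suggested fix; False models the early return.
        self._record_none_node = record_none_node
        self._replica_to_rank: Dict[str, int] = {}
        self._replica_to_node: Dict[str, Optional[str]] = {}
        self._free_ranks: Set[int] = set()
        self._next_rank = 0

    def assign_rank(self, replica_id: str, node_id: Optional[str]) -> int:
        # Reuse the lowest released rank, otherwise hand out a new one.
        if self._free_ranks:
            rank = min(self._free_ranks)
            self._free_ranks.remove(rank)
        else:
            rank = self._next_rank
            self._next_rank += 1
        self._replica_to_rank[replica_id] = rank
        if node_id is not None or self._record_none_node:
            self._replica_to_node[replica_id] = node_id
        return rank

    def has_replica_rank(self, replica_id: str) -> bool:
        # Membership test on the node map, as the review describes.
        return replica_id in self._replica_to_node

    def release_rank(self, replica_id: str) -> None:
        self._replica_to_node.pop(replica_id)
        self._free_ranks.add(self._replica_to_rank.pop(replica_id))


def stop_replica(manager: ToyRankManager, replica_id: str) -> None:
    # The rank is only released when the manager reports it as tracked.
    if manager.has_replica_rank(replica_id):
        manager.release_rank(replica_id)


def rank_after_restart(record_none_node: bool) -> int:
    """Start a static-placement replica, stop it, then start another."""
    manager = ToyRankManager(record_none_node)
    manager.assign_rank("r0", None)
    stop_replica(manager, "r0")
    return manager.assign_rank("r1", None)
```

In this model, rank_after_restart(False) returns 1 because rank 0 is never released, while rank_after_restart(True) returns 0 because the stopped replica's rank is reclaimed.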

gemini-code-assist · 2026-01-07T03:25:57Z

python/ray/serve/_private/deployment_state.py

+        if static_placement_config is None:
+            return


This None check is redundant. The type hint for static_placement_config is non-optional, and the only caller in _check_startup_replicas already ensures it's not None before calling this method. You can remove these lines for cleaner code.

Implements the Static Placement Group RFC (ray-project#59857) to enable external placement groups with explicit replica-to-bundle mapping for Ray Serve deployments.

Key changes:
- Add StaticPlacementConfig dataclass in config.py
- Add _placement_info parameter to deployment decorator
- Update scheduler for static placement groups
- Add bundle_indices to ReplicaContext
- Implement recovery for static placement
- Add unit tests for StaticPlacementConfig

nrghosh force-pushed the nrghosh/static-placement-group-rfc branch from c3d473f to c17eb30 Compare January 7, 2026 03:25

gemini-code-assist bot reviewed Jan 7, 2026


nrghosh force-pushed the nrghosh/static-placement-group-rfc branch from c17eb30 to f9a8b1f Compare January 7, 2026 03:26

nrghosh force-pushed the nrghosh/static-placement-group-rfc branch from f9a8b1f to 7b6d922 Compare January 8, 2026 21:52

nrghosh closed this Jan 8, 2026
