
Conversation

@damodaryekkuluri

Description

This documentation change helps users implement "Logic-level" monitoring instead of just "Process-level" monitoring.

Context/Background

Users deploying heavy-compute models (specifically vLLM for LLMs) often encounter "zombie replicas." This occurs when the application's internal engine (e.g., the vLLM background loop) crashes with an AsyncEngineDeadError, but the Ray Actor process remains alive. Because the process is alive, the default Ray Serve health check remains green, and the Ray Head node continues to route traffic to the broken replica, resulting in a series of 5xx errors for the end user.

Currently, the documentation for implementing a custom check_health hook to solve this is fragmented and not easily discoverable in the core monitoring or API guides.
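The failure mode described above can be reproduced in miniature without Ray. The following sketch (all names are hypothetical, and `MockEngine` stands in for something like vLLM's background loop) contrasts a process-level liveness check, which stays green after the engine dies, with a logic-level check that inspects the engine itself:

```python
class MockEngine:
    """Stand-in for an inference engine (e.g., vLLM's background loop)."""

    def __init__(self):
        self.loop_alive = True  # flips to False when the engine dies

    def crash(self):
        # Simulates an AsyncEngineDeadError-style failure:
        # the engine stops, but the hosting process does not exit.
        self.loop_alive = False


class Replica:
    """Stand-in for a Serve replica actor."""

    def __init__(self):
        self.engine = MockEngine()

    def process_level_check(self) -> bool:
        # Default behavior: the actor process is alive, so the check passes
        # regardless of the engine's internal state.
        return True

    def logic_level_check(self) -> bool:
        # Application-level check: inspect the engine, not just the process.
        return self.engine.loop_alive


replica = Replica()
replica.engine.crash()
print(replica.process_level_check())  # True  -- the zombie replica looks healthy
print(replica.logic_level_check())    # False -- the logic-level check catches it
```

This is exactly the gap a custom `check_health` hook closes: the default check answers "is the process alive?", while the application-level check answers "can this replica actually serve requests?".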

Changes

This PR improves the discoverability of application-level health checks in two ways:

  1. doc/source/serve/monitoring.md: Added a new section, "Application-level Health Checks (Custom Health Checks)," including a clear code example and a real-world scenario (engine crashes vs. process health).
  2. python/ray/serve/deployment.py: Updated the Deployment class docstring to explicitly list check_health() as an optional user-defined hook. This ensures that developers using IDEs can discover the feature via hover-over or code completion.

Signed-off-by: Damodar Yekkuluri <damodar3sachin@gmail.com>
Added documentation for user-defined health check method.

Signed-off-by: Damodar Yekkuluri <damodar3sachin@gmail.com>
@damodaryekkuluri damodaryekkuluri requested review from a team as code owners January 7, 2026 09:30
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds valuable documentation for custom health checks in Ray Serve, addressing a key user need for logic-level monitoring. The changes in monitoring.md and deployment.py significantly improve the discoverability and understanding of the check_health hook. My review focuses on enhancing the clarity and completeness of this new documentation. I've suggested a formatting fix to a list for better logical structure and recommended adding details about configuration options to both the guide and the docstring to make them more comprehensive for users. Overall, this is a great contribution.


* viewing the Ray dashboard
* viewing the `serve status` output
* implementing custom application-level health checks
Contributor

medium

The new list item for custom health checks is indented, which makes it appear as a sub-item of 'viewing the serve status output'. To improve clarity and discoverability, it should be a top-level item in the list, at the same indentation level as the other items.

Suggested change

```diff
-   * implementing custom application-level health checks
+ * implementing custom application-level health checks
```

Author

Intentional. The custom health check is a sub-item of the `serve status` item.


### Implementing `check_health`

When you define an `async def check_health(self)` method, Ray Serve calls it periodically (defaulting to every 10 seconds). If this method raises an exception, Ray marks the replica as `UNHEALTHY`, stops routing traffic to it, and attempts to restart it.
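The behavior this section describes might be illustrated with a sketch like the one below. It deliberately avoids importing Ray so it runs standalone: `EngineDeadError` is a hypothetical stand-in for a failure such as vLLM's `AsyncEngineDeadError`, and in real code the class would be decorated with `@serve.deployment`, with `check_health` invoked by the Serve controller rather than called directly:

```python
import asyncio


class EngineDeadError(RuntimeError):
    """Hypothetical stand-in for an internal engine failure."""


class LLMDeployment:
    """Sketch of a deployment class. In Ray Serve this would carry
    @serve.deployment, and check_health would be called periodically
    by the controller instead of by user code."""

    def __init__(self):
        self._engine_alive = True  # real code would query the engine itself

    async def check_health(self):
        # Returning normally means HEALTHY. Raising any exception signals
        # UNHEALTHY: Serve stops routing traffic to this replica and
        # restarts it.
        if not self._engine_alive:
            raise EngineDeadError("inference engine background loop is dead")


d = LLMDeployment()
asyncio.run(d.check_health())  # healthy: returns without raising

d._engine_alive = False
try:
    asyncio.run(d.check_health())
except EngineDeadError as e:
    print("replica marked UNHEALTHY:", e)
```

The key design point is that the hook only needs to raise; the replica lifecycle (unrouting, restart) is handled by Serve, not by the deployment code.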
Contributor

medium

This is a great explanation of check_health. To make it even more helpful for users, I suggest mentioning how they can configure the health check frequency and timeout. You could add a sentence about the health_check_period_s and health_check_timeout_s parameters of the @serve.deployment decorator.

Suggested change
When you define an `async def check_health(self)` method, Ray Serve calls it periodically (defaulting to every 10 seconds). If this method raises an exception, Ray marks the replica as `UNHEALTHY`, stops routing traffic to it, and attempts to restart it.
When you define an `async def check_health(self)` method, Ray Serve calls it periodically. You can configure the period and timeout using the `health_check_period_s` (default: 10s) and `health_check_timeout_s` arguments in the `@serve.deployment` decorator. If this method raises an exception, Ray marks the replica as `UNHEALTHY`, stops routing traffic to it, and attempts to restart it.
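The semantics of the two suggested parameters can be sketched without Ray at all. The toy monitor loop below (names and tiny intervals are my own, chosen so the example runs instantly) approximates what `health_check_period_s` and `health_check_timeout_s` control: how often the check runs, and how long it may take before being treated as a failure:

```python
import asyncio


async def health_monitor(check_health, period_s=0.01, timeout_s=0.01, rounds=3):
    """Toy version of the controller loop implied by health_check_period_s /
    health_check_timeout_s: call check_health every period_s, and treat an
    exception or a timeout as UNHEALTHY."""
    for _ in range(rounds):
        try:
            await asyncio.wait_for(check_health(), timeout=timeout_s)
        except Exception:
            return "UNHEALTHY"
        await asyncio.sleep(period_s)
    return "HEALTHY"


async def healthy():
    pass  # returns promptly, so every round passes


async def hung():
    await asyncio.sleep(10)  # never finishes within timeout_s


print(asyncio.run(health_monitor(healthy)))  # HEALTHY
print(asyncio.run(health_monitor(hung)))     # UNHEALTHY
```

This also shows why the timeout matters independently of the period: a `check_health` that hangs (for example, awaiting a dead engine queue) is caught by the timeout even though it never raises on its own.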

Comment on lines 76 to 80
[Optional] Define this method to implement custom application-level
health checks. Ray Serve calls this periodically (default 10s).
If it raises an exception, the replica is marked as UNHEALTHY
and restarted. Useful for detecting internal engine crashes
(like vLLM AsyncEngineDeadError) that leave the actor process alive.
Contributor

medium

The docstring is helpful for discoverability. To improve it further, I suggest reformatting it for better readability and adding details about how to configure the health check period and timeout, similar to the documentation in monitoring.md.

Suggested change
[Optional] Define this method to implement custom application-level
health checks. Ray Serve calls this periodically (default 10s).
If it raises an exception, the replica is marked as UNHEALTHY
and restarted. Useful for detecting internal engine crashes
(like vLLM AsyncEngineDeadError) that leave the actor process alive.
[Optional] Define this method to implement custom application-level
health checks. Ray Serve calls this periodically. You can configure the
period and timeout using the `health_check_period_s` (default: 10s) and
`health_check_timeout_s` arguments in the `@serve.deployment` decorator.
If it raises an exception, the replica is marked as UNHEALTHY and
restarted. This is useful for detecting internal engine crashes (like
vLLM AsyncEngineDeadError) that leave the actor process alive.

Signed-off-by: Damodar Yekkuluri <damodar3sachin@gmail.com>
@ray-gardener ray-gardener bot added serve Ray Serve Related Issue docs An issue or change related to documentation community-contribution Contributed by the community labels Jan 7, 2026
@harshit-anyscale harshit-anyscale added the go add ONLY when ready to merge, run all tests label Jan 20, 2026