[docs][serve] Add guide and docstrings for custom check_health hooks #59924
Conversation
Signed-off-by: Damodar Yekkuluri <damodar3sachin@gmail.com>
Added documentation for user-defined health check method. Signed-off-by: Damodar Yekkuluri <damodar3sachin@gmail.com>
Code Review
This pull request adds valuable documentation for custom health checks in Ray Serve, addressing a key user need for logic-level monitoring. The changes in monitoring.md and deployment.py significantly improve the discoverability and understanding of the check_health hook. My review focuses on enhancing the clarity and completeness of this new documentation. I've suggested a formatting fix to a list for better logical structure and recommended adding details about configuration options to both the guide and the docstring to make them more comprehensive for users. Overall, this is a great contribution.
* viewing the Ray dashboard
* viewing the `serve status` output
    * implementing custom application-level health checks
The new list item for custom health checks is indented, which makes it appear as a sub-item of 'viewing the serve status output'. To improve clarity and discoverability, it should be a top-level item in the list, at the same indentation level as the other items.
Suggested change:

Original: `    * implementing custom application-level health checks` (indented as a sub-item)

Suggested: `* implementing custom application-level health checks` (top-level item)
Intentional. The custom health check is a sub-item of the `serve status` entry.
### Implementing `check_health`
When you define an `async def check_health(self)` method, Ray Serve calls it periodically (defaulting to every 10 seconds). If this method raises an exception, Ray marks the replica as `UNHEALTHY`, stops routing traffic to it, and attempts to restart it.
This is a great explanation of `check_health`. To make it even more helpful for users, I suggest mentioning how they can configure the health check frequency and timeout. You could add a sentence about the `health_check_period_s` and `health_check_timeout_s` parameters of the `@serve.deployment` decorator.
Suggested change:

Original: When you define an `async def check_health(self)` method, Ray Serve calls it periodically (defaulting to every 10 seconds). If this method raises an exception, Ray marks the replica as `UNHEALTHY`, stops routing traffic to it, and attempts to restart it.

Suggested: When you define an `async def check_health(self)` method, Ray Serve calls it periodically. You can configure the period and timeout using the `health_check_period_s` (default: 10s) and `health_check_timeout_s` arguments in the `@serve.deployment` decorator. If this method raises an exception, Ray marks the replica as `UNHEALTHY`, stops routing traffic to it, and attempts to restart it.
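For concreteness, the pattern under discussion looks roughly like the sketch below. The `Database` class and its `ping()` method are hypothetical placeholders for a real application dependency; the two decorator arguments are the configuration knobs mentioned in the suggestion above.

```python
# Minimal sketch of a custom health check. `Database` and its `ping()`
# method are hypothetical stand-ins for a real application dependency.
from ray import serve


class Database:
    def ping(self):
        # A real implementation would contact the backend and raise on failure.
        pass


@serve.deployment(
    health_check_period_s=10,   # how often Serve calls check_health (default: 10s)
    health_check_timeout_s=30,  # how long one check may run before it counts as failed
)
class HealthyDeployment:
    def __init__(self):
        self.db = Database()

    async def check_health(self):
        # If this raises, Serve marks the replica UNHEALTHY, stops routing
        # traffic to it, and restarts it.
        self.db.ping()

    async def __call__(self, request) -> str:
        return "ok"


app = HealthyDeployment.bind()
```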
python/ray/serve/deployment.py (outdated)
[Optional] Define this method to implement custom application-level health checks. Ray Serve calls this periodically (default 10s). If it raises an exception, the replica is marked as UNHEALTHY and restarted. Useful for detecting internal engine crashes (like vLLM AsyncEngineDeadError) that leave the actor process alive.
The docstring is helpful for discoverability. To improve it further, I suggest reformatting it for better readability and adding details about how to configure the health check period and timeout, similar to the documentation in monitoring.md.
Suggested change:

Original: [Optional] Define this method to implement custom application-level health checks. Ray Serve calls this periodically (default 10s). If it raises an exception, the replica is marked as UNHEALTHY and restarted. Useful for detecting internal engine crashes (like vLLM AsyncEngineDeadError) that leave the actor process alive.

Suggested: [Optional] Define this method to implement custom application-level health checks. Ray Serve calls this periodically. You can configure the period and timeout using the `health_check_period_s` (default: 10s) and `health_check_timeout_s` arguments in the `@serve.deployment` decorator. If it raises an exception, the replica is marked as UNHEALTHY and restarted. This is useful for detecting internal engine crashes (like vLLM AsyncEngineDeadError) that leave the actor process alive.
Signed-off-by: Damodar Yekkuluri <damodar3sachin@gmail.com>
Description
This documentation change helps users implement "Logic-level" monitoring instead of just "Process-level" monitoring.
Context/Background
Users deploying heavy-compute models (specifically vLLM for LLMs) often encounter "zombie replicas." This occurs when the application's internal engine (e.g., the vLLM background loop) crashes with an AsyncEngineDeadError, but the Ray Actor process remains alive. Because the process is alive, the default Ray Serve health check remains green, and the Ray Head node continues to route traffic to the broken replica, resulting in a series of 5xx errors for the end user.
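To illustrate, a replica can surface this failure mode to Serve with a hook along the following lines. This is a hedged sketch: `self.engine` stands in for a real engine handle, and the `errored` flag mirrors vLLM's async engine API at the time of writing, so treat the exact attribute name as an assumption to verify against your vLLM version.

```python
# Sketch: surfacing a dead vLLM engine through check_health.
# `self.engine` is a placeholder for real engine construction (e.g., via
# AsyncLLMEngine.from_engine_args); the `errored` attribute is an assumption
# about the engine API and should be checked against your vLLM version.
from ray import serve


@serve.deployment
class LLMServer:
    def __init__(self):
        self.engine = None  # placeholder for real engine construction

    async def check_health(self):
        # The actor process stays alive even when the background loop dies,
        # so the default process-level check would still report healthy.
        # Raising here lets Serve mark the replica UNHEALTHY and restart it.
        if getattr(self.engine, "errored", False):
            raise RuntimeError("vLLM engine background loop has died")
```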
Currently, the documentation for implementing a custom check_health hook to solve this is fragmented and not easily discoverable in the core monitoring or API guides.
Changes
This PR improves the discoverability of application-level health checks in two ways:
- `doc/source/serve/monitoring.md`: Added a new section, "Application-level Health Checks (Custom Health Checks)," including a clear code example and a real-world scenario (engine crashes vs. process health).
- `python/ray/serve/deployment.py`: Updated the `Deployment` class docstring to explicitly list `check_health()` as an optional user-defined hook. This ensures that developers using IDEs can discover the feature via hover-over or code completion.