Add is_healthy to truss #1283

Conversation
Force-pushed from 3b008e7 to cd2d7db
truss/tests/test_model_inference.py (outdated)

def load(self) -> bool:
    time.sleep(10)

def is_ready(self) -> bool:
add a test with a non-bool return type
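For example, a hypothetical test model along these lines could exercise that case (the class name is illustrative, not from the PR):

class NonBoolReadyModel:
    def is_ready(self) -> str:  # deliberately returns a non-bool
        return "ready"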
Maybe binary/bool is not the right data model for this? It seems we have 3 states: ready, not ready, and "not defined" - that calls for an enum.
I think bool is still valid here. "Not defined" defaults to the current health check method, which just checks whether the model has successfully loaded. Otherwise, a boolean (is healthy or is not healthy) seems like the most intuitive way to reason about a health check.
Right. I think I got confused because the method of the same name on the model (not the chainlet) can return None if not defined.

if hasattr(self, "_chainlet"):
    is_ready = await self._model.is_ready()
else:
    # Offload sync functions to thread, to not block event loop.
    is_ready = await to_thread.run_sync(self._model.is_ready)
Do we expect any use case where this function is not super fast? I'm just wondering if the overhead for dual sync/async support is warranted, or if it would be a nice simplification to just make this "naively sync".
We'll enforce a 10s timeout on health checks. Async support is more for parity with the other truss methods that we allow to be called async or sync.
My thinking was more in the sense of "can we reduce complexity" (no threading, no multiple code branches, no two ways of defining the function) if there's a reasonable case that this function is super fast - but it seems 10s is not super fast. I'm still wondering what the real situations are where this check would not be virtually instant.
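For concreteness, a hedged sketch of the two user-facing spellings that dual sync/async support allows (the class names and the _model attribute are illustrative, not from the PR):

class SyncModel:
    def is_ready(self) -> bool:
        # Sync variant: the server offloads this call to a worker thread.
        return getattr(self, "_model", None) is not None

class AsyncModel:
    async def is_ready(self) -> bool:
        # Async variant: awaited directly on the event loop.
        return getattr(self, "_model", None) is not None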
NONE = enum.auto()
INPUTS_ONLY = enum.auto()
REQUEST_ONLY = enum.auto()
INPUTS_AND_REQUEST = enum.auto()
Hmmm, this helper was originally for the preprocess, predict, and postprocess functions, for which prepare_args is also useful. I'm not sure if this extension is a bit confusing (also for setup_environment). I think it only makes sense to use something like a MethodDescriptor that has no arg_config for setup_environment and is_ready.
Not sure I understand the comment here. I'm currently defining is_healthy as a MethodDescriptor - is there a better way to define is_healthy? I needed to support no args in ArgConfig to allow methods to have no argument other than self.
What I meant was that MethodDescriptor was not envisioned to be used with additional new methods added to the model wrapper - but it's not that important...
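As a rough sketch of the shape under discussion, assuming the ArgConfig values from the diff above (the MethodDescriptor fields here are simplified stand-ins, not the real class):

import enum
from dataclasses import dataclass
from typing import Callable

class ArgConfig(enum.Enum):
    NONE = enum.auto()            # no arguments besides self, e.g. is_healthy
    INPUTS_ONLY = enum.auto()
    REQUEST_ONLY = enum.auto()
    INPUTS_AND_REQUEST = enum.auto()

@dataclass
class MethodDescriptor:
    method: Callable
    arg_config: ArgConfig

# A health check method would then carry ArgConfig.NONE:
descriptor = MethodDescriptor(method=lambda self: True, arg_config=ArgConfig.NONE)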
Force-pushed from 0e62bb1 to 9c69f9f
truss-chains/tests/custom_health_checks/custom_health_checks.py (outdated, resolved)
lgtm.
Force-pushed from 38d865e to cb675f9
url = URL(path=request.url.path, query=request.url.query.encode("utf-8"))

path = request.url.path
if path == "/v1/models/model":
need to ensure we still match on /v1/models/{model_name}, will update the logic here
took a closer look at where we call these inference server endpoints, and it looks like we hardcode v1/models/model everywhere else; I think it's ok hardcoding it here as well.
clarifying what the logic here would be:

path = request.url.path
path_params = request.path_params["path"].split("/")
if len(path_params) == 2 and path_params[0] == "models":
    # Reroute health checks to the inference server's /v1/models/{model_name}/loaded endpoint
    path += "/loaded"

since we technically define {model_name} as a path parameter in the inference server, we could support this parameter in the control server routing logic I've added in this PR
chatted with @squidarth and went with keeping the checks hardcoded to v1/models/model, since we use this everywhere we call the inference server, and moved the string comparison logic to a helper fn: _reroute_if_health_check
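A minimal sketch of what that helper might look like (the name _reroute_if_health_check comes from the comment above; the body is an assumption reconstructed from this thread):

def _reroute_if_health_check(path: str) -> str:
    # Health checks arrive on the hardcoded model path; reroute them to the
    # inference server's "loaded" endpoint so that custom health checks are
    # not invoked on development models.
    if path == "/v1/models/model":
        return path + "/loaded"
    return path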
def _custom_stop_strategy(retry_state: RetryCallState) -> bool:
    # Stop after 10 attempts for ModelNotReady
How long can this end up waiting and retrying?
Should be capped at 75 seconds from the first attempt to the last attempt.
I think this could be shorter. 75s seems long, and I'm not sure anything is going to wait that long for a response on the health check. For example, the operator's retry timeout on this is 2s. The wake call might wait longer, though.
did some digging and it looks like we just call GET / on wake, which shouldn't take long. I'll go ahead and update the retries here to be once a second for 10 attempts.
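In tenacity terms, "once a second for 10 attempts" could be expressed roughly like this (a sketch of the described policy only; the PR's actual code uses the custom stop strategy shown in this thread):

from tenacity import retry, stop_after_attempt, wait_fixed

@retry(wait=wait_fixed(1), stop=stop_after_attempt(10))
def _check_model_ready() -> None:
    ...  # hypothetical probe that raises ModelNotReady until the server reports loaded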
# For all other exceptions, stop after INFERENCE_SERVER_START_WAIT_SECS
seconds_since_start = (
    retry_state.seconds_since_start
    if retry_state.seconds_since_start is not None
    else 0.0
)
return seconds_since_start >= INFERENCE_SERVER_START_WAIT_SECS
nit: It shouldn't if I understand the code correctly, but I want to double-check: does this change the retry behaviour? Is it possible for retry_state.seconds_since_start to never be set and for this to retry forever?
I don't think it's possible for seconds_since_start to never be set; it looks like it's set in the __init__ of RetryCallState: https://github.com/jd/tenacity/blob/main/tenacity/__init__.py#L532
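Putting the two fragments from this thread together, the full stop strategy plausibly looks something like the following; the ModelNotReady branch is reconstructed from the comments above (and assumes ModelNotReady and INFERENCE_SERVER_START_WAIT_SECS are in scope), so treat it as a sketch rather than the exact implementation:

from tenacity import RetryCallState

def _custom_stop_strategy(retry_state: RetryCallState) -> bool:
    # Stop after 10 attempts for ModelNotReady.
    exception = retry_state.outcome.exception() if retry_state.outcome else None
    if isinstance(exception, ModelNotReady):
        return retry_state.attempt_number >= 10
    # For all other exceptions, stop after INFERENCE_SERVER_START_WAIT_SECS.
    seconds_since_start = (
        retry_state.seconds_since_start
        if retry_state.seconds_since_start is not None
        else 0.0
    )
    return seconds_since_start >= INFERENCE_SERVER_START_WAIT_SECS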
Force-pushed from c52ed3e to 0fa72ee
🚀 What

Adds an is_healthy function to the model wrapper that users can use to define their own custom health checks. If defined, this function is called on calls to /v1/models/{model_name} (our existing health check endpoint). Otherwise we fall back to the existing health check functionality, which checks whether the load failed or the model is not in a ready state.

We also needed to introduce a new endpoint on the inference server, v1/models/{model_name}/loaded, that the control server will now explicitly call when the health check endpoint is called. This is to ensure that custom health checks are not called on development models.
🔬 Testing

Tested on staging with an RC (0.9.59rc018).

Confirmed that no health check warning logs display before the model is loaded, and that health check warnings are then logged every 10s while they fail, one for each of the liveness and readiness probes:

Confirmed that these changes won't break with the current ctx builder version in staging and production.
Ran poetry run truss push from my branch into orgs in staging and production without this RC, and confirmed that truss push works properly and doesn't break existing deploy and inference flows.

Ensured we're still able to fail fast when load fails:
Model loading:
(screenshot)

About a minute later, the deployment is terminated because load failed:
(screenshot)