Add is_healthy to truss #1283


Merged · 22 commits into main · Jan 24, 2025

Conversation

@spal1 (Collaborator) commented Dec 13, 2024

🚀 What

Adds an is_healthy function to the model wrapper that users can implement to define custom health checks. If defined, this function is called on requests to our existing health check endpoint, /v1/models/{model_name}. Otherwise we fall back to the existing health check behavior, which fails if the load failed or the model is not in a ready state.

We also introduce a new endpoint on the inference server, /v1/models/{model_name}/loaded, which the control server now calls explicitly when its health check endpoint is hit. This ensures that custom health checks are not run on development models.
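As a sketch of what a user-defined health check might look like (the class and method names follow the truss model contract described above; the internal bookkeeping is illustrative, not the actual implementation):

```python
# Illustrative truss model.py: a model that reports unhealthy until load()
# completes. Class name and method signatures follow the truss model contract;
# the internal state handling here is a made-up example.
class Model:
    def __init__(self, **kwargs):
        self._model = None

    def load(self):
        # Expensive setup would happen here (downloading weights, etc.).
        self._model = object()  # stand-in for a real model handle

    def is_healthy(self) -> bool:
        # Called when GET /v1/models/{model_name} hits the health check
        # endpoint; returning False fails the liveness/readiness probes.
        return self._model is not None

    def predict(self, model_input):
        return model_input
```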

🔬 Testing

Tested on staging with an RC (0.9.59rc018)

Confirmed that no health check warning logs display before the model is loaded, and that health check warnings are then logged every 10s while they fail, one for each of the liveness and readiness probes:
[screenshot: health check warning logs]

Confirmed that these changes won't break with current ctx builder version in staging and production

Ran poetry run truss push from my branch into orgs in staging and production without this RC and confirmed that truss push works properly and doesn't break existing deploy and inference flows.

Ensured we're still able to fail fast when load fails:

Model loading: [screenshot]

About a minute later, deployment is terminated because load failed: [screenshot]

linear bot commented Dec 13, 2024

@spal1 marked this pull request as ready for review December 20, 2024 22:26
@spal1 force-pushed the samiksha/bt-13064-add-is_ready-to-model-trusses branch 2 times, most recently from 3b008e7 to cd2d7db, January 10, 2025 16:21
@spal1 requested review from squidarth and jrochette January 10, 2025 17:11
@spal1 mentioned this pull request Jan 10, 2025
def load(self) -> bool:
    time.sleep(10)

def is_ready(self) -> bool:
spal1 (author):

add a test with a non-bool return type
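A hedged sketch of the suggested test case (the check_health helper and its coercion behavior are assumptions for illustration, not the actual truss implementation):

```python
# Hypothetical sketch of the suggested test: a user health check that returns
# a non-bool value. check_health and its bool() coercion are assumed behavior
# for illustration only.
class BadModel:
    def is_ready(self):
        return "yes"  # non-bool return type

def check_health(model) -> bool:
    # Coerce whatever the user returned into a strict bool.
    return bool(model.is_ready())
```

A real test would instead assert on whatever the server actually does with non-bool returns (coerce, warn, or reject).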

Contributor:

Maybe binary/bool is not the right data model for this? It seems we have 3 states: ready, not ready and "not defined" - that calls for an enum.

spal1 (author):

I think bool is still valid here. "Not defined" defaults to the current health check method, which just checks whether the model has successfully loaded. Otherwise, a boolean (healthy or not healthy) seems like the most intuitive way to reason about a health check.

Contributor:

Right. I think I got confused because the method of the same name on the model (not the chainlet) can return None if not hasattr(self, "_chainlet").

@spal1 requested a review from marius-baseten January 14, 2025 23:09
    is_ready = await self._model.is_ready()
else:
    # Offload sync functions to thread, to not block event loop.
    is_ready = await to_thread.run_sync(self._model.is_ready)
Contributor:

Do we expect any use case where this function is not super fast?

I'm just wondering whether the overhead of dual sync/async support is warranted, or if it would be a nice simplification to just make this "naively sync".

spal1 (author):

We'll enforce a 10s timeout on health checks. Async support is more for parity with the other truss methods we allow to be called either async or sync.

Contributor:

My thinking was more in the sense of "can we reduce complexity" (no threading, no multiple code branches, no two ways of defining the function) if there's a reasonable case that this function is super fast; but it seems 10s is not super fast. I'm still wondering what real situations there are where this check would not be virtually instant.
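For context, the dual sync/async dispatch being discussed can be sketched in a few lines (stdlib asyncio.to_thread standing in for anyio's to_thread.run_sync used in the real server; the function and class names are illustrative):

```python
import asyncio
import inspect

# Sketch of dual sync/async dispatch: await a coroutine health check directly,
# and offload a plain function to a worker thread so the event loop is not
# blocked. asyncio.to_thread stands in for anyio's to_thread.run_sync.
async def call_is_ready(model) -> bool:
    fn = model.is_ready
    if inspect.iscoroutinefunction(fn):
        return await fn()
    return await asyncio.to_thread(fn)
```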

Comment on lines +125 to 121
NONE = enum.auto()
INPUTS_ONLY = enum.auto()
REQUEST_ONLY = enum.auto()
INPUTS_AND_REQUEST = enum.auto()
Contributor:

Hmmm, this helper was originally for the preprocess, predict and postprocess functions, for which prepare_args is also useful. I'm not sure if this extension is a bit confusing (also for setup_environment).

I think it only makes sense to use something like a MethodDescriptor that has no arg_config for setup_environment and is_ready.

spal1 (author):

Not sure I understand the comment here. I'm currently defining is_healthy as a MethodDescriptor; is there a better way to define it? I needed to support no args in ArgConfig to allow methods to have no argument other than self.

Contributor:

What I meant was that MethodDescriptor was not envisioned to be used with additional new methods added to the model wrapper - but it's not that important...

@spal1 force-pushed the samiksha/bt-13064-add-is_ready-to-model-trusses branch 2 times, most recently from 0e62bb1 to 9c69f9f, January 16, 2025 23:21
@spal1 changed the title from "Add is_ready to model trusses" to "Add is_healthy to truss" Jan 17, 2025
@spal1 requested a review from marius-baseten January 17, 2025 00:01
@jrochette (Collaborator) left a comment:

lgtm.

@spal1 force-pushed the samiksha/bt-13064-add-is_ready-to-model-trusses branch 3 times, most recently from 38d865e to cb675f9, January 23, 2025 00:31
url = URL(path=request.url.path, query=request.url.query.encode("utf-8"))

path = request.url.path
if path == "/v1/models/model":
spal1 (author):

need to ensure we still match on /v1/models/{model_name}, will update the logic here

spal1 (author):

Took a closer look at where we call these inference server endpoints, and it looks like we hardcode v1/models/model everywhere else; I think it's OK to hardcode it here as well.

spal1 (author):

clarifying what the logic here would be:

path = request.url.path
path_params = request.path_params["path"].split("/")
if len(path_params) == 2 and path_params[0] == "models":
    # Reroute health checks to the inference server's /v1/models/{model_name}/loaded endpoint
    path += "/loaded"

Since we technically define {model_name} as a path parameter in the inference server, we could support this parameter in the control server routing logic I've added in this PR.

spal1 (author):

Chatted with @squidarth and went with keeping the checks hardcoded to v1/models/model, since we use this everywhere we call the inference server, and moved the string comparison logic to a helper fn: _reroute_if_health_check



def _custom_stop_strategy(retry_state: RetryCallState) -> bool:
    # Stop after 10 attempts for ModelNotReady
Collaborator:

How long can this end up waiting and retrying?

spal1 (author):

should be capped at 75 seconds from first attempt to the last attempt.

Collaborator:

I think this could be shorter. 75s seems long, and I'm not sure anything is going to wait that long for a response on the health check. For example, the operator's retry timeout on this is 2s. The wake call might wait longer, though.

spal1 (author) commented Jan 23, 2025:

Did some digging, and it looks like we just call GET / on wake, which shouldn't take long. I'll go ahead and update the retries here to be once a second for 10 attempts.
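The resulting stop logic can be sketched without tenacity (a plain predicate standing in for the real _custom_stop_strategy over a RetryCallState; the ModelNotReady exception name and the constants are assumptions based on this thread, not the actual code):

```python
# Simplified stand-in for _custom_stop_strategy: stop after 10 attempts when
# the model simply isn't ready yet; for any other exception, wait out the
# server start window. ModelNotReady and both constants are assumptions
# for illustration.
class ModelNotReady(Exception):
    pass

MAX_NOT_READY_ATTEMPTS = 10              # once a second for 10 attempts
INFERENCE_SERVER_START_WAIT_SECS = 75.0  # assumed startup window

def should_stop(attempt_number: int, seconds_since_start: float,
                last_exception: BaseException) -> bool:
    if isinstance(last_exception, ModelNotReady):
        return attempt_number >= MAX_NOT_READY_ATTEMPTS
    # For all other exceptions, stop once the startup window has elapsed.
    return seconds_since_start >= INFERENCE_SERVER_START_WAIT_SECS
```

In the real implementation this predicate would receive a tenacity RetryCallState and read attempt_number, seconds_since_start, and the outcome's exception from it.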

Comment on lines +181 to +180
# For all other exceptions, stop after INFERENCE_SERVER_START_WAIT_SECS
seconds_since_start = (
retry_state.seconds_since_start
if retry_state.seconds_since_start is not None
else 0.0
)
return seconds_since_start >= INFERENCE_SERVER_START_WAIT_SECS
Collaborator:

nit: It shouldn't if I understand the code correctly, but want to double check: does this change the retry behaviour?

Is it possible for retry_state.seconds_since_start to never be set and for this to retry forever?

spal1 (author):

I don't think it's possible for seconds_since_start to never be set; it looks like it's set in the __init__ of RetryCallState: https://github.com/jd/tenacity/blob/main/tenacity/__init__.py#L532

@spal1 force-pushed the samiksha/bt-13064-add-is_ready-to-model-trusses branch from c52ed3e to 0fa72ee, January 24, 2025 17:36
@spal1 merged commit d08b512 into main Jan 24, 2025
7 checks passed
@spal1 deleted the samiksha/bt-13064-add-is_ready-to-model-trusses branch January 24, 2025 20:26