Add option to set worker healthcheck timeout #2500

mmac-m3a · 2024-11-01T05:51:18Z

Summary

Adding new command line and config option worker_healthcheck_timeout which sets the timeout for worker liveness from the supervisor when multiple workers are in use. Default timeout is unchanged as well as frequency of health checks.

Rationale

Applications with CPU intensive synchronous startup may starve the worker process for CPU cycles and make the pong thread generate response too late, which in turn makes the supervisor kill and relaunch the worker.

Checklist

I understand that this PR may be closed in case there was no previous discussion. (This doesn't apply to typos!)
I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
I've updated the documentation accordingly.

Rationale for no explicit test

I was not able to create simple unit test that would reliably trigger the health check timeout. I can 100% reliably trigger it in my application and I've also verified that longer health check timeout resolves the problem.

omid-jf · 2024-12-05T20:56:55Z

I was receiving many "Child Process Died" messages in my FastAPI application. Manually increasing the timeout mentioned in this PR fixed my issue.

racinmat · 2025-05-20T09:38:04Z

Hi, is anything blocking merging this PR? It will solve #2450 for many cases, because you could pass higher timeout.

vjeranc · 2025-05-20T10:04:03Z

This was discussed a while ago, the fix to increase the timeout was known then, so not sure what's the blocker.

IMO, there should be no pinging or ponging. OS is not stupid, either process is suspended or it is not. If process is stuck, either in a deadlock or in CPU heavy operation and would be considered "inactive", that's another issue.

Anyone hosting web apps should have their own health checks and restart the process/container.

Similarly, processes/workers in uvicorn are created with a very expensive fork. It would be better to just invoke uvicorn N times at the web app level and completely remove "workers" concept. The savings on the socket level (all processes reading from the same socket is abysmal), cheaper fork would allow lower memory footprint, a much better saving, which is not the case now.

If a cheaper fork was done instead, it would make some sense to manage process liveness at uvicorn level. With current expensive fork, nothing is gained by the workers=N argument, as we see in this case, it just created more bugs.

Dhavaleswar · 2025-08-12T05:29:45Z

waiting for this change to be merged

markdoliner · 2025-10-06T13:47:54Z

vjeranc commented on May 20:

Similarly, processes/workers in uvicorn are created with a very expensive fork. ... cheaper fork would allow lower memory footprint, a much better saving, which is not the case now.

I saw this comment and just wanted to leave a warning: I believe that fork() can have undefined behavior if an application has running threads. That's the reason multiprocessing moved away from it as the default and has been showing DeprecationWarnings about it. See this section of the docs. "On macOS, the spawn start method is now the default. The fork start method should be considered unsafe as it can lead to crashes of the subprocess as macOS system libraries may start threads. See bpo-33725."

vjeranc · 2025-10-06T19:00:46Z

This is a normal way fork works and forking threads are copied to the child process while others vanish.

Using spawn is just a monkey patch over unclear fork misbehavior. Instead of fixing the underlying issue around thread lifetimes, we're deciding on inferior performance. This choice created a 1+ year issue where child processes were killed because their imports were being redone and were taking more than the health check.

Uvicorn having a ping-pong protocol to figure out if child process is responsive is monkey-patch to handle child processes stuck in infinite noncooperative loops. OS never lies about children and we can easily ask OS to check if child process is alive. I don't think it is uvicorn's job to kill processes stuck in noncooperative loops, we already deploy apps in docker images and have separate health checking protocols for our http servers.

Add option to set worker healthcheck timeout

a795717

omid-jf mentioned this pull request Dec 5, 2024

Add option to set worker healthcheck timeout omid-jf/uvicorn#1

Merged

lfrancke mentioned this pull request Sep 9, 2025

Airflow 3.x: find stable combination of uvicorn version and AIRFLOW__API__WORKERS setting stackabletech/airflow-operator#641

Closed

Kludex mentioned this pull request Sep 23, 2025

Add --timeout-worker-healthcheck setting #2711

Merged

Kludex closed this in #2711 Sep 23, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Add option to set worker healthcheck timeout #2500

Add option to set worker healthcheck timeout #2500

mmac-m3a commented Nov 1, 2024

Uh oh!

omid-jf commented Dec 5, 2024

Uh oh!

racinmat commented May 20, 2025

Uh oh!

vjeranc commented May 20, 2025 •

edited

Loading

Uh oh!

Dhavaleswar commented Aug 12, 2025

Uh oh!

markdoliner commented Oct 6, 2025

Uh oh!

vjeranc commented Oct 6, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Uh oh!

Add option to set worker healthcheck timeout #2500

Add option to set worker healthcheck timeout #2500

Conversation

mmac-m3a commented Nov 1, 2024

Summary

Rationale

Checklist

Rationale for no explicit test

Uh oh!

omid-jf commented Dec 5, 2024

Uh oh!

racinmat commented May 20, 2025

Uh oh!

vjeranc commented May 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Dhavaleswar commented Aug 12, 2025

Uh oh!

markdoliner commented Oct 6, 2025

Uh oh!

vjeranc commented Oct 6, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

vjeranc commented May 20, 2025 •

edited

Loading

vjeranc commented Oct 6, 2025 •

edited

Loading