Add option to set worker healthcheck timeout #2500
I was receiving many "Child Process Died" messages in my FastAPI application. Manually increasing the timeout mentioned in this PR fixed my issue.
Hi, is anything blocking this PR from being merged? It would solve #2450 in many cases, because you could pass a higher timeout.
This was discussed a while ago, and the fix of increasing the timeout was already known then, so I'm not sure what the blocker is. IMO, there should be no pinging or ponging. The OS is not stupid: a process is either suspended or it is not. If a process is stuck, whether in a deadlock or in a CPU-heavy operation, and would be considered "inactive", that's another issue. Anyone hosting web apps should have their own health checks and restart the process/container.
Similarly, processes/workers in uvicorn are created with a very expensive fork. It would be better to just invoke uvicorn N times at the web app level and completely remove the "workers" concept. The savings at the socket level (all processes reading from the same socket) are abysmal; a cheaper fork would allow a lower memory footprint, a much better saving, which is not the case now. If a cheaper fork were done instead, it would make some sense to manage process liveness at the uvicorn level. With the current expensive fork, nothing is gained by the
Waiting for this change to be merged.
vjeranc commented on May 20:
I saw this comment and just wanted to leave a warning: I believe that
This is a normal way of using Uvicorn. Having a ping-pong protocol to figure out whether a child process is responsive is a monkey-patch to handle child processes stuck in infinite non-cooperative loops. The OS never lies about its children, and we can easily ask the OS whether a child process is alive. I don't think it is uvicorn's job to kill processes stuck in non-cooperative loops; we already deploy apps in Docker images and have separate health-checking protocols for our HTTP servers.
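To illustrate the point about asking the OS directly, here is a minimal sketch using only the Python standard library. It is not uvicorn code: the helper names `spawn_sleeping_child` and `child_is_alive` are hypothetical, and a sleeping Python subprocess stands in for a worker.

```python
import subprocess
import sys


def spawn_sleeping_child(seconds: float) -> subprocess.Popen:
    """Start a throwaway child that only sleeps (hypothetical stand-in for a worker)."""
    return subprocess.Popen(
        [sys.executable, "-c", f"import time; time.sleep({seconds})"]
    )


def child_is_alive(proc: subprocess.Popen) -> bool:
    """Ask the OS whether the child has exited.

    Popen.poll() returns None while the process is still running and the
    exit code once it has terminated.
    """
    return proc.poll() is None
```

Note that a check like this only reports whether the process has exited. A child that is alive but stuck in a busy loop still counts as alive, which is the case the comment argues should be left to external health checks.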
Summary
Adds a new command-line and config option, `worker_healthcheck_timeout`, which sets the timeout the supervisor uses when checking worker liveness while multiple workers are in use. The default timeout and the frequency of health checks are unchanged.
Rationale
Applications with CPU-intensive synchronous startup may starve the worker process of CPU cycles and make the `pong` thread respond too late, which in turn makes the supervisor kill and relaunch the worker.
Checklist
Rationale for no explicit test
I was not able to create a simple unit test that would reliably trigger the health check timeout. I can trigger it 100% reliably in my application, and I've also verified that a longer health check timeout resolves the problem.
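The failure mode described in the rationale can be sketched in isolation. The following is a simplified model, not uvicorn's implementation: a thread stands in for the worker's `pong` thread, `reply_delay` simulates the delay caused by CPU starvation, and every function name here is a hypothetical chosen for illustration. It shows only how the supervisor-side timeout decides whether a late reply counts as a dead worker.

```python
import threading
import time
from multiprocessing import Pipe


def start_pong_responder(conn, reply_delay: float) -> threading.Thread:
    """Answer a single ping after reply_delay seconds (simulated starved pong thread)."""
    def respond() -> None:
        conn.recv()              # block until the supervisor's b"ping" arrives
        time.sleep(reply_delay)  # stand-in for a worker that gets no CPU time
        conn.send(b"pong")

    thread = threading.Thread(target=respond, daemon=True)
    thread.start()
    return thread


def ping(conn, timeout: float) -> bool:
    """Send a ping and report whether a reply arrived within timeout seconds."""
    conn.send(b"ping")
    if conn.poll(timeout):
        conn.recv()
        return True
    return False


def worker_looks_healthy(reply_delay: float, healthcheck_timeout: float) -> bool:
    """Run one simulated health check and return the supervisor's verdict."""
    supervisor_conn, worker_conn = Pipe()
    responder = start_pong_responder(worker_conn, reply_delay)
    healthy = ping(supervisor_conn, healthcheck_timeout)
    responder.join()  # let a late pong finish so the pipe is not closed under it
    return healthy
```

With a reply delay of 0.3 seconds, `worker_looks_healthy(0.3, 0.05)` reports the worker as unhealthy while `worker_looks_healthy(0.3, 5.0)` reports it as healthy, even though the simulated worker behaves identically in both runs. That is the situation a configurable timeout is meant to address.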