feat(taskworker) Add concurrent worker #83254
Conversation
Codecov Report — ✅ All tests successful. No failed tests found.

Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master   #83254      +/-   ##
==========================================
+ Coverage   87.49%   87.56%   +0.06%
==========================================
  Files        9404     9393      -11
  Lines      537180   536973     -207
  Branches    21133    21048      -85
==========================================
+ Hits       470024   470203     +179
+ Misses      66798    66414     -384
+ Partials      358      356       -2
```
src/sentry/taskworker/worker.py
Outdated
```python
task = self._get_known_task(activation)
if not task:
try:
    activation = child_tasks.get_nowait()
```
I don't think `get_nowait` is necessarily correct here. I understand this is ensuring that the process doesn't block while waiting for a task before checking for the shutdown, but I think some kind of timeout/delay would be good here to avoid spiking the CPU. Maybe 100ms or something like that?
Good point about the potential CPU burn on an empty queue. I'll put a blocking get with a timeout in.
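A minimal sketch of the blocking get with a timeout that the review settles on. The function and parameter names (`poll_for_task`, `shutdown_event`) are illustrative, not the actual worker API; the idea is that `Queue.get(timeout=...)` blocks briefly instead of busy-looping, while still re-checking the shutdown flag between attempts:

```python
import queue
from multiprocessing import Event, Queue


def poll_for_task(child_tasks, shutdown_event, timeout=0.1):
    """Fetch the next activation without spinning on an empty queue.

    Instead of get_nowait() in a tight loop (which burns CPU when the
    queue is empty), block for up to `timeout` seconds per attempt and
    re-check the shutdown event between attempts.
    """
    while not shutdown_event.is_set():
        try:
            return child_tasks.get(timeout=timeout)
        except queue.Empty:
            # Nothing queued yet; loop back and re-check for shutdown.
            continue
    return None
```

With a 100ms timeout, an idle child wakes at most ~10 times per second instead of spinning, and shutdown is still observed within one timeout interval.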
Co-authored-by: Evan Hicks <evanh@users.noreply.github.com>
Wait on the empty queue to reduce CPU burn.
Suspect Issues: This pull request was deployed and Sentry observed the following issues.
Move the taskworker process to be a multiprocess concurrent worker. This will help enable higher CPU usage in worker pods, as we can pack more concurrent CPU operations into each pod (at the cost of memory).

The main process is responsible for:

- Spawning children
- Making RPC requests to fill child queues and submit results.

Each child process handles:

- Resolving task names
- Checking at_most_once keys
- Enforcing processing deadlines
- Executing task functions

Instead of using more child processes to enforce timeouts, I've used SIGALRM. I've verified that tasks like

```python
@exampletasks.register(name="examples.infinite", retry=Retry(times=2))
def infinite_task() -> None:
    try:
        while True:
            pass
    except Exception as e:
        print("haha caught exception", e)
```

do not paralyze workers with infinite loops.

When a worker is terminated, it uses an `Event` to have children exit, and then drains any results. Any tasks remaining in the `_child_tasks` queue will not be completed; they will instead be sent to another worker when the `processing_deadline` on their activations expires.

---------

Co-authored-by: Evan Hicks <evanh@users.noreply.github.com>
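The SIGALRM deadline mechanism described above can be sketched roughly as follows. This is a simplified stand-in for the actual worker code, and `ProcessingDeadlineExceeded` is an illustrative exception name; the key detail is deriving it from `BaseException`, so a task's `except Exception` block (like the one in `infinite_task`) cannot swallow the timeout:

```python
import signal


class ProcessingDeadlineExceeded(BaseException):
    """Raised by the SIGALRM handler. Derives from BaseException so a
    task's `except Exception` cannot catch and ignore it."""


def _alarm_handler(signum, frame):
    raise ProcessingDeadlineExceeded("processing deadline exceeded")


def run_with_deadline(func, deadline_seconds):
    """Run `func`, raising ProcessingDeadlineExceeded if it exceeds the deadline."""
    signal.signal(signal.SIGALRM, _alarm_handler)
    signal.alarm(deadline_seconds)  # deliver SIGALRM after the deadline
    try:
        return func()
    finally:
        signal.alarm(0)  # cancel the pending alarm on any exit path
```

Because the alarm interrupts the child process itself, no extra watchdog process is needed per task; a runaway `while True: pass` is broken out of when the signal fires.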