
feat(taskworker) Add concurrent worker #83254

Merged

markstory merged 11 commits from feat-taskworker-concurrent into master on Jan 17, 2025
Conversation

markstory
Member

Move the taskworker process to be a multiprocess concurrent worker. This will help enable higher CPU usage in worker pods, as we can pack more concurrent CPU operations into each pod (at the cost of memory).

The main process is responsible for:

  • Spawning children
  • Making RPC requests to fill child queues and submit results.

Each child process handles:

  • Resolving task names
  • Checking at_most_once keys
  • Enforcing processing deadlines
  • Executing task functions

Instead of using more child processes to enforce timeouts, I've used SIGALRM. I've verified that tasks like

```python
@exampletasks.register(name="examples.infinite", retry=Retry(times=2))
def infinite_task() -> None:
    try:
        while True:
            pass
    except Exception as e:
        print("haha caught exception", e)
```

do not paralyze workers with infinite loops.
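
For illustration, a rough sketch of what SIGALRM-based deadline enforcement can look like. The `ProcessingDeadlineExceeded` error and `run_with_deadline` helper are hypothetical names, not taken from this PR; deriving the error from `BaseException` is one way to keep a task's own `except Exception` handler from swallowing the deadline.

```python
# Hypothetical sketch of SIGALRM-based deadline enforcement (Unix-only;
# the alarm must be set from the main thread of the child process).
import signal


class ProcessingDeadlineExceeded(BaseException):
    """Raised when a task runs past its processing deadline."""


def _handle_alarm(signum, frame):
    raise ProcessingDeadlineExceeded("processing_deadline exceeded")


def run_with_deadline(func, deadline_seconds: int):
    signal.signal(signal.SIGALRM, _handle_alarm)
    signal.alarm(deadline_seconds)  # SIGALRM fires after deadline_seconds
    try:
        return func()
    finally:
        signal.alarm(0)  # always clear any pending alarm
```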

When a worker is terminated, it uses an `Event` to have children exit, and then drains any results. Tasks still in the `_child_tasks` queue will not be completed; they will instead be sent to another worker when the `processing_deadline` on their activations expires.
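
A sketch of that shutdown sequence, reusing the hypothetical names from the sketch above plus an assumed `submit_results` RPC helper:

```python
# Hypothetical shutdown path: signal children, join them, drain results.
import queue


def shutdown(children, results, shutdown_event, submit_results):
    shutdown_event.set()
    for proc in children:
        proc.join(timeout=5)
    drained = []
    while True:
        try:
            drained.append(results.get_nowait())
        except queue.Empty:
            break
    if drained:
        submit_results(drained)  # hypothetical RPC call to report completions
    # Anything still in child_tasks is abandoned here; the broker re-issues
    # those activations once their processing_deadline expires.
```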

@markstory markstory requested a review from a team as a code owner January 10, 2025 19:57
@markstory markstory linked an issue Jan 10, 2025 that may be closed by this pull request
@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Jan 10, 2025

codecov bot commented Jan 10, 2025

Codecov Report

Attention: Patch coverage is 87.78626% with 32 lines in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/sentry/taskworker/worker.py | 80.25% | 31 Missing ⚠️ |
| tests/sentry/taskworker/test_worker.py | 99.04% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #83254      +/-   ##
==========================================
+ Coverage   87.49%   87.56%   +0.06%     
==========================================
  Files        9404     9393      -11     
  Lines      537180   536973     -207     
  Branches    21133    21048      -85     
==========================================
+ Hits       470024   470203     +179     
+ Misses      66798    66414     -384     
+ Partials      358      356       -2     

src/sentry/taskworker/worker.py (outdated review thread, resolved)
Diff context under discussion:

```python
task = self._get_known_task(activation)
if not task:
    ...
try:
    activation = child_tasks.get_nowait()
```
Member

I don't think get_nowait is necessarily correct here. I understand this is ensuring that the process doesn't block while waiting for a task before checking for the shutdown, but I think some kind of timeout/delay would be good here to avoid spiking the CPU. Maybe like 100ms or something like that?

Member Author

Good point about the potential CPU burn on an empty queue. I'll put a blocking get with a timeout in.
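
For reference, a sketch of the resulting child loop with a blocking `get` and the ~100ms timeout suggested above (the surrounding names are hypothetical):

```python
import queue


def child_loop(child_tasks, shutdown_event, execute):
    # Block briefly instead of spinning on get_nowait(), so an idle child
    # sleeps between checks of the shutdown event.
    while not shutdown_event.is_set():
        try:
            activation = child_tasks.get(timeout=0.1)
        except queue.Empty:
            continue
        execute(activation)  # resolve the task and run it
```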

src/sentry/taskworker/worker.py (review thread, resolved)
Co-authored-by: Evan Hicks <evanh@users.noreply.github.com>
@markstory markstory merged commit 2a26aed into master Jan 17, 2025
49 checks passed
@markstory markstory deleted the feat-taskworker-concurrent branch January 17, 2025 14:28

sentry-io bot commented Jan 17, 2025

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ AssertionError: expected call not found. pytest.runtest.protocol tests/sentry/taskworker...


andrewshie-sentry pushed a commit that referenced this pull request Jan 22, 2025
Labels
Scope: Backend Automatically applied to PRs that change backend components
Development

Successfully merging this pull request may close these issues.

Make taskworker worker support concurrent work