Fetch active tasks from memory in SeekableStreamSupervisor by AmatyaAvadhanula · Pull Request #16098 · apache/druid

AmatyaAvadhanula · 2024-03-11T09:58:56Z

The SeekableStreamSupervisor fetches the task payloads for every active task in its datasource twice during every RunNotice.
In large clusters, this can make the RunNotice take a long time when it could otherwise complete within a couple of seconds.
If there are hundreds of supervisors, there are 4 * (number of supervisors) calls to the metadata store every minute to fetch all the active task payloads of each datasource. This change can significantly reduce the load on the metadata store in such cases.
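
For illustration only, here is a minimal sketch of the idea, not the code in this PR. `TaskPayload`, `CountingMetadataStore`, `InMemoryActiveTasks` and `storeCallsPerMinute` are hypothetical names, not Druid classes. The figure of 2 RunNotices per minute is an assumption inferred from the "4 * supervisors" number above (2 fetches per RunNotice).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

class ActiveTaskLookupSketch {

    /** Hypothetical stand-in for a task payload; not Druid's Task interface. */
    static final class TaskPayload {
        final String id;
        final String dataSource;

        TaskPayload(String id, String dataSource) {
            this.id = id;
            this.dataSource = dataSource;
        }
    }

    /** Hypothetical metadata store that counts how many times it is queried. */
    static final class CountingMetadataStore {
        final List<TaskPayload> rows = new ArrayList<>();
        int calls = 0;

        List<TaskPayload> fetchActiveTasks(String dataSource) {
            calls++;
            return rows.stream()
                       .filter(t -> t.dataSource.equals(dataSource))
                       .collect(Collectors.toList());
        }
    }

    /** Hypothetical in-memory view of active tasks, keyed by task id. */
    static final class InMemoryActiveTasks {
        final Map<String, TaskPayload> byId = new ConcurrentHashMap<>();

        List<TaskPayload> activeTasks(String dataSource) {
            return byId.values().stream()
                       .filter(t -> t.dataSource.equals(dataSource))
                       .collect(Collectors.toList());
        }
    }

    /** Store calls per minute = supervisors * fetches per RunNotice * RunNotices per minute. */
    static int storeCallsPerMinute(int supervisors, int fetchesPerRunNotice, int runNoticesPerMinute) {
        return supervisors * fetchesPerRunNotice * runNoticesPerMinute;
    }

    public static void main(String[] args) {
        CountingMetadataStore store = new CountingMetadataStore();
        InMemoryActiveTasks memory = new InMemoryActiveTasks();
        TaskPayload task = new TaskPayload("index_kafka_wiki_1", "wiki");
        store.rows.add(task);
        memory.byId.put(task.id, task);

        // Two fetches per RunNotice against the store cost two metadata store calls.
        store.fetchActiveTasks("wiki");
        store.fetchActiveTasks("wiki");
        System.out.println("store calls after one RunNotice: " + store.calls);

        // The same two fetches served from memory add no store calls.
        memory.activeTasks("wiki");
        memory.activeTasks("wiki");
        System.out.println("store calls after in-memory fetches: " + store.calls);

        // Assumed example: 300 supervisors, 2 fetches per RunNotice, 2 RunNotices per minute.
        System.out.println("calls per minute: " + storeCallsPerMinute(300, 2, 2));
    }
}
```

The store counter only moves when the store is queried, so serving the per-RunNotice fetches from memory removes that term from the per-minute total.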

abhishekagarwal87 · 2024-03-11T10:36:22Z

What problems does this PR address?

AmatyaAvadhanula · 2024-03-11T11:12:44Z

The SeekableStreamSupervisor fetches the task payloads for every active task in its datasource twice during every RunNotice.
In large clusters, this can make the RunNotice take a long time when it could otherwise complete within a couple of seconds.
If there are hundreds of supervisors, there are 4 * (number of supervisors) calls to the metadata store every minute to fetch all the active task payloads of each datasource. This change can significantly reduce the load on the metadata store in such cases.

github-actions · 2024-07-07T00:19:43Z

This pull request has been marked as stale due to 60 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If you think
that's incorrect or this pull request should instead be reviewed, please simply
write any comment. Even if closed, you can still revive the PR at any time or
discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

github-actions · 2024-08-04T00:20:11Z

This pull request/issue has been closed due to lack of activity. If you think that
is incorrect, or the pull request requires review, you can revive the PR at any time.

kfaraz · 2024-10-17T12:39:39Z

@AmatyaAvadhanula, the change here makes sense to me.
Can we move this from Draft to Ready?
There seem to be some merge conflicts.

kfaraz

LGTM 🚀

@AmatyaAvadhanula, the SeekableStreamSupervisor also makes calls to taskStorage.getTask(). I wonder if these calls should also check for those tasks in memory first. If yes, then we should probably just remove TaskStorage from SeekableStreamSupervisor, use TaskQueryTool instead, and route everything through it.

The TaskQueryTool can decide if a task should be served from memory or storage.
What do you think?
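
A rough sketch of the routing suggested above, assuming a memory-first lookup with a fallback to storage. `TaskLookupRouterSketch` and its members are hypothetical and are not the actual TaskQueryTool or TaskStorage API; task payloads are plain strings to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

class TaskLookupRouterSketch {
    // Hypothetical in-memory view of active tasks: task id -> payload.
    private final Map<String, String> activeTasksInMemory;
    // Hypothetical storage lookup, standing in for a metadata store query.
    private final Function<String, Optional<String>> storageLookup;
    // Counts how many lookups fell through to storage.
    int storageCalls = 0;

    TaskLookupRouterSketch(
        Map<String, String> activeTasksInMemory,
        Function<String, Optional<String>> storageLookup
    ) {
        this.activeTasksInMemory = activeTasksInMemory;
        this.storageLookup = storageLookup;
    }

    /** Serves the task from memory if present, otherwise falls back to storage. */
    Optional<String> getTask(String taskId) {
        String inMemory = activeTasksInMemory.get(taskId);
        if (inMemory != null) {
            return Optional.of(inMemory);
        }
        storageCalls++;
        return storageLookup.apply(taskId);
    }

    public static void main(String[] args) {
        Map<String, String> memory = new HashMap<>();
        memory.put("active_task", "payload held in memory");

        Map<String, String> storage = new HashMap<>();
        storage.put("completed_task", "payload held in storage");

        TaskLookupRouterSketch router = new TaskLookupRouterSketch(
            memory,
            id -> Optional.ofNullable(storage.get(id))
        );

        System.out.println(router.getTask("active_task"));    // served from memory
        System.out.println(router.getTask("completed_task")); // served from storage
        System.out.println("storage calls: " + router.storageCalls);
    }
}
```

With a single entry point like this, callers do not need to know where a task lives, which is the appeal of routing everything through one class.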

github-actions · 2025-01-05T00:23:17Z

This pull request has been marked as stale due to 60 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If you think
that's incorrect or this pull request should instead be reviewed, please simply
write any comment. Even if closed, you can still revive the PR at any time or
discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

github-actions · 2025-02-03T00:21:30Z

This pull request/issue has been closed due to lack of activity. If you think that
is incorrect, or the pull request requires review, you can revive the PR at any time.

kfaraz · 2025-02-07T07:04:30Z

.../main/java/org/apache/druid/indexing/seekablestream/supervisor/SeekableStreamSupervisor.java

              getTaskGroupIdForPartition(resetPartitionOffset.getKey())
          );
          final boolean isSameOffset = partitionTaskGroup != null
+                                       && partitionTaskGroup.startingSequences.containsKey(resetPartitionOffset.getKey())


Added for null safety in the next condition.
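
To illustrate why the `containsKey` guard matters, here is a small sketch. It assumes the next condition dereferences the value returned by `startingSequences.get(...)`, which is not shown in the diff above. The class, method names, and types below are hypothetical, not the supervisor's actual code.

```java
import java.util.HashMap;
import java.util.Map;

class NullSafeOffsetCheckSketch {

    /** Without the guard: throws NullPointerException when the partition has no entry. */
    static boolean isSameOffsetUnguarded(Map<String, Long> startingSequences, String partition, Long resetOffset) {
        return startingSequences != null
               && startingSequences.get(partition).equals(resetOffset);
    }

    /** With the guard: a missing partition short-circuits to false before get() is dereferenced. */
    static boolean isSameOffsetGuarded(Map<String, Long> startingSequences, String partition, Long resetOffset) {
        return startingSequences != null
               && startingSequences.containsKey(partition)
               && startingSequences.get(partition).equals(resetOffset);
    }

    public static void main(String[] args) {
        Map<String, Long> startingSequences = new HashMap<>();
        startingSequences.put("partition-0", 100L);

        System.out.println(isSameOffsetGuarded(startingSequences, "partition-0", 100L));
        System.out.println(isSameOffsetGuarded(startingSequences, "partition-1", 100L));

        try {
            isSameOffsetUnguarded(startingSequences, "partition-1", 100L);
        } catch (NullPointerException e) {
            System.out.println("unguarded check threw NullPointerException");
        }
    }
}
```

Note that the guard only covers missing keys; a key that is present but mapped to null would still fail in this sketch.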

AmatyaAvadhanula · 2025-02-08T11:47:12Z

Thank you for reviving the PR and getting it to completion, @kfaraz.
LGTM!

AmatyaAvadhanula added 2 commits March 11, 2024 10:40

Supervisors fetch active tasks from memory

5bbffd3

Fix KafkaSupervisorTest

ad469ef

github-actions bot added the Area - Streaming Ingestion and Area - Ingestion labels Mar 11, 2024

AmatyaAvadhanula added 2 commits March 11, 2024 19:26

Fix supervisor state test

0fe1a86

Merge remote-tracking branch 'upstream/master' into sss_fetch_active_tasks_from_memory

7cdaa7c

github-actions bot added the stale label Jul 7, 2024

github-actions bot closed this Aug 4, 2024

AmatyaAvadhanula reopened this Oct 14, 2024

github-actions bot removed the stale label Oct 15, 2024

Resolve merge conflicts

e9b2da3

AmatyaAvadhanula marked this pull request as ready for review October 17, 2024 20:54

AmatyaAvadhanula requested a review from kfaraz October 17, 2024 20:55

kfaraz approved these changes Oct 18, 2024

Merge branch 'master' of github.com:apache/druid into sss_fetch_active_tasks_from_memory

32ecdb4

github-actions bot added the stale label Jan 5, 2025

github-actions bot closed this Feb 3, 2025

kfaraz reopened this Feb 6, 2025

github-actions bot removed the stale label Feb 7, 2025

Fixing strict compile

9ab70d3

kfaraz self-requested a review February 7, 2025 06:40

Merge branch 'master' of github.com:apache/druid into sss_fetch_active_tasks_from_memory

bbc3234

kfaraz reviewed Feb 7, 2025

kfaraz added 4 commits February 7, 2025 12:40

Minor cleanup

7044f8b

Try to reduce lock contention

30902c3

Fix mocks in tests

cf55d42

Fix tests

fd1800c

kfaraz merged commit fd73e49 into apache:master Feb 8, 2025
74 checks passed

kfaraz deleted the sss_fetch_active_tasks_from_memory branch February 9, 2025 05:31

kgyrtkirk added this to the 33.0.0 milestone Apr 14, 2025

kfaraz merged 12 commits into apache:master from AmatyaAvadhanula:sss_fetch_active_tasks_from_memory

5 participants