
Fetch active tasks from memory in SeekableStreamSupervisor #16098

Merged
kfaraz merged 12 commits into apache:master from AmatyaAvadhanula:sss_fetch_active_tasks_from_memory
Feb 8, 2025

Conversation

@AmatyaAvadhanula
Contributor

@AmatyaAvadhanula AmatyaAvadhanula commented Mar 11, 2024

The SeekableStreamSupervisor fetches the task payloads of every active task in its datasource twice per RunNotice.
In large clusters, this can make a RunNotice take a long time when it could otherwise complete within a couple of seconds.
With hundreds of supervisors, the metadata store receives 4 * (number of supervisors) calls every minute to fetch the active task payloads of each datasource. This change can significantly reduce the load on the DB in such cases.
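
To make the difference concrete, here is a minimal sketch of the idea, not the code in this PR. All class and method names (`CountingMetadataStore`, `InMemoryTaskView`, `SupervisorRunSketch`) are hypothetical, and tasks are reduced to plain ID strings. It assumes some in-memory structure already tracks active tasks as they are added and removed, so a supervisor run can read from it instead of querying the metadata store twice.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the metadata store; counts how often it is queried.
class CountingMetadataStore {
    int queries = 0;
    private final Map<String, List<String>> activeTasksByDatasource = new HashMap<>();

    void addActiveTask(String datasource, String taskId) {
        activeTasksByDatasource.computeIfAbsent(datasource, k -> new ArrayList<>()).add(taskId);
    }

    List<String> fetchActiveTasks(String datasource) {
        queries++; // every call is a round trip to the database
        return new ArrayList<>(activeTasksByDatasource.getOrDefault(datasource, List.of()));
    }
}

// Hypothetical in-memory view of active tasks, updated as tasks come and go.
class InMemoryTaskView {
    private final Map<String, List<String>> activeTasksByDatasource = new HashMap<>();

    void taskAdded(String datasource, String taskId) {
        activeTasksByDatasource.computeIfAbsent(datasource, k -> new ArrayList<>()).add(taskId);
    }

    void taskRemoved(String datasource, String taskId) {
        List<String> tasks = activeTasksByDatasource.get(datasource);
        if (tasks != null) {
            tasks.remove(taskId);
        }
    }

    List<String> getActiveTasks(String datasource) {
        // Return a copy so callers cannot mutate the view.
        return new ArrayList<>(activeTasksByDatasource.getOrDefault(datasource, List.of()));
    }
}

class SupervisorRunSketch {
    // Before: each run reads the active tasks from the metadata store twice.
    static int runWithStore(CountingMetadataStore store, String datasource) {
        List<String> firstPass = store.fetchActiveTasks(datasource);
        List<String> secondPass = store.fetchActiveTasks(datasource);
        return Math.max(firstPass.size(), secondPass.size());
    }

    // After: each run reads from memory and never touches the metadata store.
    static int runFromMemory(InMemoryTaskView view, String datasource) {
        List<String> firstPass = view.getActiveTasks(datasource);
        List<String> secondPass = view.getActiveTasks(datasource);
        return Math.max(firstPass.size(), secondPass.size());
    }
}
```

In this sketch the per-run cost of the store-backed path grows with the number of supervisors, while the memory-backed path issues no metadata store queries at all.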

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@abhishekagarwal87
Contributor

What problems does this PR address?

@AmatyaAvadhanula
Contributor Author

AmatyaAvadhanula commented Mar 11, 2024

The SeekableStreamSupervisor fetches the task payloads of every active task in its datasource twice per RunNotice.
In large clusters, this can make a RunNotice take a long time when it could otherwise complete within a couple of seconds.
With hundreds of supervisors, the metadata store receives 4 * (number of supervisors) calls every minute to fetch the active task payloads of each datasource. This change can significantly reduce the load on the DB in such cases.

@github-actions

github-actions bot commented Jul 7, 2024

This pull request has been marked as stale due to 60 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If you think
that's incorrect or this pull request should instead be reviewed, please simply
write any comment. Even if closed, you can still revive the PR at any time or
discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

@github-actions github-actions bot added the stale label Jul 7, 2024
@github-actions

github-actions bot commented Aug 4, 2024

This pull request/issue has been closed due to lack of activity. If you think that
is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Aug 4, 2024
@github-actions github-actions bot removed the stale label Oct 15, 2024
@kfaraz
Contributor

kfaraz commented Oct 17, 2024

@AmatyaAvadhanula, the change here makes sense to me.
Can we move this from Draft to Ready?
There seem to be some merge conflicts.

@AmatyaAvadhanula AmatyaAvadhanula marked this pull request as ready for review October 17, 2024 20:54
Contributor

@kfaraz kfaraz left a comment

LGTM 🚀

@AmatyaAvadhanula, the SeekableStreamSupervisor also makes calls to taskStorage.getTask(). I wonder if these calls should also first check for those tasks in memory. If yes, then we should probably just remove TaskStorage from SeekableStreamSupervisor, use TaskQueryTool instead, and route everything through it.

The TaskQueryTool can decide if a task should be served from memory or storage.
What do you think?
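
The routing suggested above could look roughly like the following sketch. It is not Druid's actual TaskQueryTool; `TaskStore` and `MemoryFirstTaskLookup` are hypothetical names, and task payloads are reduced to strings. The lookup serves a task from memory when it is active there and falls back to storage otherwise.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical storage abstraction; stands in for a metadata-store-backed lookup.
interface TaskStore {
    Optional<String> getTask(String taskId);
}

// Hypothetical facade that decides whether a task is served from memory or storage.
class MemoryFirstTaskLookup {
    private final Map<String, String> activeTaskPayloads = new HashMap<>();
    private final TaskStore storage;

    MemoryFirstTaskLookup(TaskStore storage) {
        this.storage = storage;
    }

    void trackActiveTask(String taskId, String payload) {
        activeTaskPayloads.put(taskId, payload);
    }

    Optional<String> getTask(String taskId) {
        // Active tasks are answered from memory without a storage round trip.
        String inMemory = activeTaskPayloads.get(taskId);
        if (inMemory != null) {
            return Optional.of(inMemory);
        }
        // Anything not tracked in memory (for example, completed tasks) goes to storage.
        return storage.getTask(taskId);
    }
}
```

With a single entry point like this, callers do not need to know where a task lives, which is the main appeal of routing every lookup through one tool.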

@github-actions

github-actions bot commented Jan 5, 2025

This pull request has been marked as stale due to 60 days of inactivity.
It will be closed in 4 weeks if no further activity occurs. If you think
that's incorrect or this pull request should instead be reviewed, please simply
write any comment. Even if closed, you can still revive the PR at any time or
discuss it on the dev@druid.apache.org list.
Thank you for your contributions.

@github-actions github-actions bot added the stale label Jan 5, 2025
@github-actions

github-actions bot commented Feb 3, 2025

This pull request/issue has been closed due to lack of activity. If you think that
is incorrect, or the pull request requires review, you can revive the PR at any time.

@github-actions github-actions bot closed this Feb 3, 2025
@kfaraz kfaraz reopened this Feb 6, 2025
@github-actions github-actions bot removed the stale label Feb 7, 2025
@kfaraz kfaraz self-requested a review February 7, 2025 06:40
getTaskGroupIdForPartition(resetPartitionOffset.getKey())
);
final boolean isSameOffset = partitionTaskGroup != null
&& partitionTaskGroup.startingSequences.containsKey(resetPartitionOffset.getKey())
Contributor

Added for null safety in the next condition.
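
As an illustration of why such a guard helps, here is a small sketch under stated assumptions. The fragment above does not show the condition that follows, so the final comparison below is a guess at its shape, and `TaskGroupSketch` and `ResetOffsetCheck` are hypothetical names. The point is that `Map.get` returns null for a missing key, so calling a method on its result without a `containsKey` (or null) check would throw a NullPointerException.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified task group holding one starting sequence per partition.
class TaskGroupSketch {
    final Map<Integer, Long> startingSequences = new HashMap<>();
}

class ResetOffsetCheck {
    static boolean isSameOffset(TaskGroupSketch group, int partition, long resetOffset) {
        return group != null
               // Guard: without this, get(partition) may return null below.
               && group.startingSequences.containsKey(partition)
               // Assumed shape of the "next condition": compare the stored offset.
               && group.startingSequences.get(partition).equals(resetOffset);
    }
}
```

Because `&&` short-circuits, the comparison only runs when the group exists and the partition has an entry.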

@AmatyaAvadhanula
Contributor Author

Thank you for reviving the PR and getting it to completion, @kfaraz.
LGTM!

@kfaraz kfaraz merged commit fd73e49 into apache:master Feb 8, 2025
74 checks passed
@kfaraz kfaraz deleted the sss_fetch_active_tasks_from_memory branch February 9, 2025 05:31
@kgyrtkirk kgyrtkirk added this to the 33.0.0 milestone Apr 14, 2025