Fetch active tasks from memory in SeekableStreamSupervisor#16098
kfaraz merged 12 commits into apache:master
This pull request has been marked as stale due to 60 days of inactivity.
This pull request/issue has been closed due to lack of activity. If you think that
@AmatyaAvadhanula, the change here makes sense to me.
kfaraz left a comment
LGTM 🚀
@AmatyaAvadhanula, the SeekableStreamSupervisor also makes calls to taskStorage.getTask(). I wonder if these calls should also check for those tasks in memory first. If so, we should probably just remove TaskStorage from SeekableStreamSupervisor, use TaskQueryTool instead, and route everything through it.
The TaskQueryTool can decide if a task should be served from memory or storage.
What do you think?
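The memory-first routing suggested above could look roughly like the sketch below. It is only an illustration: `MemoryFirstTaskLookup`, its methods, and the use of plain strings for task payloads are hypothetical and do not mirror the real TaskQueryTool or TaskStorage signatures.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch: class and method names are illustrative, not Druid's API.
final class MemoryFirstTaskLookup
{
  // Stand-in for the in-memory view of active tasks, keyed by task id.
  private final Map<String, String> activeTasks = new ConcurrentHashMap<>();
  // Stand-in for a metadata store lookup, used only when the task is not in memory.
  private final Function<String, Optional<String>> storageLookup;
  // Counts how often the fallback to storage was needed.
  private final AtomicInteger storageCalls = new AtomicInteger();

  MemoryFirstTaskLookup(Function<String, Optional<String>> storageLookup)
  {
    this.storageLookup = storageLookup;
  }

  void addActiveTask(String taskId, String payload)
  {
    activeTasks.put(taskId, payload);
  }

  // Serve the task from memory if it is active, otherwise fall back to storage.
  Optional<String> getTask(String taskId)
  {
    final String inMemory = activeTasks.get(taskId);
    if (inMemory != null) {
      return Optional.of(inMemory);
    }
    storageCalls.incrementAndGet();
    return storageLookup.apply(taskId);
  }

  int getStorageCalls()
  {
    return storageCalls.get();
  }
}
```

With this shape, callers such as the supervisor would not need to know where a task is served from; only completed or unknown tasks would reach the metadata store.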
…e_tasks_from_memory
…e_tasks_from_memory
    getTaskGroupIdForPartition(resetPartitionOffset.getKey())
);
final boolean isSameOffset = partitionTaskGroup != null
    && partitionTaskGroup.startingSequences.containsKey(resetPartitionOffset.getKey())
Added for null safety in the next condition.
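As a minimal illustration of why the guard helps: `&&` short-circuits, so the map lookup is never evaluated when the task group is null. The `TaskGroup` class and `hasStartingSequence` method below are hypothetical stand-ins, and the sketch covers only the guard shown in the diff, not the rest of the condition.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the null guard; not the supervisor's actual classes.
final class NullSafeOffsetCheck
{
  // Stand-in for a task group holding starting sequences per partition.
  static final class TaskGroup
  {
    final Map<Integer, Long> startingSequences = new HashMap<>();
  }

  // Returns false instead of throwing a NullPointerException when group is null,
  // because the right-hand side of && is skipped if the left-hand side is false.
  static boolean hasStartingSequence(TaskGroup group, int partition)
  {
    return group != null && group.startingSequences.containsKey(partition);
  }
}
```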
Thank you for reviving the PR and getting it to completion, @kfaraz.
On every RunNotice, the SeekableStreamSupervisor fetches the payloads of all active tasks in its datasource twice.
In large clusters, this can make a RunNotice take a long time when it could otherwise complete within a couple of seconds.
With hundreds of supervisors, there are 4 * supervisors calls to the metadata store every minute to fetch all the active task payloads of a datasource. This change can significantly reduce the load on the metadata store in such cases.
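The estimate above can be written out as a small calculation. The sketch assumes that the factor of 4 comes from two payload fetches per RunNotice and two RunNotices per supervisor per minute; the second number is an assumption, not something stated in this PR, and `MetadataCallEstimate` is a hypothetical helper.

```java
// Hypothetical helper for the back-of-the-envelope estimate; not part of Druid.
final class MetadataCallEstimate
{
  // calls per minute = supervisors * fetches per RunNotice * RunNotices per minute
  static long callsPerMinute(int supervisors, int fetchesPerRunNotice, int runNoticesPerMinute)
  {
    return (long) supervisors * fetchesPerRunNotice * runNoticesPerMinute;
  }
}
```

For example, `MetadataCallEstimate.callsPerMinute(300, 2, 2)` evaluates to 1200 under these assumptions, and each of those calls fetches every active task payload of a datasource.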
This PR has: