[FLINK-37120][pipeline-connector/mysql] Add ending split chunk first to avoid TaskManager oom #3856

Open
wants to merge 10 commits into
base: master


Contributor

@beryllw beryllw commented Jan 14, 2025

Add ending split chunk first to avoid TaskManager oom

@beryllw beryllw changed the title Add ending split chunk first to avoid TaskManager oom [FLINK-37120][pipeline-connector/mysql] Add ending split chunk first to avoid TaskManager oom Jan 14, 2025
@github-actions github-actions bot added the base label Jan 15, 2025
@beryllw
Contributor Author

beryllw commented Jan 15, 2025

CI: https://github.com/beryllw/flink-cdc/actions/runs/12779452404
@leonardBang Could you please help me retrigger the CI?

@beryllw
Contributor Author

beryllw commented Jan 15, 2025

Adding the ending split chunk first changes the snapshot read result order. I will fix the unit test.

@leonardBang
Contributor

Thanks @beryllw for the contribution. The idea makes sense to me, but we need to consider compatibility; a configuration option is recommended.

@beryllw
Contributor Author

beryllw commented Jan 15, 2025

> Thanks @beryllw for the contribution. The idea makes sense to me, but we need to consider compatibility; a configuration option is recommended.

If data consistency is not an issue, are there any compatibility concerns we need to address?

@@ -498,7 +498,7 @@ private List<ChunkRange> splitUnevenlySizedChunks(
             chunkEnd = nextChunkEnd(jdbc, chunkEnd, tableId, splitColumn, max, chunkSize);
         }
         // add the ending split
-        splits.add(ChunkRange.of(chunkStart, null));
+        splits.add(0, ChunkRange.of(chunkStart, null));
Contributor

Good optimization: both unbounded chunks (the one with the smallest keys and the one with the largest keys) are then handled in the first two splits, and all subsequent chunks are bounded in size.
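
A minimal sketch of the reordering discussed in this review, not the connector's actual code. `Range` and `split` are hypothetical stand-ins for `ChunkRange` and `splitUnevenlySizedChunks`, integer keys with evenly spaced boundaries are assumed, and the `endingChunkFirst` flag is an assumed toggle in the spirit of the configuration requested earlier in this thread. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

public class EndingChunkFirstSketch {

    /** Hypothetical stand-in for ChunkRange; a null bound means "unbounded". */
    record Range(Integer start, Integer end) {}

    /**
     * Splits the key range into chunks of chunkSize keys. The first chunk has no
     * lower bound and the ending chunk has no upper bound. When endingChunkFirst
     * is true, the ending chunk is moved to the head of the list.
     */
    static List<Range> split(int min, int max, int chunkSize, boolean endingChunkFirst) {
        List<Range> splits = new ArrayList<>();
        Integer chunkStart = null; // the first chunk is unbounded below
        int chunkEnd = min + chunkSize;
        while (chunkEnd <= max) {
            splits.add(new Range(chunkStart, chunkEnd));
            chunkStart = chunkEnd;
            chunkEnd += chunkSize;
        }
        // The ending split is unbounded above.
        Range ending = new Range(chunkStart, null);
        if (endingChunkFirst) {
            splits.add(0, ending); // new behavior: assign the ending chunk first
        } else {
            splits.add(ending);    // old behavior: assign the ending chunk last
        }
        return splits;
    }

    public static void main(String[] args) {
        System.out.println(split(0, 30, 10, true));
        System.out.println(split(0, 30, 10, false));
    }
}
```

With the flag enabled, the list starts with the two unbounded chunks (ending chunk, then first chunk), followed only by bounded ones. With it disabled, the order matches the code before this change.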

@lvyanquan
Contributor

What's more, could you add this optimization to MongoDBChunkSplitter?

@beryllw
Contributor Author

beryllw commented Jan 16, 2025

> What's more, could you add this optimization to MongoDBChunkSplitter?

Sure, I will check MongoDBChunkSplitter.

@beryllw beryllw force-pushed the ending_chunk_first branch from 1004386 to 9a7ed71 Compare January 17, 2025 03:33
@beryllw beryllw requested a review from lvyanquan January 17, 2025 03:33
@beryllw
Contributor Author

beryllw commented Jan 17, 2025

> What's more, could you add this optimization to MongoDBChunkSplitter?

#3704 (comment)

I missed #3704; good idea. Would it be better to support an AssignStrategy in the base module? Should we implement that in this PR or in a future one?

@lvyanquan
Contributor

lvyanquan commented Jan 17, 2025

> I missed #3704; good idea. Would it be better to support an AssignStrategy in the base module? Should we implement that in this PR or in a future one?

My previous PR provided both ascending_order and descending_order configurations, but I think it would be better to place both the start and end unbounded chunks at the beginning, so you can go ahead.

I think if there is no impact on restarting from the previous state, there is no compatibility issue, but adding a parameter to control it would be safer.

@github-actions github-actions bot added the docs Improvements or additions to documentation label Jan 17, 2025
@lvyanquan
Contributor

I think it would be better to add a test in MySqlSourceITCase that covers restoring from a failure.

@beryllw
Contributor Author

beryllw commented Jan 20, 2025

> I think it would be better to add a test in MySqlSourceITCase that covers restoring from a failure.

Agree.

@beryllw beryllw requested a review from lvyanquan January 20, 2025 08:57
@lvyanquan
Contributor

lvyanquan commented Jan 22, 2025

LGTM.
I'm worried that this parameter name is a bit complicated. What about naming it scan.incremental.assign-max-chunk-first.enabled to avoid using "ending", since chunk is a concept in the snapshot phase?
WDYT @beryllw @leonardBang?

@beryllw
Contributor Author

beryllw commented Jan 23, 2025

> LGTM. I'm worried that this parameter name is a bit complicated. What about naming it scan.incremental.assign-max-chunk-first.enabled to avoid using "ending", since chunk is a concept in the snapshot phase? WDYT @beryllw @leonardBang?

What about scan.incremental.assign-ending-chunk-first.enabled? When the write throughput is low, the ending chunk is not necessarily the largest chunk.
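
To illustrate the naming point: the ending chunk has no upper bound, so its size depends on how many rows arrive after the bounded chunks were computed. A small sketch with made-up keys follows; the class, the `rowsIn` helper, and the data are all hypothetical and only model the argument, not the connector.

```java
import java.util.ArrayList;
import java.util.List;

public class EndingChunkSizeSketch {

    /** Counts keys k with start <= k < end; a null bound means "unbounded". */
    static long rowsIn(List<Integer> keys, Integer start, Integer end) {
        return keys.stream()
                .filter(k -> (start == null || k >= start) && (end == null || k < end))
                .count();
    }

    public static void main(String[] args) {
        // Assumed table: keys 0..29 existed when chunks were computed;
        // keys 30 and 31 were written afterwards (low write throughput).
        List<Integer> keys = new ArrayList<>();
        for (int k = 0; k < 32; k++) {
            keys.add(k);
        }
        System.out.println("bounded chunk [10, 20): " + rowsIn(keys, 10, 20));   // 10 rows
        System.out.println("ending chunk [30, +inf): " + rowsIn(keys, 30, null)); // 2 rows
    }
}
```

Here the ending chunk holds fewer rows than a bounded chunk, so "ending" and "max" describe different chunks; under heavy writes the ending chunk could instead grow far beyond the chunk size.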
