[FLINK-37120][pipeline-connector/mysql] Add ending split chunk first to avoid TaskManager oom #3856

beryllw · 2025-01-14T08:44:44Z

Add ending split chunk first to avoid TaskManager oom

beryllw · 2025-01-15T02:38:24Z

CI: https://github.com/beryllw/flink-cdc/actions/runs/12779452404
@leonardBang Could you please help me retrigger the CI?

beryllw · 2025-01-15T03:49:26Z

Adding the ending split chunk first changes the snapshot read result order. I will fix the unit test.

leonardBang · 2025-01-15T03:56:23Z

Thanks @beryllw for the contribution. The idea makes sense to me, but we need to consider compatibility, so a configuration option is recommended.

beryllw · 2025-01-15T05:39:15Z

> Thanks @beryllw for the contribution. The idea makes sense to me, but we need to consider compatibility, so a configuration option is recommended.

If data consistency is not an issue, are there any compatibility concerns we need to address?

lvyanquan · 2025-01-16T09:32:37Z

...a/org/apache/flink/cdc/connectors/base/source/assigner/splitter/JdbcSourceChunkSplitter.java

@@ -498,7 +498,7 @@ private List<ChunkRange> splitUnevenlySizedChunks(
            chunkEnd = nextChunkEnd(jdbc, chunkEnd, tableId, splitColumn, max, chunkSize);
        }
        // add the ending split
-        splits.add(ChunkRange.of(chunkStart, null));
+        splits.add(0, ChunkRange.of(chunkStart, null));


Good optimization. With this change, both unbounded chunks (the smallest and the largest) are completed in the first two splits, and all subsequent chunks have a bounded size.
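To illustrate the ordering change discussed above, here is a minimal, self-contained sketch. It is not the code from JdbcSourceChunkSplitter: the `ChunkRange` record, the integer keys, and the `endingChunkFirst` parameter are hypothetical stand-ins, and the fixed-step loop only approximates how chunk boundaries are computed.

```java
import java.util.ArrayList;
import java.util.List;

public class EndingChunkFirstSketch {

    /** Hypothetical stand-in for ChunkRange: a null bound means "unbounded". */
    record ChunkRange(Integer start, Integer end) {}

    /**
     * Splits the key range [min, max] into chunks of chunkSize keys.
     * The first chunk has no lower bound and the ending chunk has no upper bound.
     * When endingChunkFirst is true, the ending chunk is placed at index 0,
     * mirroring splits.add(0, ChunkRange.of(chunkStart, null)) from the diff.
     */
    static List<ChunkRange> split(int min, int max, int chunkSize, boolean endingChunkFirst) {
        List<ChunkRange> splits = new ArrayList<>();
        Integer chunkStart = null;
        int chunkEnd = min + chunkSize;
        while (chunkEnd <= max) {
            splits.add(new ChunkRange(chunkStart, chunkEnd));
            chunkStart = chunkEnd;
            chunkEnd += chunkSize;
        }
        ChunkRange ending = new ChunkRange(chunkStart, null);
        if (endingChunkFirst) {
            splits.add(0, ending); // unbounded ending chunk is assigned first
        } else {
            splits.add(ending); // previous behavior: ending chunk is assigned last
        }
        return splits;
    }

    public static void main(String[] args) {
        System.out.println(split(0, 30, 10, false));
        System.out.println(split(0, 30, 10, true));
    }
}
```

In this sketch, the two unbounded ranges sit at indexes 0 and 1 when `endingChunkFirst` is true, so every chunk after them has both a lower and an upper bound.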

lvyanquan · 2025-01-16T09:33:46Z

What's more, could you add this optimization to MongoDBChunkSplitter?

beryllw · 2025-01-16T10:02:48Z

> What's more, could you add this optimization to MongoDBChunkSplitter?

Sure, I will check MongoDBChunkSplitter.

beryllw · 2025-01-17T03:50:27Z

> What's more, could you add this optimization to MongoDBChunkSplitter?

#3704 (comment)

I missed issue #3704; good idea. Would it be better to support an AssignStrategy in the base module? Should we implement that in this PR or in a future one?

lvyanquan · 2025-01-17T04:17:54Z

> I missed issue #3704; good idea. Would it be better to support an AssignStrategy in the base module? Should we implement that in this PR or in a future one?

My previous PR provided both ascending_order and descending_order configurations, but I think it would be better to place both the start and end unbounded chunks at the beginning, so you can go ahead.

I think if there is no impact on restarting from the previous state, there is no compatibility issue, but adding a parameter to control it would be safer.
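One way to picture the "parameter to control it" suggestion is the sketch below. It assumes a hypothetical boolean flag (here `endingChunkFirst`, defaulting to the old order when false) and uses plain strings as chunk labels; none of these names come from the PR. It only shows that such a flag would change the order in which chunks are handed out, not which chunks exist.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

public class ChunkOrderFlagSketch {

    /**
     * Returns the chunks in assignment order.
     * endingChunkFirst is a hypothetical flag: false keeps the original order,
     * true moves the last (ending) chunk to the front.
     */
    static List<String> order(List<String> chunksInSplitOrder, boolean endingChunkFirst) {
        List<String> result = new ArrayList<>(chunksInSplitOrder);
        if (endingChunkFirst && result.size() > 1) {
            String ending = result.remove(result.size() - 1);
            result.add(0, ending);
        }
        return result;
    }

    /** True when both lists contain exactly the same chunks, ignoring order. */
    static boolean sameChunks(List<String> a, List<String> b) {
        return a.size() == b.size() && new HashSet<>(a).equals(new HashSet<>(b));
    }

    public static void main(String[] args) {
        List<String> chunks = List.of("(null, 10)", "[10, 20)", "[20, 30)", "[30, null)");
        List<String> oldOrder = order(chunks, false);
        List<String> newOrder = order(chunks, true);
        System.out.println(oldOrder);
        System.out.println(newOrder);
        System.out.println(sameChunks(oldOrder, newOrder));
    }
}
```

Whether a job restored from earlier state is affected depends on how the real assigner persists its remaining splits, which this sketch does not model; that is the case the restore-from-failure test is meant to cover.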

docs/content.zh/docs/connectors/pipeline-connectors/mysql.md

...ysql/src/main/java/org/apache/flink/cdc/connectors/mysql/factory/MySqlDataSourceFactory.java

lvyanquan · 2025-01-19T08:10:29Z

I think it would be better to add a test in MySqlSourceITCase that covers restoring from a failure.

beryllw · 2025-01-20T07:06:44Z

> I think it would be better to add a test in MySqlSourceITCase that covers restoring from a failure.

Agree.

lvyanquan · 2025-01-22T11:36:34Z

LGTM.
I'm worried that this parameter name is a bit complicated. What about naming it scan.incremental.assign-max-chunk-first.enabled to avoid using "ending"? Chunk is a concept of the snapshot phase.
WDYT @beryllw @leonardBang?

beryllw · 2025-01-23T03:04:36Z

> LGTM. I'm worried that this parameter name is a bit complicated. What about naming it scan.incremental.assign-max-chunk-first.enabled to avoid using "ending"? Chunk is a concept of the snapshot phase. WDYT @beryllw @leonardBang?

What about scan.incremental.assign-ending-chunk-first.enabled? When the write throughput is low, the ending chunk is not necessarily the largest chunk.

github-actions bot added the mysql-cdc-connector label Jan 14, 2025

beryllw changed the title ~~Add ending split chunk first to avoid TaskManager oom~~ [FLINK-37120][pipeline-connector/mysql] Add ending split chunk first to avoid TaskManager oom Jan 14, 2025

github-actions bot added the base label Jan 15, 2025

lvyanquan reviewed Jan 16, 2025

github-actions bot added mongodb-cdc-connector oracle-cdc-connector db2-cdc-connector mysql-pipeline-connector labels Jan 17, 2025

beryllw added 3 commits January 17, 2025 11:28

[FLINK-37120][pipeline-connector/mysql] add ending split chunk first to avoid TaskManager oom

f105578

[FLINK-37120][cdc-source-connector] Add ending split chunk first to avoid TaskManager oom

13e511a

support MongoDBChunkSplitter add ending chunk first

9a7ed71

beryllw force-pushed the ending_chunk_first branch from 1004386 to 9a7ed71 Compare January 17, 2025 03:33

beryllw requested a review from lvyanquan January 17, 2025 03:33

fix bug

0747277

github-actions bot added postgres-cdc-connector sqlserver-cdc-connector labels Jan 17, 2025

wangjunbo added 5 commits January 17, 2025 12:49

fix bug

c560ab3

fix unit test bug

f4bef15

fix unit test bug

93132ed

fix unit test bug

a36132b

add doc

d1be97a

github-actions bot added the docs Improvements or additions to documentation label Jan 17, 2025

lvyanquan reviewed Jan 19, 2025

docs/content.zh/docs/connectors/pipeline-connectors/mysql.md Outdated

...ysql/src/main/java/org/apache/flink/cdc/connectors/mysql/factory/MySqlDataSourceFactory.java

add more unit tests

432f663

beryllw requested a review from lvyanquan January 20, 2025 08:57
