Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FLINK-35912] SqlServer CDC doesn't chunk UUID-typed columns correctly #3497

Merged
merged 2 commits into from
Aug 6, 2024

Conversation

LiPL2017
Copy link
Contributor

A majority of our Sqlserver databases use uniqueidentifier as primary keys.
When we enable 'scan.incremental.snapshot.enabled = true', flink cdc will try to split into chunks.
The splitTableIntoChunks function relies on the queryMinMax function, which fails when trying to calculate the MIN(UUID) and MAX(UUID). The issue arises because the uniqueidentifier type in SQL Server does not support the traditional MIN and MAX functions in a mathematical sense, as it does not follow a numeric minimum or maximum value.

Copy link
Contributor

@yuxiqian yuxiqian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for @LiPL2017's contribution! Could you please also create a JIRA ticket here to track this issue?

Comment on lines 72 to 87
private static String reverse(String text) {
String[] arrs = text.split("-");
List<String> arrList = Arrays.asList(arrs);
Collections.reverse(arrList);
arrs = arrList.toArray(new String[0]);
return String.join("-", arrs);
}

public static int compare(Object obj1, Object obj2, Column splitColumn) {
if (splitColumn.typeName().equals(UNIQUEIDENTIFIRER)) {
String sObj1 = reverse(obj1.toString());
String sOjb2 = reverse(obj2.toString());
return ObjectUtils.compare(sObj1, sOjb2);
}
return ObjectUtils.compare(obj1, obj2);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it be better if we rename them to reverseUuid / compareUuid, and move these utility functions to org.apache.flink.cdc.connectors.sqlserver.source.utils package?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice if we can add some test cases to verify UUID comparison logic. Test cases from Microsoft docs would be ideal.

Comment on lines 212 to 213
"No maximum LSN recorded in the database; please ensure that the SQL Server Agent is running");
"{} No maximum LSN recorded in the database; please ensure that the SQL Server Agent is running",
dataConnection.connectionString());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this change related to this PR?

@LiPL2017
Copy link
Contributor Author

Thanks for @LiPL2017's contribution! Could you please also create a JIRA ticket here to track this issue?

Sorry, I haven't created a JIRA account yet. I have already adopted all the other suggestions. Please review the code again.

@yuxiqian
Copy link
Contributor

yuxiqian commented Jul 29, 2024

Created FLINK-35912 to trace this issue. Would be nice if you could rename this PR as [FLINK-35912] ... to link it with this ticket :)

@LiPL2017 LiPL2017 changed the title Fix sqlserver uniqueidentifier column as PK SqlServer CDC doesn't chunk UUID-typed columns correctly Jul 29, 2024
Copy link
Contributor

@yuxiqian yuxiqian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for @LiPL2017's quick response! Just left some minor comments on test cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

testutils are some utility functions for testing, while this case is actually testing sqlserver.source.utils. Putting it in test/java/org/apache/flink/cdc/connectors/sqlserver/source/utils package makes more sense.

public class SqlserverCompareUuidTest {
private static final Logger LOG = LoggerFactory.getLogger(SqlserverCompareUuidTest.class);
@Test
public void testWorkWithGuids() throws Exception {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use assertEquals to verify UUID sorting results instead of logging them for manual checks.

@LiPL2017 LiPL2017 changed the title SqlServer CDC doesn't chunk UUID-typed columns correctly [FLINK-35912] SqlServer CDC doesn't chunk UUID-typed columns correctly Jul 29, 2024
Copy link
Contributor

@yuxiqian yuxiqian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Except you need to run mvn spotless:apply to fix code style violations.

@LiPL2017
Copy link
Contributor Author

Looks good! Except you need to run mvn spotless:apply to fix code style violations.

Are there any other adjustments needed besides these?

@yuxiqian
Copy link
Contributor

Could @GOODBOY008 please help driving this to be merged?

@GOODBOY008 GOODBOY008 self-requested a review August 1, 2024 02:57
@GOODBOY008
Copy link
Member

@LiPL2017 Can you rebase to master?

@@ -0,0 +1,53 @@
package org.apache.flink.cdc.connectors.sqlserver.source.utils;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add license header.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add license header.

It has been added

@GOODBOY008
Copy link
Member

GOODBOY008 commented Aug 2, 2024

@LiPL2017 I make some change about uuid compare method by refering to MS source code. Can you doule check ? And should rebase to master to resolve conficts.

@LiPL2017
Copy link
Contributor Author

LiPL2017 commented Aug 5, 2024

@GOODBOY008 Could you please review the code again?

Copy link
Member

@GOODBOY008 GOODBOY008 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, When #3508 merged, Please rebase master to make ci green.

@GOODBOY008 GOODBOY008 merged commit f6d1d48 into apache:master Aug 6, 2024
13 of 20 checks passed
qiaozongmi pushed a commit to qiaozongmi/flink-cdc that referenced this pull request Sep 23, 2024
…olumns correctly (apache#3497)

* resolve conficts

* polish code to trigger ci

---------

Co-authored-by: Kael <kael@fts.dev>
Co-authored-by: gongzhongqiang <gongzhongqiang@gigacloudtech.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants