[core] Support convert TopN to limit for primary-key table in deletion-vector mode #6193
As I remember, when DV (deletion vectors) is turned on, the returned rows are out of order, not sorted by primary key.
Why would it be out of order? The main idea of this PR is to convert a TopN on the primary keys into a limit when reading a single data file; the compute engine (e.g. Apache Spark) then performs the global TopN.
I see. For a single data file, that is true.
Also, can you add a benchmark for this?
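A minimal sketch of the idea discussed above, assuming each data file is already sorted ascending by primary key: every file contributes only its first `n` rows, and the engine sorts the combined candidates to get the global TopN. The class and method names (`PerFileLimitSketch`, `limitFile`, `globalTopN`) are hypothetical and not part of Paimon, and the sketch ignores deletion vectors and partitioning.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration only; not Paimon code. */
final class PerFileLimitSketch {

    /** Reads at most `limit` rows from one file that is already sorted ascending. */
    static List<Integer> limitFile(List<Integer> sortedFile, int limit) {
        int end = Math.min(limit, sortedFile.size());
        return new ArrayList<>(sortedFile.subList(0, end));
    }

    /** Stands in for the compute engine: merges per-file candidates and keeps the n smallest. */
    static List<Integer> globalTopN(List<List<Integer>> files, int n) {
        List<Integer> candidates = new ArrayList<>();
        for (List<Integer> file : files) {
            // The per-file TopN has become a plain limit, so no per-file sort is needed.
            candidates.addAll(limitFile(file, n));
        }
        Collections.sort(candidates);
        int end = Math.min(n, candidates.size());
        return new ArrayList<>(candidates.subList(0, end));
    }
}
```

With two files holding keys `[1, 4, 9]` and `[2, 3, 10]` and `n = 2`, each file yields two candidates and the global step keeps `[1, 2]`.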
paimon-core/src/main/java/org/apache/paimon/utils/FormatReaderMapping.java
    if (isNullOrEmpty(orders)) {
        return Optional.empty();
    }
Can you first check that the `SortDirection`s in `orders` are all the same?
paimon-core/src/main/java/org/apache/paimon/utils/FormatReaderMapping.java
    }

    SortDirection firstDirection = null;
    for (int i = 0; i < fields.size(); i++) {
I don't get this loop. Isn't it enough to check just the first field?
This loop matches the primary keys against the sort keys one by one. If all primary keys match, the non-primary keys are ignored and the TopN is converted to a limit.
E.g. trimmed primary keys: c1, c2; non-primary keys: n1.
| position | c1 | c2 | n1 |
|---|---|---|---|
| 0 | 10 | 100 | 2 |
| 1 | 10 | 200 | 1 |
| 2 | 20 | 100 | 2 |
| 3 | 20 | 200 | 1 |
-- returns position [0]
ORDER BY c1 LIMIT 1
ORDER BY c1, c2 LIMIT 1
-- returns position [3]
ORDER BY c1 DESC LIMIT 1
ORDER BY c1 DESC, c2 DESC LIMIT 1
-- the non-primary key `n1` can be ignored if all the primary keys match (because the primary key is unique)
-- returns position [0]
ORDER BY c1, c2, n1 ASC LIMIT 1
ORDER BY c1, c2, n1 DESC LIMIT 1
-- returns position [3]
ORDER BY c1 DESC, c2 DESC, n1 ASC LIMIT 1
ORDER BY c1 DESC, c2 DESC, n1 DESC LIMIT 1
-- the following examples will not convert TopN to limit
-- no match: the primary keys' sort directions are not all the same
ORDER BY c1, c2 DESC LIMIT 1
ORDER BY c1 DESC, c2 LIMIT 1
-- no match: if a non-primary key is included, all the trimmed primary keys must match, in order
ORDER BY c1, n1 LIMIT 1
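The matching rule illustrated by these examples can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `FormatReaderMapping`: the names `TopNToLimitSketch`, `Order`, `Direction`, and `canConvert` are hypothetical, and the sketch only decides whether a conversion would be allowed.

```java
import java.util.List;

/** Hypothetical illustration only; not Paimon code. */
final class TopNToLimitSketch {

    enum Direction { ASC, DESC }

    /** One ORDER BY entry: a column name and its sort direction. */
    static final class Order {
        final String field;
        final Direction direction;

        Order(String field, Direction direction) {
            this.field = field;
            this.direction = direction;
        }
    }

    /**
     * Returns true when the sort keys match the trimmed primary keys in order,
     * all matched keys share one direction, and any non-primary key appears
     * only after every primary key has been matched.
     */
    static boolean canConvert(List<String> primaryKeys, List<Order> orders) {
        if (orders == null || orders.isEmpty()) {
            return false;
        }
        Direction firstDirection = null;
        for (int i = 0; i < orders.size(); i++) {
            if (i >= primaryKeys.size()) {
                // All primary keys matched; they are unique, so later sort keys cannot change the order.
                return true;
            }
            Order order = orders.get(i);
            if (!primaryKeys.get(i).equals(order.field)) {
                return false; // sort key is not the next primary key
            }
            if (firstDirection == null) {
                firstDirection = order.direction;
            } else if (firstDirection != order.direction) {
                return false; // mixed ASC/DESC across primary keys
            }
        }
        return true; // the sort keys are a prefix of (or equal to) the primary keys
    }
}
```

For primary keys `c1, c2`, this accepts `ORDER BY c1, c2, n1 DESC` and rejects both `ORDER BY c1, c2 DESC` and `ORDER BY c1, n1`, in line with the examples above.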
@JingsongLi Thanks very much for your feedback! Sorry, I'm busy these days. I will update next week.
Sure, please feel free to update according to your schedule.
Purpose
Primary keys are NOT NULL, unique, and naturally sorted in ascending order. When deletion vectors are enabled, the following rules allow a TopN to be converted to a limit.
E.g. partition keys: pt; trimmed primary keys: c1, c2; non-primary keys: n1.
TopN without partition keys:
TopN with partition keys:
Tests
API and Format
Documentation