-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Allow repartitioning on files with ranges #18948
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
93dd62c to
143f685
Compare
3e49087 to
691254c
Compare
|
I rebased on main. Not sure why cargo fmt is failing on a file that I did not touch! |
Unrelated error, fixed on main now so I've merged up from main |
|
Thank you! Do I need to tag more people to review this PR? or let it be for a few days? |
|
I plan to review this PR soon (hopefully today or tomorrow) |
Jefffrey
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes look good to me. Only thing to comment on is perhaps adjust the test cases so the ranges don't span the entire file; for example, pfile("a", 123).with_range(1, 122) just as a quick sanity check that we respect ranges that don't essentially alias an entire file.
Fixes apache#18940 When any partitioned file had ranges selected (even if the range contained the whole file), repartitioning was being skipped. There was no good reason to disallow this. I have now allowed this and change the implementation to respect file ranges. Added a couple of unit tests too.
Added support for repartitioning files with ranges when ordering needs to be preserved. Introduced two new methods on PartitionedFile to do this: - `effective_size()`: size of the file to be scanned. Takes into account range (if present) - `range()`: effective range of the file to be scanned
When the order needs to be preserved, the end ranges were not being set correctly. Leading to empty file ranges like (20, 20) instead of (20, 40). Found this bug while adding tests with less than complete ranges. Thank you Jeffrey Vo for this suggestion!
255c509 to
5ffe118
Compare
|
Thanks for pointing that out @Jefffrey! There was a bug in the order-preserved case that lead to some partitioned files being empty when the start of the range was non-zero. I have fixed the bug and added tests in the latest commit. |
Jefffrey
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll leave this PR open for a few days before merging in case anyone else is interested in reviewing it
Which issue does this PR close?
ExecutionPlan::repartitioneddoesn't work when file groups have ranges #18940Rationale for this change
When any partitioned file had ranges selected (even if the range contained the whole file), repartitioning was being skipped.
There was no good reason to disallow this.
What changes are included in this PR?
statetuple andstate.0/state.1with proper variable names using destructingAre these changes tested?
Added a couple of unit tests. Let me know if more are needed.
Are there any user-facing changes?