Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Spark] Make
delta.dataSkippingStatsColumns
more lenient for nested columns #2850base: master
Are you sure you want to change the base?
[Spark] Make
delta.dataSkippingStatsColumns
more lenient for nested columns #2850Changes from 9 commits
a9412bc
a864205
975f20b
6411e7a
e68028d
cbb0447
3759afd
329a571
b911d6a
0ea2fc8
81c60aa
6068eb0
60978a8
bb86d96
0c24d44
9bd8159
86c148f
25e6faa
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we consider double struct test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a double nested struct as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we have hit tests for elements inside struct and the double struct as well?
Btw click on re-request review so that I get notified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added more hit tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My understanding is that when we are inside struct, we gather stats for the valid datatypes inside the struct, but the invalid ones we still don't right? Would we want to test that we still don't skip the invalid datatypes inside the struct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if we can check another skipping valid type beside integer, like timestamp, string, ... that would be great
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically null counts are collected, but they can't be used for skipping. We could add that to the test, but seems kind of out of scope because you can't do min/max things with arrays/maps anyway, so it'd have to be like an element contains or something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are other non-skipping eligible datatypes than lists datatypes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how to specify a Binary type in the inferred JSON. Is that the only non-complex type?