-
Notifications
You must be signed in to change notification settings - Fork 191
Delta Kernel Draft PR #729
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
xtable-core/src/main/java/org/apache/xtable/delta/DeltaKernelConversionSourceProvider.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/delta/DeltaKernelSchemaExtractor.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/delta/DeltaKernelTableExtractor.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/delta/DeltaSchemaExtractor.java
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/kernel/DeltaKernelConversionSource.java
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/delta/DeltaKernelDataFileExtractor.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/delta/DeltaKernelDataFileExtractor.java
Outdated
Show resolved
Hide resolved
vinishjail97
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great progress @vaibhavk1992, added some comments.
xtable-core/src/main/java/org/apache/xtable/delta/DeltaKernelActionsConverter.java
Outdated
Show resolved
Hide resolved
|
@the-other-tim-brown @vinishjail97 @rahil-c All the comments have been addressed and the build is passing too. |
xtable-core/src/main/java/org/apache/xtable/kernel/DeltaKernelActionsConverter.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/kernel/DeltaKernelPartitionExtractor.java
Outdated
Show resolved
Hide resolved
xtable-core/src/test/java/org/apache/xtable/kernel/TestDeltaKernelStatsExtractor.java
Outdated
Show resolved
Hide resolved
xtable-core/src/test/java/org/apache/xtable/DeltaTableKernel.java
Outdated
Show resolved
Hide resolved
xtable-core/src/test/java/org/apache/xtable/kernel/ITDeltaKernelConversionSource.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/kernel/DeltaKernelPartitionExtractor.java
Outdated
Show resolved
Hide resolved
xtable-core/src/test/java/org/apache/xtable/kernel/TestDeltaKernelPartitionExtractor.java
Outdated
Show resolved
Hide resolved
xtable-core/src/test/java/org/apache/xtable/kernel/TestDeltaKernelSchemaExtractor.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/kernel/DeltaKernelTableExtractor.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/kernel/DeltaKernelTableExtractor.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/kernel/DeltaKernelTableExtractor.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/kernel/DeltaKernelTableExtractor.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/kernel/DeltaKernelTableExtractor.java
Outdated
Show resolved
Hide resolved
| myScan.getScanFiles(engine, includeColumnStats); | ||
|
|
||
| List<InternalDataFile> dataFiles = new ArrayList<>(); | ||
| while (scanFiles.hasNext()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is eagerly pulling all the files into memory instead of using the iterator pattern. Can the code be updated to iterate through these values when next is called instead of eagerly materializing them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@the-other-tim-brown
the scanFiles.hasNext() Returns true if the iteration has more elements. (In other words, returns true if next would return an element rather than throwing an exception.). I am not sure what do you mean by materializing here and what we have to do exactly.
I have followed this convention from the suggestion by the doc form kernel community. Please check this.
https://github.com/delta-io/delta/blob/master/kernel/USER_GUIDE.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is that the DeltaDataFileIterator is an iterator so we don't need to compute all the values in the constructor. We can instead compute the next value when next is called.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the code, please check if that make sense. @the-other-tim-brown
xtable-core/src/main/java/org/apache/xtable/kernel/DeltaKernelIncrementalChangesState.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/kernel/DeltaKernelIncrementalChangesState.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/kernel/DeltaKernelIncrementalChangesState.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/kernel/DeltaKernelPartitionExtractor.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/kernel/DeltaKernelConversionSource.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/kernel/DeltaKernelTableExtractor.java
Outdated
Show resolved
Hide resolved
|
@vaibhavk1992 make sure to fill out the PR template |
xtable-core/src/main/java/org/apache/xtable/kernel/DeltaKernelActionsConverter.java
Outdated
Show resolved
Hide resolved
xtable-core/src/main/java/org/apache/xtable/kernel/DeltaKernelPartitionExtractor.java
Outdated
Show resolved
Hide resolved
|
@vaibhavk1992 Thanks for pushing on this, I took a brief scan and the PR looks close to landing. My main question is around testing, did you happen to test conversion end to end using your delta kernel source to hudi or to iceberg? I believe the IT class I am wondering if we should try to add some similar IT flow as the above class to confirm Delta Kernel works correctly end to end with other target formats or at the very least wanted to know if you did some manual testing to confirm you can convert to an iceberg or hudi table using your delta kernel source? I think we can also add this end to end IT test plan as a fast follow in a separate pr. |
|
@rahil-c As discussed |
Important Read
What is the purpose of the pull request
This draft introduces the Delta Kernel to read the tables instead of using spark sessions.
Brief change log
Verify this pull request
This pull request is a trivial rework , added unit tests, integration test to verify all the edge cases
-Unit Tests
-TestDeltaKernelSchemaExtractor
-TestDeltaKernelPartitionExtractor
-TestDeltaKernelStatsExtractor
Tested statistics extraction
Validated range and value statistics
Tested null count handling
Added integration tests for end-to-end.