da5768f to 4d8d7a9
4d8d7a9 to b541401
import static io.trino.plugin.hudi.HudiErrorCode.HUDI_SCHEMA_ERROR;
import static io.trino.plugin.hudi.HudiErrorCode.HUDI_UNSUPPORTED_FILE_FORMAT;
import static java.lang.Math.toIntExact;
import static org.apache.hudi.common.model.HoodieRecord.PARTITION_PATH_METADATA_FIELD;
This requires KryoSerializer, i.e. kryo.
Our code does not use kryo directly, so if the trino-hudi pom declares it, the dependency analysis fails the build with the following error:
[INFO] --- dependency:3.8.1:analyze-only (default) @ trino-hudi ---
Error: Unused declared dependencies found:
Error: com.esotericsoftware:kryo:jar:4.0.2:compile
Going to change its scope to provided to see if we can get around this transitive dependency issue, since kryo:
- Is needed for compilation (so the compiler can see KryoSerializable)
- Will be provided at runtime by another dependency (hudi-common, I think)
The hope is that it won't be flagged as "unused" (provided scope is exempt from that check).
Error that is thrown after adding provided scope:
Trino plugin dependency com.esotericsoftware:kryo must not have scope 'provided'. It is not part of the SPI and will not be available at runtime.
Okay, changing the scope to runtime instead.
Error after changing to runtime:
Error: /home/runner/work/trino/trino/plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiUtil.java:[97,1] cannot access com.esotericsoftware.kryo.KryoSerializable
class file for com.esotericsoftware.kryo.KryoSerializable not found
Forget it. I'll just revert the scope change, add a comment, and add kryo to ignoredUnusedDeclaredDependencies.
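For reference, an exemption of this kind can be sketched with the maven-dependency-plugin's ignoredUnusedDeclaredDependencies parameter. The fragment below is an assumption about how it might look in the trino-hudi pom, not the actual change in this PR; its placement in the build section and the comment wording are illustrative only. The kryo dependency itself would stay declared at the default compile scope.

```xml
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-dependency-plugin</artifactId>
            <configuration>
                <ignoredUnusedDeclaredDependencies>
                    <!-- Illustrative comment: kryo must be on the compile classpath so javac can
                         resolve KryoSerializable, but trino-hudi code never references it directly -->
                    <ignoredUnusedDeclaredDependency>com.esotericsoftware:kryo</ignoredUnusedDeclaredDependency>
                </ignoredUnusedDeclaredDependencies>
            </configuration>
        </plugin>
    </plugins>
</build>
```

With an entry like this, the analyze-only goal should stop reporting kryo as an unused declared dependency while the plugin-scope check and the compiler both still see a normal compile-scope dependency.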
276b592 to 3ca6640
.addCopies(new FileOperationUtils.FileOperation("Input.readTail", DATA), 2)
.addCopies(new FileOperationUtils.FileOperation("InputFile.lastModified", METADATA_TABLE), 4)
.addCopies(new FileOperationUtils.FileOperation("InputFile.length", METADATA_TABLE), 4)
.addCopies(new FileOperationUtils.FileOperation("InputFile.lastModified", METADATA_TABLE), 5)
2025-12-01T06:07:46.434-0600 INFO ForkJoinPool-1-worker-1 stdout === All File Paths Accessed ===
2025-12-01T06:07:46.435-0600 INFO ForkJoinPool-1-worker-1 stdout InputFile.length: local:/tests_ce36c7c6-8f49-46b6-9b9c-10717eb89786/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:07:46.435-0600 INFO ForkJoinPool-1-worker-1 stdout InputFile.lastModified: local:/tests_ce36c7c6-8f49-46b6-9b9c-10717eb89786/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:07:46.435-0600 INFO ForkJoinPool-1-worker-1 stdout InputFile.length: local:/tests_ce36c7c6-8f49-46b6-9b9c-10717eb89786/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:07:46.435-0600 INFO ForkJoinPool-1-worker-1 stdout InputFile.lastModified: local:/tests_ce36c7c6-8f49-46b6-9b9c-10717eb89786/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:07:46.435-0600 INFO ForkJoinPool-1-worker-1 stdout InputFile.length: local:/tests_ce36c7c6-8f49-46b6-9b9c-10717eb89786/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:07:46.435-0600 INFO ForkJoinPool-1-worker-1 stdout InputFile.lastModified: local:/tests_ce36c7c6-8f49-46b6-9b9c-10717eb89786/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:07:46.435-0600 INFO ForkJoinPool-1-worker-1 stdout InputFile.length: local:/tests_ce36c7c6-8f49-46b6-9b9c-10717eb89786/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:07:46.436-0600 INFO ForkJoinPool-1-worker-1 stdout InputFile.lastModified: local:/tests_ce36c7c6-8f49-46b6-9b9c-10717eb89786/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:07:46.436-0600 INFO ForkJoinPool-1-worker-1 stdout InputFile.length: local:/tests_ce36c7c6-8f49-46b6-9b9c-10717eb89786/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:07:46.436-0600 INFO ForkJoinPool-1-worker-1 stdout InputFile.lastModified: local:/tests_ce36c7c6-8f49-46b6-9b9c-10717eb89786/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:07:46.436-0600 INFO ForkJoinPool-1-worker-1 stdout
=== Actual Cache Accesses ===
2025-12-01T06:07:46.439-0600 INFO ForkJoinPool-1-worker-1 stdout FileOperation[operationType=Input.readTail, fileType=DATA]: 2
2025-12-01T06:07:46.439-0600 INFO ForkJoinPool-1-worker-1 stdout FileOperation[operationType=InputFile.lastModified, fileType=METADATA_TABLE]: 5
2025-12-01T06:07:46.439-0600 INFO ForkJoinPool-1-worker-1 stdout FileOperation[operationType=InputFile.length, fileType=METADATA_TABLE]: 5
2025-12-01T06:07:46.439-0600 INFO ForkJoinPool-1-worker-1 stdout FileOperation[operationType=InputFile.newStream, fileType=INDEX_DEFINITION]: 4
2025-12-01T06:07:46.439-0600 INFO ForkJoinPool-1-worker-1 stdout FileOperation[operationType=InputFile.newStream, fileType=METADATA_TABLE]: 6
2025-12-01T06:07:46.439-0600 INFO ForkJoinPool-1-worker-1 stdout FileOperation[operationType=InputFile.newStream, fileType=METADATA_TABLE_PROPERTIES]: 1
2025-12-01T06:07:46.439-0600 INFO ForkJoinPool-1-worker-1 stdout FileOperation[operationType=InputFile.newStream, fileType=TABLE_PROPERTIES]: 4
2025-12-01T06:07:46.439-0600 INFO ForkJoinPool-1-worker-1 stdout
=== Expected Cache Accesses ===
2025-12-01T06:07:46.441-0600 INFO ForkJoinPool-1-worker-1 stdout FileOperation[operationType=Input.readTail, fileType=DATA]: 2
2025-12-01T06:07:46.441-0600 INFO ForkJoinPool-1-worker-1 stdout FileOperation[operationType=InputFile.lastModified, fileType=METADATA_TABLE]: 4
2025-12-01T06:07:46.441-0600 INFO ForkJoinPool-1-worker-1 stdout FileOperation[operationType=InputFile.length, fileType=METADATA_TABLE]: 4
2025-12-01T06:07:46.441-0600 INFO ForkJoinPool-1-worker-1 stdout FileOperation[operationType=InputFile.newStream, fileType=INDEX_DEFINITION]: 4
2025-12-01T06:07:46.441-0600 INFO ForkJoinPool-1-worker-1 stdout FileOperation[operationType=InputFile.newStream, fileType=METADATA_TABLE]: 6
2025-12-01T06:07:46.441-0600 INFO ForkJoinPool-1-worker-1 stdout FileOperation[operationType=InputFile.newStream, fileType=METADATA_TABLE_PROPERTIES]: 1
2025-12-01T06:07:46.442-0600 INFO ForkJoinPool-1-worker-1 stdout FileOperation[operationType=InputFile.newStream, fileType=TABLE_PROPERTIES]: 4
.addCopies(new FileOperationUtils.FileOperation("InputFile.lastModified", METADATA_TABLE), 4)
.addCopies(new FileOperationUtils.FileOperation("InputFile.length", METADATA_TABLE), 4)
.addCopies(new FileOperationUtils.FileOperation("InputFile.lastModified", METADATA_TABLE), 5)
.addCopies(new FileOperationUtils.FileOperation("InputFile.length", METADATA_TABLE), 5)
2025-12-01T06:11:05.252-0600 INFO ForkJoinPool-1-worker-1 stdout === All File Paths Accessed ===
2025-12-01T06:11:05.252-0600 INFO ForkJoinPool-1-worker-1 stdout InputFile.length: local:/tests_5edea368-419e-415c-b30d-f5f1bedac2f0/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:11:05.253-0600 INFO ForkJoinPool-1-worker-1 stdout InputFile.lastModified: local:/tests_5edea368-419e-415c-b30d-f5f1bedac2f0/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:11:05.253-0600 INFO ForkJoinPool-1-worker-1 stdout InputFile.length: local:/tests_5edea368-419e-415c-b30d-f5f1bedac2f0/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:11:05.253-0600 INFO ForkJoinPool-1-worker-1 stdout InputFile.lastModified: local:/tests_5edea368-419e-415c-b30d-f5f1bedac2f0/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:11:05.253-0600 INFO ForkJoinPool-1-worker-1 stdout InputFile.length: local:/tests_5edea368-419e-415c-b30d-f5f1bedac2f0/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:11:05.253-0600 INFO ForkJoinPool-1-worker-1 stdout InputFile.lastModified: local:/tests_5edea368-419e-415c-b30d-f5f1bedac2f0/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:11:05.253-0600 INFO ForkJoinPool-1-worker-1 stdout InputFile.length: local:/tests_5edea368-419e-415c-b30d-f5f1bedac2f0/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:11:05.253-0600 INFO ForkJoinPool-1-worker-1 stdout InputFile.lastModified: local:/tests_5edea368-419e-415c-b30d-f5f1bedac2f0/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:11:05.253-0600 INFO ForkJoinPool-1-worker-1 stdout InputFile.length: local:/tests_5edea368-419e-415c-b30d-f5f1bedac2f0/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:11:05.253-0600 INFO ForkJoinPool-1-worker-1 stdout InputFile.lastModified: local:/tests_5edea368-419e-415c-b30d-f5f1bedac2f0/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:11:05.253-0600 INFO ForkJoinPool-1-worker-1 stdout
=== Actual Cache Accesses ===
2025-12-01T06:11:05.253-0600 INFO ForkJoinPool-1-worker-1 stdout FileOperation[operationType=Input.readTail, fileType=DATA]: 2
2025-12-01T06:11:05.253-0600 INFO ForkJoinPool-1-worker-1 stdout FileOperation[operationType=InputFile.lastModified, fileType=METADATA_TABLE]: 5
2025-12-01T06:11:05.253-0600 INFO ForkJoinPool-1-worker-1 stdout FileOperation[operationType=InputFile.length, fileType=METADATA_TABLE]: 5
2025-12-01T06:11:05.253-0600 INFO ForkJoinPool-1-worker-1 stdout FileOperation[operationType=InputFile.newStream, fileType=INDEX_DEFINITION]: 4
2025-12-01T06:11:05.253-0600 INFO ForkJoinPool-1-worker-1 stdout FileOperation[operationType=InputFile.newStream, fileType=METADATA_TABLE]: 6
2025-12-01T06:11:05.253-0600 INFO ForkJoinPool-1-worker-1 stdout FileOperation[operationType=InputFile.newStream, fileType=METADATA_TABLE_PROPERTIES]: 1
2025-12-01T06:11:05.253-0600 INFO ForkJoinPool-1-worker-1 stdout FileOperation[operationType=InputFile.newStream, fileType=TABLE_PROPERTIES]: 4
2025-12-01T06:11:05.253-0600 INFO ForkJoinPool-1-worker-1 stdout
=== Expected Cache Accesses ===
2025-12-01T06:11:05.254-0600 INFO ForkJoinPool-1-worker-1 stdout FileOperation[operationType=Input.readTail, fileType=DATA]: 2
2025-12-01T06:11:05.254-0600 INFO ForkJoinPool-1-worker-1 stdout FileOperation[operationType=InputFile.lastModified, fileType=METADATA_TABLE]: 4
2025-12-01T06:11:05.254-0600 INFO ForkJoinPool-1-worker-1 stdout FileOperation[operationType=InputFile.length, fileType=METADATA_TABLE]: 4
2025-12-01T06:11:05.254-0600 INFO ForkJoinPool-1-worker-1 stdout FileOperation[operationType=InputFile.newStream, fileType=INDEX_DEFINITION]: 4
2025-12-01T06:11:05.254-0600 INFO ForkJoinPool-1-worker-1 stdout FileOperation[operationType=InputFile.newStream, fileType=METADATA_TABLE]: 6
2025-12-01T06:11:05.254-0600 INFO ForkJoinPool-1-worker-1 stdout FileOperation[operationType=InputFile.newStream, fileType=METADATA_TABLE_PROPERTIES]: 1
2025-12-01T06:11:05.254-0600 INFO ForkJoinPool-1-worker-1 stdout FileOperation[operationType=InputFile.newStream, fileType=TABLE_PROPERTIES]: 4
c5d75f0 to 3edb81c
dc28426 to 34ae645
34ae645 to f8ca84a
Description
Bumping Hudi to version 1.1.0. This PR replaces #69, as I do not have permissions to write directly to OH Trino on master.
This PR is co-authored by @vamsikarnika.
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:
Note
Upgrades to Hudi 1.1.0 and updates the Hudi connector to new APIs (reader, metadata, indexes), adding ordering-column support, iterator-based reading, and refreshed tests.
- org.apache.hudi to 1.1.0; use hudi-io with shaded classifier.
- HoodieFileGroupReader builder and ClosableIterator flow in HudiPageSource; robust resource closing.
- HudiTrinoReaderContext to new reader context/merger APIs; support path- and pathInfo-based iterators.
- InlineSeekableDataInputStream (seek-on-init, relative pos/seek) with unit tests.
- HudiTrinoFileReaderFactory to new HFile reader factory/builders.
- HoodieBackedTableMetadata/FileSystemBackedTableMetadata selection; replace older factory call.
- org.apache.hudi.stats.*; add getColumnsRange and adjust table statistics reader.
- orderingColumns to HudiTableHandle; expose via JSON; derive via getOrderingColumnHandles.
- HudiSplitSource: adapt metadata table creation; unchanged DF logic.
- ColumnStats, PartitionStats, RecordLevel, Secondary) updated to new MDT APIs and key classes; use list data wrappers and map collectors.
- FileOperationAssertions; update expected counts.
- HudiTrinoRecord.
Written by Cursor Bugbot for commit f8ca84a.