Upgrade hudi version 1.1 #74

Open
voonhous wants to merge 6 commits into onehouseinc:master from voonhous:upgrade_hudi_version_1.1

@voonhous voonhous commented Nov 30, 2025

Description

Bumps Hudi to version 1.1.0. This PR replaces #69, as I do not have permission to write directly to OH Trino on master.

This PR is co-authored by @vamsikarnika.

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`issuenumber`)

Note

Upgrades to Hudi 1.1.0 and updates the Hudi connector to new APIs (reader, metadata, indexes), adding ordering-column support, iterator-based reading, and refreshed tests.

  • Dependencies
    • Bump org.apache.hudi to 1.1.0; use hudi-io with shaded classifier.
    • Maven tweaks: ignore protobuf resources and shaded Parquet class; track Kryo as ignored unused dep.
  • Reader/IO
    • Switch to HoodieFileGroupReader builder and ClosableIterator flow in HudiPageSource; robust resource closing.
    • Update HudiTrinoReaderContext to new reader context/merger APIs; support path- and pathInfo-based iterators.
    • Implement InlineSeekableDataInputStream (seek-on-init, relative pos/seek) with unit tests.
    • Update HudiTrinoFileReaderFactory to new HFile reader factory/builders.
  • Metadata & Stats
    • Use HoodieBackedTableMetadata/FileSystemBackedTableMetadata selection; replace older factory call.
    • Migrate column stats to org.apache.hudi.stats.*; add getColumnsRange and adjust table statistics reader.
  • Table handle/columns
    • Add lazy orderingColumns to HudiTableHandle; expose via JSON; derive via getOrderingColumnHandles.
    • Prepend Hudi meta + ordering columns when needed.
  • Split/Indexing
    • HudiSplitSource: adapt metadata table creation; unchanged DF logic.
    • Index supports (ColumnStats, PartitionStats, RecordLevel, Secondary) updated to new MDT APIs and key classes; use list data wrappers and map collectors.
  • Tests & Utils
    • Refactor cache FS operation assertions into FileOperationAssertions; update expected counts.
    • Update TPCH writer to new commit API; remove unused HudiTrinoRecord.

Written by Cursor Bugbot for commit f8ca84a.
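
To illustrate the "seek-on-init, relative pos/seek" behavior listed for InlineSeekableDataInputStream, here is a minimal sketch. It is not the class from this PR: the name `InlineRegionStream`, the byte-array backing, and the method signatures are all assumptions made for illustration. It only shows the idea of exposing a slice of a larger file as if it were a file of its own.

```java
import java.io.InputStream;

/** Hypothetical sketch: a read-only view over the slice [start, start + length) of a buffer. */
final class InlineRegionStream extends InputStream {
    private final byte[] backing;   // stands in for the outer file (assumption)
    private final long start;       // absolute offset where the inline region begins
    private final long length;      // size of the inline region
    private long absolutePos;       // current absolute offset in the backing buffer

    InlineRegionStream(byte[] backing, long start, long length) {
        if (start < 0 || length < 0 || start + length > backing.length) {
            throw new IllegalArgumentException("Region is outside the backing buffer");
        }
        this.backing = backing;
        this.start = start;
        this.length = length;
        // Seek on init: position at the first byte of the inline region.
        this.absolutePos = start;
    }

    /** Position relative to the start of the inline region. */
    long getPos() {
        return absolutePos - start;
    }

    /** Seeks to a position relative to the start of the inline region. */
    void seek(long relativePos) {
        if (relativePos < 0 || relativePos > length) {
            throw new IllegalArgumentException("Seek out of range: " + relativePos);
        }
        absolutePos = start + relativePos;
    }

    @Override
    public int read() {
        if (getPos() >= length) {
            return -1; // end of the inline region, even if the outer file continues
        }
        return backing[(int) absolutePos++] & 0xFF;
    }
}
```

Callers only ever see region-relative offsets, so a reader written against a standalone file can be pointed at an embedded region without adjusting its own seek arithmetic.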

@voonhous voonhous requested a review from a team as a code owner November 30, 2025 17:16
@cursor cursor bot left a comment

This PR is being reviewed by Cursor Bugbot

@voonhous voonhous force-pushed the upgrade_hudi_version_1.1 branch from da5768f to 4d8d7a9 Compare November 30, 2025 18:08
@voonhous voonhous force-pushed the upgrade_hudi_version_1.1 branch from 4d8d7a9 to b541401 Compare November 30, 2025 19:00
import static io.trino.plugin.hudi.HudiErrorCode.HUDI_SCHEMA_ERROR;
import static io.trino.plugin.hudi.HudiErrorCode.HUDI_UNSUPPORTED_FILE_FORMAT;
import static java.lang.Math.toIntExact;
import static org.apache.hudi.common.model.HoodieRecord.PARTITION_PATH_METADATA_FIELD;
Author

This requires KryoSerializable, i.e. kryo.

Our code does not use kryo directly, so the build fails with the following error if the pom (in trino-hudi) declares it:

[INFO] --- dependency:3.8.1:analyze-only (default) @ trino-hudi ---
Error:  Unused declared dependencies found:
Error:     com.esotericsoftware:kryo:jar:4.0.2:compile

I'm going to change its scope to provided to see if we can get around this transitive dependency issue, since kryo:

  1. Is needed for compilation (so the compiler can see KryoSerializable)
  2. Will be provided at runtime by another dependency (hudi-common, I think)

The aim is to keep it from being flagged as "unused" (provided scope is exempt from that check).

Author

Error that is thrown after adding provided scope:

Trino plugin dependency com.esotericsoftware:kryo must not have scope 'provided'. It is not part of the SPI and will not be available at runtime.

Author

@voonhous voonhous Dec 1, 2025

Okay, changing the scope to runtime instead.

Error after changing to runtime:

Error:  /home/runner/work/trino/trino/plugin/trino-hudi/src/main/java/io/trino/plugin/hudi/HudiUtil.java:[97,1] cannot access com.esotericsoftware.kryo.KryoSerializable
  class file for com.esotericsoftware.kryo.KryoSerializable not found

Author

Forget it. I will just revert, add a comment, and add it to ignoredUnusedDeclaredDependencies.
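
For reference, the three scope attempts above can be summarized in a small model. This is only an illustrative sketch: `DependencyScope` and its fields are hypothetical names, not Maven or Trino code. The classpath flags follow standard Maven scope semantics, and the outcome strings paraphrase the errors quoted in this thread.

```java
/** Illustrative model of the three Maven scopes tried in this thread (hypothetical type). */
enum DependencyScope {
    // On the compile classpath and packaged with the plugin.
    COMPILE(true, true),
    // On the compile classpath, but expected to be supplied by the environment at runtime.
    PROVIDED(true, false),
    // Packaged for runtime, but invisible to the compiler.
    RUNTIME(false, true);

    final boolean visibleToCompiler;
    final boolean packagedWithPlugin;

    DependencyScope(boolean visibleToCompiler, boolean packagedWithPlugin) {
        this.visibleToCompiler = visibleToCompiler;
        this.packagedWithPlugin = packagedWithPlugin;
    }

    /** Outcome reported in the comments above for kryo under this scope. */
    String observedOutcome() {
        switch (this) {
            case COMPILE:
                return "dependency:analyze-only reports an unused declared dependency";
            case PROVIDED:
                return "Trino plugin check rejects scope 'provided' for non-SPI dependencies";
            default:
                return "javac cannot access com.esotericsoftware.kryo.KryoSerializable";
        }
    }
}
```

Only compile scope satisfies both the compiler and the Trino plugin check, which is why the remaining option is to keep it and silence the unused-dependency analysis for kryo.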

@voonhous voonhous force-pushed the upgrade_hudi_version_1.1 branch from 276b592 to 3ca6640 Compare December 1, 2025 11:02
.addCopies(new FileOperationUtils.FileOperation("Input.readTail", DATA), 2)
.addCopies(new FileOperationUtils.FileOperation("InputFile.lastModified", METADATA_TABLE), 4)
.addCopies(new FileOperationUtils.FileOperation("InputFile.length", METADATA_TABLE), 4)
.addCopies(new FileOperationUtils.FileOperation("InputFile.lastModified", METADATA_TABLE), 5)
Author

2025-12-01T06:07:46.434-0600	INFO	ForkJoinPool-1-worker-1	stdout	=== All File Paths Accessed ===
2025-12-01T06:07:46.435-0600	INFO	ForkJoinPool-1-worker-1	stdout	InputFile.length: local:/tests_ce36c7c6-8f49-46b6-9b9c-10717eb89786/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:07:46.435-0600	INFO	ForkJoinPool-1-worker-1	stdout	InputFile.lastModified: local:/tests_ce36c7c6-8f49-46b6-9b9c-10717eb89786/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:07:46.435-0600	INFO	ForkJoinPool-1-worker-1	stdout	InputFile.length: local:/tests_ce36c7c6-8f49-46b6-9b9c-10717eb89786/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:07:46.435-0600	INFO	ForkJoinPool-1-worker-1	stdout	InputFile.lastModified: local:/tests_ce36c7c6-8f49-46b6-9b9c-10717eb89786/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:07:46.435-0600	INFO	ForkJoinPool-1-worker-1	stdout	InputFile.length: local:/tests_ce36c7c6-8f49-46b6-9b9c-10717eb89786/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:07:46.435-0600	INFO	ForkJoinPool-1-worker-1	stdout	InputFile.lastModified: local:/tests_ce36c7c6-8f49-46b6-9b9c-10717eb89786/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:07:46.435-0600	INFO	ForkJoinPool-1-worker-1	stdout	InputFile.length: local:/tests_ce36c7c6-8f49-46b6-9b9c-10717eb89786/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:07:46.436-0600	INFO	ForkJoinPool-1-worker-1	stdout	InputFile.lastModified: local:/tests_ce36c7c6-8f49-46b6-9b9c-10717eb89786/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:07:46.436-0600	INFO	ForkJoinPool-1-worker-1	stdout	InputFile.length: local:/tests_ce36c7c6-8f49-46b6-9b9c-10717eb89786/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:07:46.436-0600	INFO	ForkJoinPool-1-worker-1	stdout	InputFile.lastModified: local:/tests_ce36c7c6-8f49-46b6-9b9c-10717eb89786/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:07:46.436-0600	INFO	ForkJoinPool-1-worker-1	stdout	
=== Actual Cache Accesses ===
2025-12-01T06:07:46.439-0600	INFO	ForkJoinPool-1-worker-1	stdout	FileOperation[operationType=Input.readTail, fileType=DATA]: 2
2025-12-01T06:07:46.439-0600	INFO	ForkJoinPool-1-worker-1	stdout	FileOperation[operationType=InputFile.lastModified, fileType=METADATA_TABLE]: 5
2025-12-01T06:07:46.439-0600	INFO	ForkJoinPool-1-worker-1	stdout	FileOperation[operationType=InputFile.length, fileType=METADATA_TABLE]: 5
2025-12-01T06:07:46.439-0600	INFO	ForkJoinPool-1-worker-1	stdout	FileOperation[operationType=InputFile.newStream, fileType=INDEX_DEFINITION]: 4
2025-12-01T06:07:46.439-0600	INFO	ForkJoinPool-1-worker-1	stdout	FileOperation[operationType=InputFile.newStream, fileType=METADATA_TABLE]: 6
2025-12-01T06:07:46.439-0600	INFO	ForkJoinPool-1-worker-1	stdout	FileOperation[operationType=InputFile.newStream, fileType=METADATA_TABLE_PROPERTIES]: 1
2025-12-01T06:07:46.439-0600	INFO	ForkJoinPool-1-worker-1	stdout	FileOperation[operationType=InputFile.newStream, fileType=TABLE_PROPERTIES]: 4
2025-12-01T06:07:46.439-0600	INFO	ForkJoinPool-1-worker-1	stdout	
=== Expected Cache Accesses ===
2025-12-01T06:07:46.441-0600	INFO	ForkJoinPool-1-worker-1	stdout	FileOperation[operationType=Input.readTail, fileType=DATA]: 2
2025-12-01T06:07:46.441-0600	INFO	ForkJoinPool-1-worker-1	stdout	FileOperation[operationType=InputFile.lastModified, fileType=METADATA_TABLE]: 4
2025-12-01T06:07:46.441-0600	INFO	ForkJoinPool-1-worker-1	stdout	FileOperation[operationType=InputFile.length, fileType=METADATA_TABLE]: 4
2025-12-01T06:07:46.441-0600	INFO	ForkJoinPool-1-worker-1	stdout	FileOperation[operationType=InputFile.newStream, fileType=INDEX_DEFINITION]: 4
2025-12-01T06:07:46.441-0600	INFO	ForkJoinPool-1-worker-1	stdout	FileOperation[operationType=InputFile.newStream, fileType=METADATA_TABLE]: 6
2025-12-01T06:07:46.441-0600	INFO	ForkJoinPool-1-worker-1	stdout	FileOperation[operationType=InputFile.newStream, fileType=METADATA_TABLE_PROPERTIES]: 1
2025-12-01T06:07:46.442-0600	INFO	ForkJoinPool-1-worker-1	stdout	FileOperation[operationType=InputFile.newStream, fileType=TABLE_PROPERTIES]: 4
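
The mismatch in the log above comes down to a per-operation count comparison. The sketch below is a JDK-only illustration of that comparison, using the counts from the log as sample data. `OperationCountDiff` and the string key format are hypothetical and are not the FileOperationAssertions helper from this PR.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical helper: reports which operation counts differ between two runs. */
final class OperationCountDiff {
    private OperationCountDiff() {}

    /** Returns (actual - expected) for every key whose counts differ, sorted by key. */
    static Map<String, Integer> diff(Map<String, Integer> expected, Map<String, Integer> actual) {
        Set<String> keys = new TreeSet<>(expected.keySet());
        keys.addAll(actual.keySet());

        Map<String, Integer> delta = new TreeMap<>();
        for (String key : keys) {
            int difference = actual.getOrDefault(key, 0) - expected.getOrDefault(key, 0);
            if (difference != 0) {
                delta.put(key, difference);
            }
        }
        return delta;
    }

    public static void main(String[] args) {
        // Sample data copied from the "Expected" and "Actual" sections of the log.
        Map<String, Integer> expected = Map.of(
                "Input.readTail/DATA", 2,
                "InputFile.lastModified/METADATA_TABLE", 4,
                "InputFile.length/METADATA_TABLE", 4);
        Map<String, Integer> actual = Map.of(
                "Input.readTail/DATA", 2,
                "InputFile.lastModified/METADATA_TABLE", 5,
                "InputFile.length/METADATA_TABLE", 5);
        System.out.println(diff(expected, actual));
    }
}
```

With the sample data, only the two METADATA_TABLE entries appear in the result, each with a difference of 1, which matches the expected counts being raised from 4 to 5 in the test.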

.addCopies(new FileOperationUtils.FileOperation("InputFile.lastModified", METADATA_TABLE), 4)
.addCopies(new FileOperationUtils.FileOperation("InputFile.length", METADATA_TABLE), 4)
.addCopies(new FileOperationUtils.FileOperation("InputFile.lastModified", METADATA_TABLE), 5)
.addCopies(new FileOperationUtils.FileOperation("InputFile.length", METADATA_TABLE), 5)
Author

2025-12-01T06:11:05.252-0600	INFO	ForkJoinPool-1-worker-1	stdout	=== All File Paths Accessed ===
2025-12-01T06:11:05.252-0600	INFO	ForkJoinPool-1-worker-1	stdout	InputFile.length: local:/tests_5edea368-419e-415c-b30d-f5f1bedac2f0/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:11:05.253-0600	INFO	ForkJoinPool-1-worker-1	stdout	InputFile.lastModified: local:/tests_5edea368-419e-415c-b30d-f5f1bedac2f0/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:11:05.253-0600	INFO	ForkJoinPool-1-worker-1	stdout	InputFile.length: local:/tests_5edea368-419e-415c-b30d-f5f1bedac2f0/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:11:05.253-0600	INFO	ForkJoinPool-1-worker-1	stdout	InputFile.lastModified: local:/tests_5edea368-419e-415c-b30d-f5f1bedac2f0/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:11:05.253-0600	INFO	ForkJoinPool-1-worker-1	stdout	InputFile.length: local:/tests_5edea368-419e-415c-b30d-f5f1bedac2f0/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:11:05.253-0600	INFO	ForkJoinPool-1-worker-1	stdout	InputFile.lastModified: local:/tests_5edea368-419e-415c-b30d-f5f1bedac2f0/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:11:05.253-0600	INFO	ForkJoinPool-1-worker-1	stdout	InputFile.length: local:/tests_5edea368-419e-415c-b30d-f5f1bedac2f0/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:11:05.253-0600	INFO	ForkJoinPool-1-worker-1	stdout	InputFile.lastModified: local:/tests_5edea368-419e-415c-b30d-f5f1bedac2f0/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:11:05.253-0600	INFO	ForkJoinPool-1-worker-1	stdout	InputFile.length: local:/tests_5edea368-419e-415c-b30d-f5f1bedac2f0/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:11:05.253-0600	INFO	ForkJoinPool-1-worker-1	stdout	InputFile.lastModified: local:/tests_5edea368-419e-415c-b30d-f5f1bedac2f0/hudi_multi_fg_pt_v8_mor/.hoodie/metadata/files/files-0000-0_12-104-302_20250429145946737.hfile
2025-12-01T06:11:05.253-0600	INFO	ForkJoinPool-1-worker-1	stdout	
=== Actual Cache Accesses ===
2025-12-01T06:11:05.253-0600	INFO	ForkJoinPool-1-worker-1	stdout	FileOperation[operationType=Input.readTail, fileType=DATA]: 2
2025-12-01T06:11:05.253-0600	INFO	ForkJoinPool-1-worker-1	stdout	FileOperation[operationType=InputFile.lastModified, fileType=METADATA_TABLE]: 5
2025-12-01T06:11:05.253-0600	INFO	ForkJoinPool-1-worker-1	stdout	FileOperation[operationType=InputFile.length, fileType=METADATA_TABLE]: 5
2025-12-01T06:11:05.253-0600	INFO	ForkJoinPool-1-worker-1	stdout	FileOperation[operationType=InputFile.newStream, fileType=INDEX_DEFINITION]: 4
2025-12-01T06:11:05.253-0600	INFO	ForkJoinPool-1-worker-1	stdout	FileOperation[operationType=InputFile.newStream, fileType=METADATA_TABLE]: 6
2025-12-01T06:11:05.253-0600	INFO	ForkJoinPool-1-worker-1	stdout	FileOperation[operationType=InputFile.newStream, fileType=METADATA_TABLE_PROPERTIES]: 1
2025-12-01T06:11:05.253-0600	INFO	ForkJoinPool-1-worker-1	stdout	FileOperation[operationType=InputFile.newStream, fileType=TABLE_PROPERTIES]: 4
2025-12-01T06:11:05.253-0600	INFO	ForkJoinPool-1-worker-1	stdout	
=== Expected Cache Accesses ===
2025-12-01T06:11:05.254-0600	INFO	ForkJoinPool-1-worker-1	stdout	FileOperation[operationType=Input.readTail, fileType=DATA]: 2
2025-12-01T06:11:05.254-0600	INFO	ForkJoinPool-1-worker-1	stdout	FileOperation[operationType=InputFile.lastModified, fileType=METADATA_TABLE]: 4
2025-12-01T06:11:05.254-0600	INFO	ForkJoinPool-1-worker-1	stdout	FileOperation[operationType=InputFile.length, fileType=METADATA_TABLE]: 4
2025-12-01T06:11:05.254-0600	INFO	ForkJoinPool-1-worker-1	stdout	FileOperation[operationType=InputFile.newStream, fileType=INDEX_DEFINITION]: 4
2025-12-01T06:11:05.254-0600	INFO	ForkJoinPool-1-worker-1	stdout	FileOperation[operationType=InputFile.newStream, fileType=METADATA_TABLE]: 6
2025-12-01T06:11:05.254-0600	INFO	ForkJoinPool-1-worker-1	stdout	FileOperation[operationType=InputFile.newStream, fileType=METADATA_TABLE_PROPERTIES]: 1
2025-12-01T06:11:05.254-0600	INFO	ForkJoinPool-1-worker-1	stdout	FileOperation[operationType=InputFile.newStream, fileType=TABLE_PROPERTIES]: 4

@voonhous voonhous force-pushed the upgrade_hudi_version_1.1 branch 2 times, most recently from c5d75f0 to 3edb81c Compare December 1, 2025 12:45
@voonhous voonhous force-pushed the upgrade_hudi_version_1.1 branch 3 times, most recently from dc28426 to 34ae645 Compare December 1, 2025 13:40
@voonhous voonhous force-pushed the upgrade_hudi_version_1.1 branch from 34ae645 to f8ca84a Compare December 1, 2025 13:45