Add TrinoParquetFileReader to handle archived timelines parquet files#72

Open
voonhous wants to merge 1 commit into onehouseinc:master from voonhous:fix-archived-timeline-reading

@voonhous
Description

This PR fixes a bug where a Hudi table at table version 8 with an archived LSM timeline becomes unreadable in Trino.

The fix adds TrinoParquetFileReader, which extends HoodieAvroFileReader.

To maximize code reuse, the reader relies on the existing HudiAvroSerializer: each Page produced by Trino's ParquetReader is converted into IndexedRecords, which are exposed as a CloseableIterator<IndexedRecord>.
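A minimal sketch of the page-to-record iteration pattern described above, for readers who want to see its shape. It does not use the Trino or Hudi classes: `BatchSource`, `BatchRecordIterator`, and `BatchRecordIteratorDemo` are hypothetical stand-ins, with an `int[]` playing the role of a Page and a boxed `Integer` playing the role of an IndexedRecord. The actual implementation in this PR may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

// Hypothetical stand-in for a batch-oriented reader: returns one batch per
// call and null once the underlying file is exhausted.
interface BatchSource<B> extends AutoCloseable {
    B nextBatch();

    @Override
    void close();
}

// Flattens batches into individual records, converting each batch lazily.
final class BatchRecordIterator<B, R> implements Iterator<R>, AutoCloseable {
    private final BatchSource<B> source;
    private final Function<B, List<R>> converter;
    private Iterator<R> current = Collections.emptyIterator();
    private boolean closed;

    BatchRecordIterator(BatchSource<B> source, Function<B, List<R>> converter) {
        this.source = source;
        this.converter = converter;
    }

    @Override
    public boolean hasNext() {
        if (closed) {
            return false;
        }
        // Skip over empty batches until a record is available or input ends.
        while (!current.hasNext()) {
            B batch = source.nextBatch();
            if (batch == null) {
                return false;
            }
            current = converter.apply(batch).iterator();
        }
        return true;
    }

    @Override
    public R next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return current.next();
    }

    @Override
    public void close() {
        if (!closed) {
            closed = true;
            current = Collections.emptyIterator();
            source.close();
        }
    }
}

final class BatchRecordIteratorDemo {
    // Builds an iterator over in-memory "pages"; each int[] is one batch.
    static BatchRecordIterator<int[], Integer> open(List<int[]> pages) {
        Iterator<int[]> it = pages.iterator();
        BatchSource<int[]> source = new BatchSource<int[]>() {
            @Override
            public int[] nextBatch() {
                return it.hasNext() ? it.next() : null;
            }

            @Override
            public void close() {
                // Nothing to release for an in-memory source.
            }
        };
        return new BatchRecordIterator<int[], Integer>(source, page -> {
            List<Integer> rows = new ArrayList<>(page.length);
            for (int value : page) {
                rows.add(value);
            }
            return rows;
        });
    }

    public static void main(String[] args) {
        List<int[]> pages = List.of(new int[] {1, 2}, new int[] {}, new int[] {3});
        try (BatchRecordIterator<int[], Integer> records = open(pages)) {
            while (records.hasNext()) {
                System.out.println(records.next());
            }
        }
    }
}
```

Keeping the conversion behind a single function argument mirrors the reuse goal stated above: the iterator only handles batching and cleanup, while the record conversion stays in one existing place.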

Additional context and related issues

Error thrown when querying such a table before this fix:

Caused by: java.lang.UnsupportedOperationException: HudiTrinoFileReaderFactory does not support Parquet file reader
	at io.trino.plugin.hudi.io.HudiTrinoFileReaderFactory.newParquetFileReader(HudiTrinoFileReaderFactory.java:39)
	at org.apache.hudi.io.storage.HoodieFileReaderFactory.getFileReader(HoodieFileReaderFactory.java:70)
	at org.apache.hudi.io.storage.HoodieFileReaderFactory.getFileReader(HoodieFileReaderFactory.java:50)
	at org.apache.hudi.common.table.timeline.versioning.v2.ArchivedTimelineLoaderV2.lambda$loadInstants$1(ArchivedTimelineLoaderV2.java:66)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:186)
	at java.base/java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:197)
	at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1716)
	at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:570)
	at java.base/java.util.stream.ForEachOps$ForEachTask.compute(ForEachOps.java:293)
	at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:759)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:507)
	at java.base/java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:676)
	at java.base/java.util.stream.ForEachOps$ForEachOp.evaluateParallel(ForEachOps.java:162)
	at java.base/java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateParallel(ForEachOps.java:176)
	at java.base/java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:264)
	at java.base/java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:636)
	at org.apache.hudi.common.table.timeline.versioning.v2.ArchivedTimelineLoaderV2.loadInstants(ArchivedTimelineLoaderV2.java:62)
	at org.apache.hudi.common.table.timeline.versioning.v2.CompletionTimeQueryViewV2.load(CompletionTimeQueryViewV2.java:313)
	at org.apache.hudi.common.table.timeline.versioning.v2.CompletionTimeQueryViewV2.<init>(CompletionTimeQueryViewV2.java:108)
	at org.apache.hudi.common.table.timeline.versioning.v2.CompletionTimeQueryViewV2.<init>(CompletionTimeQueryViewV2.java:93)
	at org.apache.hudi.common.table.timeline.versioning.v2.TimelineV2Factory.createCompletionTimeQueryView(TimelineV2Factory.java:79)
	at org.apache.hudi.common.table.view.AbstractTableFileSystemView.init(AbstractTableFileSystemView.java:127)
	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.init(HoodieTableFileSystemView.java:128)
	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:122)
	at org.apache.hudi.common.table.view.HoodieTableFileSystemView.<init>(HoodieTableFileSystemView.java:102)
	at io.trino.plugin.hudi.HudiUtil.getFileSystemView(HudiUtil.java:384)
	at io.trino.plugin.hudi.query.HudiSnapshotDirectoryLister.lambda$new$0(HudiSnapshotDirectoryLister.java:57)
	at org.apache.hudi.util.Lazy.get(Lazy.java:54)
	at io.trino.plugin.hudi.query.HudiSnapshotDirectoryLister.listStatus(HudiSnapshotDirectoryLister.java:74)
	at io.trino.plugin.hudi.partition.HudiPartitionInfoLoader.generateSplitsFromPartition(HudiPartitionInfoLoader.java:71)
	at io.trino.plugin.hudi.partition.HudiPartitionInfoLoader.run(HudiPartitionInfoLoader.java:63)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:545)
	at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
	at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:75)
	... 5 more
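
The trace shows the generic `getFileReader` path reaching a `newParquetFileReader` override that throws. The sketch below illustrates that failure mode and the shape of the fix with hypothetical stand-in types (`SketchReader`, `SketchReaderFactory`, `BeforeFixFactory`, `AfterFixFactory`); it does not reproduce the real Hudi or Trino signatures, and dispatching on the `.parquet` suffix is an assumption made for illustration.

```java
// Hypothetical stand-in for a file reader; not the Hudi interface.
interface SketchReader {
    String describe();
}

// Hypothetical stand-in for a reader factory that dispatches by file format.
abstract class SketchReaderFactory {
    final SketchReader getFileReader(String path) {
        // Assumption: format is chosen from the file suffix.
        if (path.endsWith(".parquet")) {
            return newParquetFileReader(path);
        }
        throw new UnsupportedOperationException("Unsupported file format: " + path);
    }

    protected abstract SketchReader newParquetFileReader(String path);
}

// Before the fix: the Parquet hook rejects every request, so any code path
// that needs a Parquet reader (such as loading archived timeline files) fails.
final class BeforeFixFactory extends SketchReaderFactory {
    @Override
    protected SketchReader newParquetFileReader(String path) {
        throw new UnsupportedOperationException(
                getClass().getSimpleName() + " does not support Parquet file reader");
    }
}

// After the fix: the hook returns a working reader instead of throwing.
final class AfterFixFactory extends SketchReaderFactory {
    @Override
    protected SketchReader newParquetFileReader(String path) {
        return () -> "parquet reader for " + path;
    }
}

final class SketchReaderFactoryDemo {
    public static void main(String[] args) {
        String path = "archived_timeline.parquet"; // hypothetical file name
        try {
            new BeforeFixFactory().getFileReader(path);
        } catch (UnsupportedOperationException e) {
            System.out.println("before: " + e.getMessage());
        }
        System.out.println("after: " + new AfterFixFactory().getFileReader(path).describe());
    }
}
```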

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

## Section
* Fix some things. ({issue}`issuenumber`)

@cursor cursor bot left a comment

This PR is being reviewed by Cursor Bugbot

Details

Your team is on the Bugbot Free tier. On this plan, Bugbot will review limited PRs each billing cycle for each member of your team.

To receive Bugbot reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

@voonhous force-pushed the fix-archived-timeline-reading branch from 63021aa to a8976ab on September 25, 2025 03:52