[Bug] [Module Name] After Spark writes data it changes Hudi's table metadata configuration; SeaTunnel then errors while writing data and reading Hudi's metadata
#8781 · Open
lm520hy opened this issue on Feb 21, 2025 · 1 comment
I had searched in the issues and found no similar issues.
What happened
After Spark wrote data, it changed the Hudi table's metadata configuration. When SeaTunnel subsequently synchronized data, it failed with an error while writing data and reading Hudi's metadata information.
./bin/seatunnel.sh --config ./config/v2.batch.config.template6 -m local
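The referenced config/v2.batch.config.template6 is not included in the report. Based on the log (a FakeSource source feeding a Hudi sink at hdfs:///hudi/default/hudi_mor_tbl2), a batch config of roughly this shape would exercise the same write path; the option names below are illustrative, not copied from the actual template:

```hocon
env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  # The log shows a FakeSource feeding the pipeline.
  FakeSource {
    # ... schema and row settings elided ...
  }
}

sink {
  Hudi {
    # Paths inferred from the log output; other options elided.
    table_dfs_path = "hdfs:///hudi"
    database       = "default"
    table_name     = "hudi_mor_tbl2"
    # ...
  }
}
```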
Error Exception
2025-02-20 19:27:42,144 INFO [a.h.c.t.t.HoodieActiveTimeline] [st-multi-table-sink-writer-2] - Loaded instants upto : Option{val=[20250220192742010__clean__COMPLETED__20250220192742120]}
2025-02-20 19:27:42,145 INFO [o.a.h.c.t.HoodieTableConfig ] [st-multi-table-sink-writer-2] - Loading table properties from hdfs:/hudi/default/hudi_mor_tbl2/.hoodie/hoodie.properties
2025-02-20 19:27:42,147 WARN [o.a.s.e.s.TaskExecutionService] [BlockingWorker-TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}] - [localhost]:5801 [seatunnel-825957] [5.1] Exception in org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask@1f445812
java.lang.RuntimeException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Error limiting instant archival based on metadata table
at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:253) ~[seatunnel-starter.jar:2.3.8]
at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:66) ~[seatunnel-starter.jar:2.3.8]
at org.apache.seatunnel.engine.server.task.SeaTunnelTransformCollector.collect(SeaTunnelTransformCollector.java:39) ~[seatunnel-starter.jar:2.3.8]
at org.apache.seatunnel.engine.server.task.SeaTunnelTransformCollector.collect(SeaTunnelTransformCollector.java:27) ~[seatunnel-starter.jar:2.3.8]
at org.apache.seatunnel.engine.server.task.group.queue.IntermediateBlockingQueue.handleRecord(IntermediateBlockingQueue.java:70) ~[seatunnel-starter.jar:2.3.8]
at org.apache.seatunnel.engine.server.task.group.queue.IntermediateBlockingQueue.collect(IntermediateBlockingQueue.java:50) ~[seatunnel-starter.jar:2.3.8]
at org.apache.seatunnel.engine.server.task.flow.IntermediateQueueFlowLifeCycle.collect(IntermediateQueueFlowLifeCycle.java:51) ~[seatunnel-starter.jar:2.3.8]
at org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask.collect(TransformSeaTunnelTask.java:73) ~[seatunnel-starter.jar:2.3.8]
at org.apache.seatunnel.engine.server.task.SeaTunnelTask.stateProcess(SeaTunnelTask.java:168) ~[seatunnel-starter.jar:2.3.8]
at org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask.call(TransformSeaTunnelTask.java:78) ~[seatunnel-starter.jar:2.3.8]
at org.apache.seatunnel.engine.server.TaskExecutionService$BlockingWorker.run(TaskExecutionService.java:693) [seatunnel-starter.jar:2.3.8]
at org.apache.seatunnel.engine.server.TaskExecutionService$NamedTaskWrapper.run(TaskExecutionService.java:1018) [seatunnel-starter.jar:2.3.8]
at org.apache.seatunnel.api.tracing.MDCRunnable.run(MDCRunnable.java:39) [seatunnel-starter.jar:2.3.8]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_381]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_381]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_381]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_381]
at java.lang.Thread.run(Thread.java:750) [?:1.8.0_381]
Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Error limiting instant archival based on metadata table
at org.apache.seatunnel.api.sink.multitablesink.MultiTableSinkWriter.prepareCommit(MultiTableSinkWriter.java:258) ~[seatunnel-starter.jar:2.3.8]
at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:188) ~[seatunnel-starter.jar:2.3.8]
... 17 more
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Error limiting instant archival based on metadata table
at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:1.8.0_381]
at java.util.concurrent.FutureTask.get(FutureTask.java:192) ~[?:1.8.0_381]
at org.apache.seatunnel.api.sink.multitablesink.MultiTableSinkWriter.prepareCommit(MultiTableSinkWriter.java:256) ~[seatunnel-starter.jar:2.3.8]
at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:188) ~[seatunnel-starter.jar:2.3.8]
... 17 more
Caused by: org.apache.hudi.exception.HoodieException: Error limiting instant archival based on metadata table
at org.apache.hudi.client.HoodieTimelineArchiver.getInstantsToArchive(HoodieTimelineArchiver.java:520) ~[connector-hudi-2.3.8.jar:2.3.8]
at org.apache.hudi.client.HoodieTimelineArchiver.archiveIfRequired(HoodieTimelineArchiver.java:165) ~[connector-hudi-2.3.8.jar:2.3.8]
at org.apache.hudi.client.BaseHoodieTableServiceClient.archive(BaseHoodieTableServiceClient.java:782) ~[connector-hudi-2.3.8.jar:2.3.8]
at org.apache.hudi.client.BaseHoodieWriteClient.archive(BaseHoodieWriteClient.java:867) ~[connector-hudi-2.3.8.jar:2.3.8]
at org.apache.hudi.client.BaseHoodieWriteClient.autoArchiveOnCommit(BaseHoodieWriteClient.java:596) ~[connector-hudi-2.3.8.jar:2.3.8]
at org.apache.hudi.client.BaseHoodieWriteClient.mayBeCleanAndArchive(BaseHoodieWriteClient.java:562) ~[connector-hudi-2.3.8.jar:2.3.8]
at org.apache.hudi.client.BaseHoodieWriteClient.postWrite(BaseHoodieWriteClient.java:528) ~[connector-hudi-2.3.8.jar:2.3.8]
at org.apache.hudi.client.HoodieJavaWriteClient.insert(HoodieJavaWriteClient.java:141) ~[connector-hudi-2.3.8.jar:2.3.8]
at org.apache.seatunnel.connectors.seatunnel.hudi.sink.writer.HudiRecordWriter.flush(HudiRecordWriter.java:160) ~[connector-hudi-2.3.8.jar:2.3.8]
at org.apache.seatunnel.connectors.seatunnel.hudi.sink.writer.HudiRecordWriter.prepareCommit(HudiRecordWriter.java:186) ~[connector-hudi-2.3.8.jar:2.3.8]
at org.apache.seatunnel.connectors.seatunnel.hudi.sink.writer.HudiSinkWriter.prepareCommit(HudiSinkWriter.java:101) ~[connector-hudi-2.3.8.jar:2.3.8]
at org.apache.seatunnel.api.sink.multitablesink.MultiTableSinkWriter.lambda$prepareCommit$4(MultiTableSinkWriter.java:241) ~[seatunnel-starter.jar:2.3.8]
... 6 more
Caused by: java.lang.UnsupportedOperationException
at org.apache.hudi.metadata.FileSystemBackedTableMetadata.getLatestCompactionTime(FileSystemBackedTableMetadata.java:280) ~[connector-hudi-2.3.8.jar:2.3.8]
at org.apache.hudi.client.HoodieTimelineArchiver.getInstantsToArchive(HoodieTimelineArchiver.java:510) ~[connector-hudi-2.3.8.jar:2.3.8]
at org.apache.hudi.client.HoodieTimelineArchiver.archiveIfRequired(HoodieTimelineArchiver.java:165) ~[connector-hudi-2.3.8.jar:2.3.8]
at org.apache.hudi.client.BaseHoodieTableServiceClient.archive(BaseHoodieTableServiceClient.java:782) ~[connector-hudi-2.3.8.jar:2.3.8]
at org.apache.hudi.client.BaseHoodieWriteClient.archive(BaseHoodieWriteClient.java:867) ~[connector-hudi-2.3.8.jar:2.3.8]
at org.apache.hudi.client.BaseHoodieWriteClient.autoArchiveOnCommit(BaseHoodieWriteClient.java:596) ~[connector-hudi-2.3.8.jar:2.3.8]
at org.apache.hudi.client.BaseHoodieWriteClient.mayBeCleanAndArchive(BaseHoodieWriteClient.java:562) ~[connector-hudi-2.3.8.jar:2.3.8]
at org.apache.hudi.client.BaseHoodieWriteClient.postWrite(BaseHoodieWriteClient.java:528) ~[connector-hudi-2.3.8.jar:2.3.8]
at org.apache.hudi.client.HoodieJavaWriteClient.insert(HoodieJavaWriteClient.java:141) ~[connector-hudi-2.3.8.jar:2.3.8]
at org.apache.seatunnel.connectors.seatunnel.hudi.sink.writer.HudiRecordWriter.flush(HudiRecordWriter.java:160) ~[connector-hudi-2.3.8.jar:2.3.8]
at org.apache.seatunnel.connectors.seatunnel.hudi.sink.writer.HudiRecordWriter.prepareCommit(HudiRecordWriter.java:186) ~[connector-hudi-2.3.8.jar:2.3.8]
at org.apache.seatunnel.connectors.seatunnel.hudi.sink.writer.HudiSinkWriter.prepareCommit(HudiSinkWriter.java:101) ~[connector-hudi-2.3.8.jar:2.3.8]
at org.apache.seatunnel.api.sink.multitablesink.MultiTableSinkWriter.lambda$prepareCommit$4(MultiTableSinkWriter.java:241) ~[seatunnel-starter.jar:2.3.8]
... 6 more
2025-02-20 19:27:42,152 INFO [o.a.s.e.s.TaskExecutionService] [BlockingWorker-TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}] - [localhost]:5801 [seatunnel-825957] [5.1] taskDone, taskId = 70000, taskGroup = TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}
2025-02-20 19:27:42,152 INFO [o.a.s.e.s.TaskExecutionService] [BlockingWorker-TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}] - [localhost]:5801 [seatunnel-825957] [5.1] task 70000 error with exception: [java.lang.RuntimeException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Error limiting instant archival based on metadata table], cancel other task in taskGroup TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}.
2025-02-20 19:27:42,152 WARN [o.a.s.e.s.TaskExecutionService] [BlockingWorker-TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}] - [localhost]:5801 [seatunnel-825957] [5.1] Interrupted task 60000 - org.apache.seatunnel.engine.server.task.SourceSeaTunnelTask@4fe9a7d5
2025-02-20 19:27:42,152 INFO [o.a.s.e.s.TaskExecutionService] [BlockingWorker-TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}] - [localhost]:5801 [seatunnel-825957] [5.1] taskDone, taskId = 60000, taskGroup = TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}
2025-02-20 19:27:42,154 INFO [.a.h.c.t.HoodieTableMetaClient] [ForkJoinPool.commonPool-worker-1] - Loading HoodieTableMetaClient from hdfs:///hudi//default/hudi_mor_tbl2
2025-02-20 19:27:42,155 INFO [o.a.h.c.t.HoodieTableConfig ] [ForkJoinPool.commonPool-worker-1] - Loading table properties from hdfs:/hudi/default/hudi_mor_tbl2/.hoodie/hoodie.properties
2025-02-20 19:27:42,158 INFO [.a.h.c.t.HoodieTableMetaClient] [ForkJoinPool.commonPool-worker-1] - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from hdfs:///hudi//default/hudi_mor_tbl2
2025-02-20 19:27:42,158 INFO [.a.h.c.t.HoodieTableMetaClient] [ForkJoinPool.commonPool-worker-1] - Loading Active commit timeline for hdfs:///hudi//default/hudi_mor_tbl2
2025-02-20 19:27:42,159 INFO [o.a.s.e.s.TaskExecutionService] [BlockingWorker-TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}] - [localhost]:5801 [seatunnel-825957] [5.1] taskGroup TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000} complete with FAILED
2025-02-20 19:27:42,160 INFO [o.a.s.e.s.TaskExecutionService] [BlockingWorker-TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}] - [localhost]:5801 [seatunnel-825957] [5.1] task 60000 error with exception: [java.lang.RuntimeException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Error limiting instant archival based on metadata table], cancel other task in taskGroup TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}.
2025-02-20 19:27:42,160 INFO [o.a.s.e.s.TaskExecutionService] [hz.main.seaTunnel.task.thread-6] - [localhost]:5801 [seatunnel-825957] [5.1] Task TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000} complete with state FAILED
2025-02-20 19:27:42,160 INFO [a.h.c.t.t.HoodieActiveTimeline] [ForkJoinPool.commonPool-worker-1] - Loaded instants upto : Option{val=[20250220192742010__clean__COMPLETED__20250220192742120]}
2025-02-20 19:27:42,160 INFO [o.a.h.c.u.CleanerUtils ] [ForkJoinPool.commonPool-worker-1] - Cleaned failed attempts if any
2025-02-20 19:27:42,160 INFO [o.a.s.e.s.CoordinatorService ] [hz.main.seaTunnel.task.thread-6] - [localhost]:5801 [seatunnel-825957] [5.1] Received task end from execution TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}, state FAILED
2025-02-20 19:27:42,161 INFO [.a.h.c.t.HoodieTableMetaClient] [ForkJoinPool.commonPool-worker-1] - Loading HoodieTableMetaClient from hdfs:///hudi//default/hudi_mor_tbl2
2025-02-20 19:27:42,161 INFO [o.a.s.a.e.LoggingEventHandler ] [hz.main.generic-operation.thread-36] - log event: ReaderCloseEvent(createdTime=1740050862160, jobId=944918221583024129, eventType=LIFECYCLE_READER_CLOSE)
2025-02-20 19:27:42,162 INFO [o.a.h.c.t.HoodieTableConfig ] [ForkJoinPool.commonPool-worker-1] - Loading table properties from hdfs:/hudi/default/hudi_mor_tbl2/.hoodie/hoodie.properties
2025-02-20 19:27:42,162 INFO [o.a.s.e.s.d.p.PhysicalVertex ] [hz.main.seaTunnel.task.thread-6] - Job SeaTunnel_Job (944918221583024129), Pipeline: [(1/1)], task: [pipeline-1 [Source[0]-FakeSource]-SourceTask (1/1)] turned from state RUNNING to FAILED.
2025-02-20 19:27:42,162 INFO [o.a.s.e.s.d.p.PhysicalVertex ] [hz.main.seaTunnel.task.thread-6] - Job SeaTunnel_Job (944918221583024129), Pipeline: [(1/1)], task: [pipeline-1 [Source[0]-FakeSource]-SourceTask (1/1)] state process is stopped
2025-02-20 19:27:42,162 ERROR [o.a.s.e.s.d.p.PhysicalVertex ] [hz.main.seaTunnel.task.thread-6] - Job SeaTunnel_Job (944918221583024129), Pipeline: [(1/1)], task: [pipeline-1 [Source[0]-FakeSource]-SourceTask (1/1)] end with state FAILED and Exception: java.lang.RuntimeException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Error limiting instant archival based on metadata table
at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:253)
at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:66)
at org.apache.seatunnel.engine.server.task.SeaTunnelTransformCollector.collect(SeaTunnelTransformCollector.java:39)
at org.apache.seatunnel.engine.server.task.SeaTunnelTransformCollector.collect(SeaTunnelTransformCollector.java:27)
at org.apache.seatunnel.engine.server.task.group.queue.IntermediateBlockingQueue.handleRecord(IntermediateBlockingQueue.java:70)
at org.apache.seatunnel.engine.server.task.group.queue.IntermediateBlockingQueue.collect(IntermediateBlockingQueue.java:50)
at org.apache.seatunnel.engine.server.task.flow.IntermediateQueueFlowLifeCycle.collect(IntermediateQueueFlowLifeCycle.java:51)
at org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask.collect(TransformSeaTunnelTask.java:73)
at org.apache.seatunnel.engine.server.task.SeaTunnelTask.stateProcess(SeaTunnelTask.java:168)
at org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask.call(TransformSeaTunnelTask.java:78)
at org.apache.seatunnel.engine.server.TaskExecutionService$BlockingWorker.run(TaskExecutionService.java:693)
at org.apache.seatunnel.engine.server.TaskExecutionService$NamedTaskWrapper.run(TaskExecutionService.java:1018)
at org.apache.seatunnel.api.tracing.MDCRunnable.run(MDCRunnable.java:39)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Error limiting instant archival based on metadata table
at org.apache.seatunnel.api.sink.multitablesink.MultiTableSinkWriter.prepareCommit(MultiTableSinkWriter.java:258)
at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:188)
... 17 more
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Error limiting instant archival based on metadata table
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.seatunnel.api.sink.multitablesink.MultiTableSinkWriter.prepareCommit(MultiTableSinkWriter.java:256)
... 18 more
Caused by: org.apache.hudi.exception.HoodieException: Error limiting instant archival based on metadata table
When the Hudi table's metadata configuration is set to hoodie.table.metadata.partitions=files, SeaTunnel fails with this error when synchronizing data. Per the stack trace, the archival step fails in FileSystemBackedTableMetadata.getLatestCompactionTime, which throws UnsupportedOperationException.
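To confirm whether Spark enabled the metadata table on an existing Hudi table, one can inspect the table's .hoodie/hoodie.properties file, which is a standard Java properties file. A minimal sketch, assuming only the property name hoodie.table.metadata.partitions from this report; the class and helper method are hypothetical, not part of Hudi or SeaTunnel:

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.Properties;

// Hypothetical diagnostic: report whether hoodie.properties declares
// metadata-table partitions (e.g. "files"), the state this issue
// associates with the archival failure.
public class HoodieMetadataCheck {

    static boolean metadataTableEnabled(Properties tableProps) {
        String partitions =
                tableProps.getProperty("hoodie.table.metadata.partitions", "");
        // A non-empty partition list means the metadata table was built;
        // an empty value means filesystem-backed metadata only.
        return Arrays.stream(partitions.split(","))
                .map(String::trim)
                .anyMatch(p -> !p.isEmpty());
    }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Simulate the hoodie.properties content described in the issue.
        props.load(new StringReader("hoodie.table.metadata.partitions=files\n"));
        System.out.println(metadataTableEnabled(props)); // prints "true"
    }
}
```

In practice the file would be read from HDFS (e.g. hdfs:/hudi/default/hudi_mor_tbl2/.hoodie/hoodie.properties, as shown in the log) rather than from a string.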
SeaTunnel Version
2.3.8
SeaTunnel Config
Zeta or Flink or Spark Version
No response
Java or Scala Version
No response
Screenshots
No response
Are you willing to submit PR?
Code of Conduct