
[Bug] [Connector-Hudi] Spark changed the Hudi table's metadata after writing data; SeaTunnel then fails reading Hudi's metadata while writing #8781

lm520hy opened this issue Feb 21, 2025 · 1 comment
lm520hy commented Feb 21, 2025

Search before asking

  • I had searched in the issues and found no similar issues.

What happened

After Spark wrote data to the Hudi table, it changed the table's metadata configuration. SeaTunnel then fails with an error while writing data: reading Hudi's metadata throws an exception.

SeaTunnel Version

2.3.8

SeaTunnel Config

env {
  parallelism = 1
  job.mode = "BATCH"
}
source {
  FakeSource {
    parallelism = 1
    result_table_name = "fake2"
    row.num = 16
    schema = {
      fields {
        id = "int"
        name = "string"
        price = "double"
        ts = "bigint"
      }
    }
    rows = [
      {
       kind = INSERT
       fields = [7, "l", 1100, 117]
      }
    ]
  }

}
sink {
  Hudi {
    table_dfs_path = "hdfs:///hudi/"
    table_name = "hudi_mor_tbl2"
    table_type = "COPY_ON_WRITE"
    conf_files_path = "/soft/hadoop/etc/hadoop/hdfs-site.xml;/soft/hadoop/etc/hadoop/core-site.xml;/soft/hadoop/etc/hadoop/yarn-site.xml"
    batch_size = 10000
  }
}
Running Command

./bin/seatunnel.sh --config ./config/v2.batch.config.template6 -m local

Error Exception

2025-02-20 19:27:42,144 INFO  [a.h.c.t.t.HoodieActiveTimeline] [st-multi-table-sink-writer-2] - Loaded instants upto : Option{val=[20250220192742010__clean__COMPLETED__20250220192742120]}
2025-02-20 19:27:42,145 INFO  [o.a.h.c.t.HoodieTableConfig   ] [st-multi-table-sink-writer-2] - Loading table properties from hdfs:/hudi/default/hudi_mor_tbl2/.hoodie/hoodie.properties
2025-02-20 19:27:42,147 WARN  [o.a.s.e.s.TaskExecutionService] [BlockingWorker-TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}] - [localhost]:5801 [seatunnel-825957] [5.1] Exception in org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask@1f445812
java.lang.RuntimeException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Error limiting instant archival based on metadata table
	at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:253) ~[seatunnel-starter.jar:2.3.8]
	at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:66) ~[seatunnel-starter.jar:2.3.8]
	at org.apache.seatunnel.engine.server.task.SeaTunnelTransformCollector.collect(SeaTunnelTransformCollector.java:39) ~[seatunnel-starter.jar:2.3.8]
	at org.apache.seatunnel.engine.server.task.SeaTunnelTransformCollector.collect(SeaTunnelTransformCollector.java:27) ~[seatunnel-starter.jar:2.3.8]
	at org.apache.seatunnel.engine.server.task.group.queue.IntermediateBlockingQueue.handleRecord(IntermediateBlockingQueue.java:70) ~[seatunnel-starter.jar:2.3.8]
	at org.apache.seatunnel.engine.server.task.group.queue.IntermediateBlockingQueue.collect(IntermediateBlockingQueue.java:50) ~[seatunnel-starter.jar:2.3.8]
	at org.apache.seatunnel.engine.server.task.flow.IntermediateQueueFlowLifeCycle.collect(IntermediateQueueFlowLifeCycle.java:51) ~[seatunnel-starter.jar:2.3.8]
	at org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask.collect(TransformSeaTunnelTask.java:73) ~[seatunnel-starter.jar:2.3.8]
	at org.apache.seatunnel.engine.server.task.SeaTunnelTask.stateProcess(SeaTunnelTask.java:168) ~[seatunnel-starter.jar:2.3.8]
	at org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask.call(TransformSeaTunnelTask.java:78) ~[seatunnel-starter.jar:2.3.8]
	at org.apache.seatunnel.engine.server.TaskExecutionService$BlockingWorker.run(TaskExecutionService.java:693) [seatunnel-starter.jar:2.3.8]
	at org.apache.seatunnel.engine.server.TaskExecutionService$NamedTaskWrapper.run(TaskExecutionService.java:1018) [seatunnel-starter.jar:2.3.8]
	at org.apache.seatunnel.api.tracing.MDCRunnable.run(MDCRunnable.java:39) [seatunnel-starter.jar:2.3.8]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [?:1.8.0_381]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_381]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_381]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_381]
	at java.lang.Thread.run(Thread.java:750) [?:1.8.0_381]
Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Error limiting instant archival based on metadata table
	at org.apache.seatunnel.api.sink.multitablesink.MultiTableSinkWriter.prepareCommit(MultiTableSinkWriter.java:258) ~[seatunnel-starter.jar:2.3.8]
	at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:188) ~[seatunnel-starter.jar:2.3.8]
	... 17 more
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Error limiting instant archival based on metadata table
	at java.util.concurrent.FutureTask.report(FutureTask.java:122) ~[?:1.8.0_381]
	at java.util.concurrent.FutureTask.get(FutureTask.java:192) ~[?:1.8.0_381]
	at org.apache.seatunnel.api.sink.multitablesink.MultiTableSinkWriter.prepareCommit(MultiTableSinkWriter.java:256) ~[seatunnel-starter.jar:2.3.8]
	at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:188) ~[seatunnel-starter.jar:2.3.8]
	... 17 more
Caused by: org.apache.hudi.exception.HoodieException: Error limiting instant archival based on metadata table
	at org.apache.hudi.client.HoodieTimelineArchiver.getInstantsToArchive(HoodieTimelineArchiver.java:520) ~[connector-hudi-2.3.8.jar:2.3.8]
	at org.apache.hudi.client.HoodieTimelineArchiver.archiveIfRequired(HoodieTimelineArchiver.java:165) ~[connector-hudi-2.3.8.jar:2.3.8]
	at org.apache.hudi.client.BaseHoodieTableServiceClient.archive(BaseHoodieTableServiceClient.java:782) ~[connector-hudi-2.3.8.jar:2.3.8]
	at org.apache.hudi.client.BaseHoodieWriteClient.archive(BaseHoodieWriteClient.java:867) ~[connector-hudi-2.3.8.jar:2.3.8]
	at org.apache.hudi.client.BaseHoodieWriteClient.autoArchiveOnCommit(BaseHoodieWriteClient.java:596) ~[connector-hudi-2.3.8.jar:2.3.8]
	at org.apache.hudi.client.BaseHoodieWriteClient.mayBeCleanAndArchive(BaseHoodieWriteClient.java:562) ~[connector-hudi-2.3.8.jar:2.3.8]
	at org.apache.hudi.client.BaseHoodieWriteClient.postWrite(BaseHoodieWriteClient.java:528) ~[connector-hudi-2.3.8.jar:2.3.8]
	at org.apache.hudi.client.HoodieJavaWriteClient.insert(HoodieJavaWriteClient.java:141) ~[connector-hudi-2.3.8.jar:2.3.8]
	at org.apache.seatunnel.connectors.seatunnel.hudi.sink.writer.HudiRecordWriter.flush(HudiRecordWriter.java:160) ~[connector-hudi-2.3.8.jar:2.3.8]
	at org.apache.seatunnel.connectors.seatunnel.hudi.sink.writer.HudiRecordWriter.prepareCommit(HudiRecordWriter.java:186) ~[connector-hudi-2.3.8.jar:2.3.8]
	at org.apache.seatunnel.connectors.seatunnel.hudi.sink.writer.HudiSinkWriter.prepareCommit(HudiSinkWriter.java:101) ~[connector-hudi-2.3.8.jar:2.3.8]
	at org.apache.seatunnel.api.sink.multitablesink.MultiTableSinkWriter.lambda$prepareCommit$4(MultiTableSinkWriter.java:241) ~[seatunnel-starter.jar:2.3.8]
	... 6 more
Caused by: java.lang.UnsupportedOperationException
	at org.apache.hudi.metadata.FileSystemBackedTableMetadata.getLatestCompactionTime(FileSystemBackedTableMetadata.java:280) ~[connector-hudi-2.3.8.jar:2.3.8]
	at org.apache.hudi.client.HoodieTimelineArchiver.getInstantsToArchive(HoodieTimelineArchiver.java:510) ~[connector-hudi-2.3.8.jar:2.3.8]
	at org.apache.hudi.client.HoodieTimelineArchiver.archiveIfRequired(HoodieTimelineArchiver.java:165) ~[connector-hudi-2.3.8.jar:2.3.8]
	at org.apache.hudi.client.BaseHoodieTableServiceClient.archive(BaseHoodieTableServiceClient.java:782) ~[connector-hudi-2.3.8.jar:2.3.8]
	at org.apache.hudi.client.BaseHoodieWriteClient.archive(BaseHoodieWriteClient.java:867) ~[connector-hudi-2.3.8.jar:2.3.8]
	at org.apache.hudi.client.BaseHoodieWriteClient.autoArchiveOnCommit(BaseHoodieWriteClient.java:596) ~[connector-hudi-2.3.8.jar:2.3.8]
	at org.apache.hudi.client.BaseHoodieWriteClient.mayBeCleanAndArchive(BaseHoodieWriteClient.java:562) ~[connector-hudi-2.3.8.jar:2.3.8]
	at org.apache.hudi.client.BaseHoodieWriteClient.postWrite(BaseHoodieWriteClient.java:528) ~[connector-hudi-2.3.8.jar:2.3.8]
	at org.apache.hudi.client.HoodieJavaWriteClient.insert(HoodieJavaWriteClient.java:141) ~[connector-hudi-2.3.8.jar:2.3.8]
	at org.apache.seatunnel.connectors.seatunnel.hudi.sink.writer.HudiRecordWriter.flush(HudiRecordWriter.java:160) ~[connector-hudi-2.3.8.jar:2.3.8]
	at org.apache.seatunnel.connectors.seatunnel.hudi.sink.writer.HudiRecordWriter.prepareCommit(HudiRecordWriter.java:186) ~[connector-hudi-2.3.8.jar:2.3.8]
	at org.apache.seatunnel.connectors.seatunnel.hudi.sink.writer.HudiSinkWriter.prepareCommit(HudiSinkWriter.java:101) ~[connector-hudi-2.3.8.jar:2.3.8]
	at org.apache.seatunnel.api.sink.multitablesink.MultiTableSinkWriter.lambda$prepareCommit$4(MultiTableSinkWriter.java:241) ~[seatunnel-starter.jar:2.3.8]
	... 6 more
2025-02-20 19:27:42,152 INFO  [o.a.s.e.s.TaskExecutionService] [BlockingWorker-TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}] - [localhost]:5801 [seatunnel-825957] [5.1] taskDone, taskId = 70000, taskGroup = TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}
2025-02-20 19:27:42,152 INFO  [o.a.s.e.s.TaskExecutionService] [BlockingWorker-TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}] - [localhost]:5801 [seatunnel-825957] [5.1] task 70000 error with exception: [java.lang.RuntimeException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Error limiting instant archival based on metadata table], cancel other task in taskGroup TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}.
2025-02-20 19:27:42,152 WARN  [o.a.s.e.s.TaskExecutionService] [BlockingWorker-TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}] - [localhost]:5801 [seatunnel-825957] [5.1] Interrupted task 60000 - org.apache.seatunnel.engine.server.task.SourceSeaTunnelTask@4fe9a7d5
2025-02-20 19:27:42,152 INFO  [o.a.s.e.s.TaskExecutionService] [BlockingWorker-TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}] - [localhost]:5801 [seatunnel-825957] [5.1] taskDone, taskId = 60000, taskGroup = TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}
2025-02-20 19:27:42,154 INFO  [.a.h.c.t.HoodieTableMetaClient] [ForkJoinPool.commonPool-worker-1] - Loading HoodieTableMetaClient from hdfs:///hudi//default/hudi_mor_tbl2
2025-02-20 19:27:42,155 INFO  [o.a.h.c.t.HoodieTableConfig   ] [ForkJoinPool.commonPool-worker-1] - Loading table properties from hdfs:/hudi/default/hudi_mor_tbl2/.hoodie/hoodie.properties
2025-02-20 19:27:42,158 INFO  [.a.h.c.t.HoodieTableMetaClient] [ForkJoinPool.commonPool-worker-1] - Finished Loading Table of type COPY_ON_WRITE(version=1, baseFileFormat=PARQUET) from hdfs:///hudi//default/hudi_mor_tbl2
2025-02-20 19:27:42,158 INFO  [.a.h.c.t.HoodieTableMetaClient] [ForkJoinPool.commonPool-worker-1] - Loading Active commit timeline for hdfs:///hudi//default/hudi_mor_tbl2
2025-02-20 19:27:42,159 INFO  [o.a.s.e.s.TaskExecutionService] [BlockingWorker-TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}] - [localhost]:5801 [seatunnel-825957] [5.1] taskGroup TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000} complete with FAILED
2025-02-20 19:27:42,160 INFO  [o.a.s.e.s.TaskExecutionService] [BlockingWorker-TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}] - [localhost]:5801 [seatunnel-825957] [5.1] task 60000 error with exception: [java.lang.RuntimeException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Error limiting instant archival based on metadata table], cancel other task in taskGroup TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}.
2025-02-20 19:27:42,160 INFO  [o.a.s.e.s.TaskExecutionService] [hz.main.seaTunnel.task.thread-6] - [localhost]:5801 [seatunnel-825957] [5.1] Task TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000} complete with state FAILED
2025-02-20 19:27:42,160 INFO  [a.h.c.t.t.HoodieActiveTimeline] [ForkJoinPool.commonPool-worker-1] - Loaded instants upto : Option{val=[20250220192742010__clean__COMPLETED__20250220192742120]}
2025-02-20 19:27:42,160 INFO  [o.a.h.c.u.CleanerUtils        ] [ForkJoinPool.commonPool-worker-1] - Cleaned failed attempts if any
2025-02-20 19:27:42,160 INFO  [o.a.s.e.s.CoordinatorService  ] [hz.main.seaTunnel.task.thread-6] - [localhost]:5801 [seatunnel-825957] [5.1] Received task end from execution TaskGroupLocation{jobId=944918221583024129, pipelineId=1, taskGroupId=50000}, state FAILED
2025-02-20 19:27:42,161 INFO  [.a.h.c.t.HoodieTableMetaClient] [ForkJoinPool.commonPool-worker-1] - Loading HoodieTableMetaClient from hdfs:///hudi//default/hudi_mor_tbl2
2025-02-20 19:27:42,161 INFO  [o.a.s.a.e.LoggingEventHandler ] [hz.main.generic-operation.thread-36] - log event: ReaderCloseEvent(createdTime=1740050862160, jobId=944918221583024129, eventType=LIFECYCLE_READER_CLOSE)
2025-02-20 19:27:42,162 INFO  [o.a.h.c.t.HoodieTableConfig   ] [ForkJoinPool.commonPool-worker-1] - Loading table properties from hdfs:/hudi/default/hudi_mor_tbl2/.hoodie/hoodie.properties
2025-02-20 19:27:42,162 INFO  [o.a.s.e.s.d.p.PhysicalVertex  ] [hz.main.seaTunnel.task.thread-6] - Job SeaTunnel_Job (944918221583024129), Pipeline: [(1/1)], task: [pipeline-1 [Source[0]-FakeSource]-SourceTask (1/1)] turned from state RUNNING to FAILED.
2025-02-20 19:27:42,162 INFO  [o.a.s.e.s.d.p.PhysicalVertex  ] [hz.main.seaTunnel.task.thread-6] - Job SeaTunnel_Job (944918221583024129), Pipeline: [(1/1)], task: [pipeline-1 [Source[0]-FakeSource]-SourceTask (1/1)] state process is stopped
2025-02-20 19:27:42,162 ERROR [o.a.s.e.s.d.p.PhysicalVertex  ] [hz.main.seaTunnel.task.thread-6] - Job SeaTunnel_Job (944918221583024129), Pipeline: [(1/1)], task: [pipeline-1 [Source[0]-FakeSource]-SourceTask (1/1)] end with state FAILED and Exception: java.lang.RuntimeException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Error limiting instant archival based on metadata table
	at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:253)
	at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:66)
	at org.apache.seatunnel.engine.server.task.SeaTunnelTransformCollector.collect(SeaTunnelTransformCollector.java:39)
	at org.apache.seatunnel.engine.server.task.SeaTunnelTransformCollector.collect(SeaTunnelTransformCollector.java:27)
	at org.apache.seatunnel.engine.server.task.group.queue.IntermediateBlockingQueue.handleRecord(IntermediateBlockingQueue.java:70)
	at org.apache.seatunnel.engine.server.task.group.queue.IntermediateBlockingQueue.collect(IntermediateBlockingQueue.java:50)
	at org.apache.seatunnel.engine.server.task.flow.IntermediateQueueFlowLifeCycle.collect(IntermediateQueueFlowLifeCycle.java:51)
	at org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask.collect(TransformSeaTunnelTask.java:73)
	at org.apache.seatunnel.engine.server.task.SeaTunnelTask.stateProcess(SeaTunnelTask.java:168)
	at org.apache.seatunnel.engine.server.task.TransformSeaTunnelTask.call(TransformSeaTunnelTask.java:78)
	at org.apache.seatunnel.engine.server.TaskExecutionService$BlockingWorker.run(TaskExecutionService.java:693)
	at org.apache.seatunnel.engine.server.TaskExecutionService$NamedTaskWrapper.run(TaskExecutionService.java:1018)
	at org.apache.seatunnel.api.tracing.MDCRunnable.run(MDCRunnable.java:39)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Error limiting instant archival based on metadata table
	at org.apache.seatunnel.api.sink.multitablesink.MultiTableSinkWriter.prepareCommit(MultiTableSinkWriter.java:258)
	at org.apache.seatunnel.engine.server.task.flow.SinkFlowLifeCycle.received(SinkFlowLifeCycle.java:188)
	... 17 more
Caused by: java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieException: Error limiting instant archival based on metadata table
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
	at org.apache.seatunnel.api.sink.multitablesink.MultiTableSinkWriter.prepareCommit(MultiTableSinkWriter.java:256)
	... 18 more
Caused by: org.apache.hudi.exception.HoodieException: Error limiting instant archival based on metadata table

Zeta or Flink or Spark Version

No response

Java or Scala Version

No response

Screenshots

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@lm520hy lm520hy added the bug label Feb 21, 2025

lm520hy commented Feb 21, 2025

When Hudi's table metadata is configured with `hoodie.table.metadata.partitions=files`, SeaTunnel fails with this error while synchronizing data.
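For context, `hoodie.table.metadata.partitions` is a key that Hudi records in the table's `.hoodie/hoodie.properties` file once the metadata table has been initialized, here by the Spark write. A rough, illustrative excerpt of what that file might contain after the Spark job runs; surrounding keys and values vary by Hudi version and are shown only as an example:

```properties
# .hoodie/hoodie.properties (illustrative excerpt, not the actual file)
hoodie.table.name=hudi_mor_tbl2
hoodie.table.type=COPY_ON_WRITE
# Written once the Spark write initializes the metadata table;
# this is the setting the SeaTunnel sync then trips over when it
# falls back to filesystem-backed metadata during archival.
hoodie.table.metadata.partitions=files
```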
