Data pipeline failed without retry #154
Yeah the write path is the least resilient part of the library right now. I need to take a deeper look to understand exactly how the Java writer handles lost data nodes, and then figure out how I could write a test for something like that. If you're not already, I'd currently recommend retrying the whole write from scratch if you still have access to all the data you are trying to write.
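To be concrete about what I mean by retrying from scratch, a minimal sketch is below. The exact `Client`/`FileWriter`/`HdfsError` names and method signatures are from memory and may not match the current API exactly, so treat them as assumptions:

```rust
use bytes::Bytes;
use hdfs_native::{Client, WriteOptions};

/// Sketch only: retry the entire write from the beginning on any failure,
/// since the write path itself doesn't retry yet. The Client/FileWriter
/// method names and signatures here are assumptions.
async fn write_with_retry(
    client: &Client,
    path: &str,
    data: Bytes,
    max_attempts: usize,
) -> Result<(), hdfs_native::HdfsError> {
    let mut last_err = None;
    for attempt in 1..=max_attempts {
        // Recreate the file on every attempt so a half-written block from a
        // failed pipeline isn't left behind. You'll likely want overwrite
        // semantics in the WriteOptions so the retry can replace the file.
        let result = async {
            let mut writer = client.create(path, WriteOptions::default()).await?;
            writer.write(data.clone()).await?;
            writer.close().await?;
            Ok::<_, hdfs_native::HdfsError>(())
        }
        .await;

        match result {
            Ok(()) => return Ok(()),
            Err(e) => {
                eprintln!("write attempt {attempt} failed: {e}");
                last_err = Some(e);
            }
        }
    }
    Err(last_err.expect("max_attempts must be > 0"))
}
```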
Based on a quick look, it seems like the logic is generally:
- If enough DataNodes in the pipeline are still healthy, just drop the failed one and continue writing with the remaining nodes.
- If too many have failed, ask the NameNode for a replacement DataNode, copy the block's already-written data to it, and then continue.
So for example with replication of 3, a new DataNode would only get added if two DataNodes fail during writing that block. The first one is simpler and I could look into trying to add that, though still not exactly sure how I would test it. That should make things more resilient than the current behavior which is simply failing on the first DataNode error. The second is a bit more complex and requires copying the already written data to a new replica before continuing, and probably wouldn't be something I could tackle anytime soon.
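For reference, my rough reading of the Java client's default replace-datanode-on-failure behavior boils down to something like this (illustration only, not code from this library):

```rust
/// Sketch of the decision described above, as I understand the default
/// policy in the Java client -- illustration only.
fn should_add_replacement_datanode(replication: usize, remaining_nodes: usize) -> bool {
    // The Java client only considers replacing nodes at replication >= 3,
    // and only once half or fewer of the original pipeline nodes are left.
    // So with replication 3, a replacement is requested only after two
    // DataNodes have failed while writing the block.
    replication >= 3 && remaining_nodes <= replication / 2
}
```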
Thanks for your quick reply. I found another error status when appending a file, please see this: And I think the first solution is good for my append case. @Kimahriman
Yes, it looks unstable, especially in a busy HDFS cluster. If you have any improvement patch, I'm happy to test it. I can almost always reproduce it with my cases.
This would be when a data node the block is replicating to fails. The connection drop would be when the one you are talking to dies, so I think they're effectively the same issue.
After a quick look at this article https://blog.cloudera.com/understanding-hdfs-recovery-processes-part-2/, I think the bad data node replacement will not trigger a resend of the original data. Once the client finds the bad data node, it builds a new pipeline through the namenode and then resends the data starting from the last acked packet. @Kimahriman
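In other words, the client keeps the packets it hasn't gotten acks for and replays them on the rebuilt pipeline, roughly like this (purely illustrative, not how either client actually structures it):

```rust
use std::collections::VecDeque;

/// Illustrative sketch of "resume from the last acked packet": packets wait
/// in an ack queue after being sent, and on pipeline failure the unacked
/// ones are pushed back to be resent on the newly built pipeline.
struct Packet {
    seqno: u64,
    data: Vec<u8>,
}

struct PipelineState {
    to_send: VecDeque<Packet>,
    awaiting_ack: VecDeque<Packet>,
}

impl PipelineState {
    /// An ack for `seqno` arrived: everything up to it is durable on the
    /// current pipeline and can be dropped.
    fn on_ack(&mut self, seqno: u64) {
        while self
            .awaiting_ack
            .front()
            .map(|p| p.seqno <= seqno)
            .unwrap_or(false)
        {
            self.awaiting_ack.pop_front();
        }
    }

    /// A DataNode failed and a new pipeline was set up through the NameNode:
    /// requeue everything that was never acked so it gets resent.
    fn on_pipeline_rebuilt(&mut self) {
        while let Some(p) = self.awaiting_ack.pop_back() {
            self.to_send.push_front(p);
        }
    }
}
```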
Very helpful article! I hoped that might be the case for node replacement but then found https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java#L1550 while tracing through the code. The data is re-replicated by the client on node replacement. It can only keep going from where it left off if the node isn't replaced.
Thanks for your code reference. Let me take a deep look. |
From digging into DataStreamer.java, the replicated transfer goes from the healthy datanode -> replacement datanodes. The client only triggers the pipeline recovery to backfill the data from the healthy node to the replacement node. BTW, if we ignore the bad nodes and do nothing, we end up with missing blocks. So if we want a resilient write process, datanode replacement may need to be supported.
Yeah, looks like it just makes an RPC call to trigger the replication, so just a little bit of extra complexity there. I think that would still be follow-on work after getting the write to continue with fewer DataNodes on failures.
My understanding is that even if a single node (or multiple, but not all) in a pipeline is lost, as long as you successfully write the block to at least one DataNode, HDFS will re-replicate it after the fact since it will see the block is under-replicated. I think the main reason for replicating as part of the pipeline recovery is just to make that a little more resilient. i.e. if in the end all but one DataNode in your pipeline fails and you "successfully" finish your write, but then that one DataNode immediately dies, your write was successful but now you are missing that block.
Got it. If so, simply ignoring the bad data node in the pipeline is acceptable. We just need to recognize the bad node from the ack response's reply and flag, and then ignore it.
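Roughly like this, I think (sketch only; the `PipelineAck`/`Status` types just stand in for whatever the crate generates from the data transfer protos):

```rust
/// Stand-ins for the generated protobuf types -- names are assumptions.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Status {
    Success,
    Error,
    ErrorChecksum,
}

struct PipelineAck {
    /// Sequence number of the packet this ack is for.
    seqno: i64,
    /// One status per DataNode in the pipeline, in pipeline order.
    reply: Vec<Status>,
}

/// Index of the first DataNode that reported a failure for this packet,
/// so the writer can drop it from the pipeline and keep going.
fn first_bad_node(ack: &PipelineAck) -> Option<usize> {
    ack.reply.iter().position(|s| *s != Status::Success)
}
```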
Hmm, that's very odd, might be unrelated to writing data. What version of HDFS are you running? Also, if you can replicate those, having some debug logs would be helpful to see what's going on.
When using the hdfs-native crate, I encountered a data pipeline failure. If this is caused by a problem datanode, do we need to request a new block location?
Could you help look into this problem? @Kimahriman