Checklist
Bug Description
Version
4.0.0.dev0
Describe the bug
When --predict_with_generate true is enabled in a multi-GPU (DDP) environment, the eval_prediction.predictions passed to a custom EvalMetrics subclass collapses from a 2D tensor (BatchSize, SeqLen) into a flattened 1D array.
This causes Serializer.from_tensor(preds) to fail with _pickle.UnpicklingError or IndexError because the data structure is corrupted during the gathering process.
Root Cause
In Seq2SeqTrainer.prediction_step, pad_sequence pads only to the longest sequence in each GPU's local batch. If GPU 0 and GPU 1 end up with different maximum sequence lengths, their prediction tensors have mismatched second dimensions, so accelerator.gather cannot stack them along the batch dimension and the result is flattened instead.
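To illustrate the shape mismatch and one possible remedy, here is a minimal sketch that simulates two ranks with plain Python lists. It is not the library's code: pad_batch, gather_with_global_padding, and PAD_ID are hypothetical names chosen for this example. The idea is to pad every rank's batch to the global maximum length before gathering, so the result stays 2D. In a real DDP run, Accelerate's pad_across_processes may serve the same purpose, though whether it fits this code path has not been verified here.

```python
from typing import List

PAD_ID = 0  # assumed pad token id, for illustration only


def pad_batch(batch: List[List[int]], length: int, pad_id: int = PAD_ID) -> List[List[int]]:
    """Right-pad every sequence in a batch to `length`."""
    return [seq + [pad_id] * (length - len(seq)) for seq in batch]


def gather_with_global_padding(
    per_rank: List[List[List[int]]], pad_id: int = PAD_ID
) -> List[List[int]]:
    """Simulate a gather: pad each rank's batch to the global max length,
    then concatenate the batches along the batch dimension."""
    global_max = max(len(seq) for batch in per_rank for seq in batch)
    gathered: List[List[int]] = []
    for batch in per_rank:
        gathered.extend(pad_batch(batch, global_max, pad_id))
    return gathered


# Rank 0 is locally padded to length 3, rank 1 to length 5.
rank0 = [[11, 12, 13], [14, 15, 0]]
rank1 = [[21, 22, 23, 24, 25], [26, 27, 0, 0, 0]]

merged = gather_with_global_padding([rank0, rank1])
# Every row now has the same length, so the result is a regular
# (batch, seq_len) structure rather than a ragged or flattened one.
```

With the two simulated ranks above, merged holds four rows of equal length, which is the shape a custom metric expects.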
How to Reproduce
Run SFT with --predict_with_generate true on 2+ GPUs.
Use a custom EvalMetrics that accesses eval_prediction.predictions.
Observe that preds.ndim is 1 and the evaluation crashes.
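To make the last step easier to confirm, a small guard can be called at the top of the custom metric. This is only a sketch: check_predictions_shape is a hypothetical helper, and it assumes the predictions object exposes ndim and shape attributes, as NumPy arrays and PyTorch tensors do. The stand-in objects below exist only so the example runs without either library.

```python
from types import SimpleNamespace


def check_predictions_shape(preds) -> None:
    """Raise a descriptive error unless `preds` is 2D (batch, seq_len)."""
    ndim = getattr(preds, "ndim", None)
    shape = tuple(getattr(preds, "shape", ()))
    if ndim != 2:
        raise ValueError(
            f"expected 2D predictions (batch, seq_len), got ndim={ndim}, shape={shape}"
        )


# Stand-ins for a healthy and a flattened predictions array.
ok = SimpleNamespace(ndim=2, shape=(4, 5))
flattened = SimpleNamespace(ndim=1, shape=(16,))

check_predictions_shape(ok)  # passes silently
```

Calling the helper on the flattened stand-in raises a ValueError that reports the shape actually received, which is easier to read than the downstream UnpicklingError or IndexError.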
Additional Information
No response