While reproducing TwinFlow training on Z-Image, we found that the loss becomes NaN when the transformer loads the original weights. After debugging, we traced the issue to the forward function in transformer_z_image.py, where the NaNs are introduced by the variables in the statement at lines 95–96: t_emb = t_emb + t_emb_2 * delta_t_abs.unsqueeze(1)
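
To narrow down which operand is responsible, one option is to count non-finite values in each tensor just before and just after that statement. The sketch below is only an illustration: report_nonfinite is a hypothetical helper (not part of the repository), and the tensors are stand-ins with made-up shapes and values. It also shows one way such a statement can produce NaN from an operand that holds inf (inf * 0), which may or may not be what happens in the real run.

```python
import torch

def report_nonfinite(**tensors):
    """Hypothetical helper: return {name: (nan_count, inf_count)} for
    every tensor that contains at least one NaN or Inf."""
    bad = {}
    for name, t in tensors.items():
        nan_count = int(torch.isnan(t).sum())
        inf_count = int(torch.isinf(t).sum())
        if nan_count or inf_count:
            bad[name] = (nan_count, inf_count)
    return bad

# Stand-in tensors (assumed shapes); in the real forward pass these would be
# the actual t_emb, t_emb_2 and delta_t_abs.
t_emb = torch.randn(2, 8)
t_emb_2 = torch.randn(2, 8)
t_emb_2[0, 3] = float("inf")            # simulate an overflowed activation
delta_t_abs = torch.tensor([0.0, 0.5])  # one sample has delta_t_abs == 0

# Check the operands before the statement ...
before = report_nonfinite(t_emb=t_emb, t_emb_2=t_emb_2, delta_t_abs=delta_t_abs)

# ... then the result of the statement itself.
out = t_emb + t_emb_2 * delta_t_abs.unsqueeze(1)
after = report_nonfinite(out=out)

print("before:", before)
print("after: ", after)
```

In this toy setup the Inf sits in t_emb_2 and turns into NaN only after the multiplication by zero, so checking the operands separately from the result helps tell an already-corrupted input apart from a NaN created by the statement.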
