The training script relies on FSDP's `MixedPrecisionPolicy` to handle dtypes.
But when data parallelism is not used (for example, when running on a single node with TP=8), that policy is never applied and training runs entirely in float32.
This is a bit unintuitive, especially when comparing against runs with DP enabled.
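To illustrate the asymmetry, here is a minimal sketch (hypothetical, not the repo's actual code): the bf16 policy is only ever consumed by `fully_shard()` on the data-parallel path, so a TP-only model is never cast and its parameters stay in float32.

```python
# Minimal sketch of the asymmetry (assumes torch >= 2.6, where FSDP2's
# MixedPrecisionPolicy is exposed under torch.distributed.fsdp; the helper
# below is hypothetical, not the repo's actual code).
import torch
from torch.distributed.fsdp import MixedPrecisionPolicy

def build_mp_policy() -> MixedPrecisionPolicy:
    # Only ever passed to fully_shard(..., mp_policy=...) on the DP/FSDP path.
    return MixedPrecisionPolicy(param_dtype=torch.bfloat16, reduce_dtype=torch.float32)

model = torch.nn.Linear(1024, 1024)
print(build_mp_policy())   # bf16 compute policy -- unused without FSDP wrapping
print(model.weight.dtype)  # torch.float32: nothing casts the TP-only model
```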
If I'm not mistaken, the default training script does not even call `torch.set_float32_matmul_precision()`, so it is currently missing out on the TF32 matmul speedups that call enables.
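For reference, a sketch of what that would look like (where exactly it belongs in the training script is an assumption; typically it goes near the top of the entry point, and it only pays off on Ampere-or-newer GPUs):

```python
import torch

# "high" lets float32 matmuls use TF32 tensor cores; "highest" keeps full fp32.
torch.set_float32_matmul_precision("high")

# Optional, equivalent knob for cuDNN convolutions (not needed for pure matmuls).
torch.backends.cudnn.allow_tf32 = True
```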
Do you agree that this should be changed? Thanks!