TE - fix propagate metadata for fp8_autocast in from_trace
#1021
Fixes - #1000
Smaller repro (without CUDA graph) -
Error
Problem -

On tracing the forward, we set `_include_te_fp8_autocast` on the trace object, which is then used to wrap the python representation of the trace with `fp8_autocast` (this is required for `TELinear` to actually do the computation in FP8).

lightning-thunder/thunder/core/trace.py Lines 380 to 390 in a59b4ef

lightning-thunder/thunder/executors/torch_autograd.py Lines 303 to 306 in a59b4ef
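The mechanism above can be sketched with a toy model. Note that `ToyTrace`, `python_repr`, and this `fp8_autocast` stub are hypothetical simplifications for illustration; the real logic lives in the `trace.py` lines linked above, and the real `fp8_autocast` comes from Transformer Engine:

```python
import contextlib


# Hypothetical stand-in for Transformer Engine's fp8_autocast context
# manager; in TE it switches supported layers to FP8 compute.
@contextlib.contextmanager
def fp8_autocast(enabled=True):
    yield


class ToyTrace:
    """Simplified sketch of a trace object carrying the FP8 metadata flag."""

    def __init__(self):
        # Recorded during forward tracing when a TE module is encountered.
        self._include_te_fp8_autocast = False

    def python_repr(self):
        # The generated source is wrapped in fp8_autocast only when the
        # flag was recorded on the trace; otherwise TELinear would not
        # run its computation in FP8.
        body = "    result = te_linear(x)"
        if self._include_te_fp8_autocast:
            return "with fp8_autocast(enabled=True):\n" + body
        return body
```

With the flag set, the emitted python representation gains the `with fp8_autocast(...)` wrapper; without it, the same body is emitted bare.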
However, `from_trace` doesn't propagate this metadata. So, if an additional transform in post-optimisation uses `from_trace`, the flag will not be set on the new trace, and we end up with a `TELinear` without an enclosing `fp8_autocast`.

Solution -

Update `from_trace` to propagate this metadata.

Test -
Tested locally with the newly added test. TE tests don't run on CI; however, we do run them on nightlies.

NOTE - `TELinear` is only enabled for training runs (i.e. when inputs have `requires_grad=True`).

Thanks @mattteochen for pointing out that he had seen the same error when `fp8_autocast` was missing.
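The fix can be sketched in toy form (hypothetical simplified `ToyTrace` and `from_trace`, not the real thunder API, which lives in thunder/core/trace.py):

```python
class ToyTrace:
    """Simplified trace carrying the FP8 metadata flag."""

    def __init__(self):
        self._include_te_fp8_autocast = False


def from_trace(existing):
    """Sketch of the fix: the new trace copies the FP8 flag from the old one."""
    new = ToyTrace()
    # Without this line, any transform that rebuilds the trace via
    # from_trace would silently drop the flag, producing a TELinear
    # without fp8_autocast -- the bug this PR fixes.
    new._include_te_fp8_autocast = existing._include_te_fp8_autocast
    return new
```

Any later transform that rebuilds the trace through `from_trace` now preserves the flag, so the final python representation still gets wrapped in `fp8_autocast`.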