TransformerEngine + cudagraphs #1000

mattteochen · 2024-08-20T07:44:57Z

🐛 Bug

Compiling a model with the Transformer Engine executor and CUDA graphs enabled (use_cudagraphs=True) fails with KeyError: 'scaling_fwd'.

To Reproduce

Code sample

import torch
import thunder

class Module(torch.nn.Module):
    def __init__(self, in_features, out_features) -> None:
        super().__init__()
        self.linear = torch.nn.Linear(in_features, out_features)

    def forward(self, x: torch.Tensor):
        return self.linear(x)

with torch.device('cuda'):
    m = 1
    in_features = 4096 * m
    out_features = 4096 * m
    model = Module(in_features, out_features)
    x = torch.randn(768, in_features, requires_grad=True)

    jmodel_def = thunder.jit(model, executors=['transformer_engine'], use_cudagraphs=True)

    y = jmodel_def(x)

Expected behaviour

The jitted model should run without error. Instead, the call raises:

Traceback (most recent call last):
  File "/workspace/workdir/examples/dev/te.py", line 32, in <module>
    y = jmodel_def(x)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/workdir/thunder/core/module.py", line 63, in forward
    res = self._forward_fn(*args, **kwargs)
  File "/workspace/workdir/thunder/__init__.py", line 781, in fn_
    result = cache_entry.computation_fn(*inps)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
    return func(*args, **kwargs)
  File "thunder.augmented_forward_fn_3", line 12, in augmented_forward_fn
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
  File "/workspace/workdir/thunder/executors/transformer_engineex.py", line 212, in forward
    weight_fp8, weight_t_fp8 = self.get_fp8_weight_version_compat(
  File "/workspace/workdir/thunder/executors/transformer_engineex.py", line 293, in get_fp8_weight_version_compat
    weight_fp8 = self.get_fp8_workspace(
  File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/module/base.py", line 965, in get_fp8_workspace
    out.cast_transpose_(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/float8_tensor.py", line 732, in cast_transpose_
    fp8_meta = self._fp8_meta[fp8_meta_key]
KeyError: 'scaling_fwd'

Environment

PyTorch Version (e.g., 1.0): 2.5.0a0+gitb0fc6aa
Thunder: f9dbf9c
OS (e.g., Linux): Linux
Python version: 3.10.12
CUDA/cuDNN version: 12.6
GPU models and configuration: RTX ADA 6000
Any other relevant information: Tested on NVIDIA internal docker containers

mattteochen · 2024-08-21T15:35:55Z

I remember hitting this KeyError: 'scaling_fwd' once before, when running a trace (TraceCtx) in which the Transformer Engine decorator (@transformer_engine.fp8_autocast(fp8_recipe=te_fp8_recipe)) was not present.

This info may help.
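To make the suspected failure mode concrete, here is a toy sketch in plain Python. It assumes that the 'scaling_fwd' entry only exists while an autocast-like context is active, and that a trace rebuilt without its metadata loses that context. Every name in it (fp8_meta as a module-level dict, toy_fp8_autocast, ToyTrace, the two rebuild helpers) is hypothetical. None of it is Transformer Engine or thunder API, and it is not a description of how either library is implemented.

```python
from contextlib import contextmanager

# Toy stand-in for per-module FP8 state. Hypothetical, not TE's data structure.
fp8_meta = {}


@contextmanager
def toy_fp8_autocast():
    """Create the forward scaling entry on entry and drop it on exit."""
    fp8_meta["scaling_fwd"] = {"scale": 1.0}
    try:
        yield
    finally:
        fp8_meta.pop("scaling_fwd", None)


def toy_cast(x):
    # Same shape of lookup as the failing line in the traceback:
    # fp8_meta[fp8_meta_key] with fp8_meta_key == "scaling_fwd".
    return x * fp8_meta["scaling_fwd"]["scale"]


class ToyTrace:
    """A computation plus a flag saying it must run under the autocast context."""

    def __init__(self, fn, needs_autocast=False):
        self.fn = fn
        self.needs_autocast = needs_autocast

    def run(self, x):
        if self.needs_autocast:
            with toy_fp8_autocast():
                return self.fn(x)
        return self.fn(x)


def rebuild_dropping_metadata(trace):
    # A transform that copies the computation but forgets the flag.
    return ToyTrace(trace.fn)


def rebuild_keeping_metadata(trace):
    # A transform that carries the flag over to the new trace.
    return ToyTrace(trace.fn, needs_autocast=trace.needs_autocast)


original = ToyTrace(toy_cast, needs_autocast=True)
print(original.run(2.0))

try:
    rebuild_dropping_metadata(original).run(2.0)
except KeyError as err:
    print("KeyError:", err)

print(rebuild_keeping_metadata(original).run(2.0))
```

In this toy, the rebuilt trace that drops the flag fails with the same key as the report, while the one that carries the flag over runs normally.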

kshitij12345 · 2024-08-26T13:01:46Z

Fixed in #1021

t-vi added the cudagraphs and program-coverage labels Aug 20, 2024

kshitij12345 added the TransformerEngine label Aug 21, 2024

kshitij12345 self-assigned this Aug 21, 2024

kshitij12345 mentioned this issue Aug 22, 2024

TE - fix propagate metadata for fp8_autocast in from_trace #1021

Merged

tfogal changed the title from "TE + cudagraphs" to "TransformerEngine + cudagraphs" Aug 23, 2024

tfogal added the thunderfx label Aug 23, 2024

kshitij12345 closed this as completed Aug 26, 2024
