I'm trying to trace torch.nn.MultiheadAttention with thunder and I'm hitting this error: AttributeError: The torch language context has no method or attribute is_nested. Taking a closer look, it comes from https://github.com/pytorch/pytorch/blob/main/torch/nn/modules/activation.py#L1236-L1238, where MultiheadAttention checks that the input tensors are not nested. If I comment out these lines, the rest of MultiheadAttention works fine, and thunder is able to trace and run it both forward-only and with backward.
Minimal repro:
import torch
import thunder
model = torch.nn.MultiheadAttention(128, 16, device='cuda')
q = torch.randn(100, 128, device='cuda')
k = torch.randn(100, 128, device='cuda')
v = torch.randn(100, 128, device='cuda')
jfunc = thunder.jit(model)
jfunc(q, k, v)
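To illustrate the failure mode described above, here is a toy sketch in plain Python. It is not thunder's implementation: `LanguageContext`, `TensorProxy`, and `inputs_not_nested` are hypothetical stand-ins, and the sketch assumes that a traced tensor forwards unknown attribute lookups to a registry of supported names. Under that assumption, any guard that reads `is_nested` fails until the attribute is registered.

```python
# Toy model only: these classes are hypothetical stand-ins,
# not thunder's real classes or API.


class LanguageContext:
    """Maps attribute names to implementations for traced tensors."""

    def __init__(self, name):
        self.name = name
        self._attrs = {}

    def register(self, attr, fn):
        self._attrs[attr] = fn

    def lookup(self, attr):
        if attr not in self._attrs:
            raise AttributeError(
                f"The {self.name} language context has no method or attribute {attr}"
            )
        return self._attrs[attr]


class TensorProxy:
    """Stand-in for a traced tensor; unknown attributes go to the context."""

    def __init__(self, ctx):
        self._ctx = ctx

    def __getattr__(self, attr):
        # Only reached when normal attribute lookup fails.
        return self._ctx.lookup(attr)(self)


def inputs_not_nested(query, key, value):
    # Hypothetical guard in the spirit of the check the issue describes.
    return not (query.is_nested or key.is_nested or value.is_nested)


ctx = LanguageContext("torch")
q, k, v = TensorProxy(ctx), TensorProxy(ctx), TensorProxy(ctx)

# Before registration, the guard fails on the attribute lookup itself.
try:
    inputs_not_nested(q, k, v)
except AttributeError as e:
    error_message = str(e)

# Registering the attribute (always False for these toy proxies) lets it pass.
ctx.register("is_nested", lambda proxy: False)
guard_passed = inputs_not_nested(q, k, v)
```

In this toy model, making the attribute resolvable is enough for the guard to run; whether the same holds for thunder would need to be confirmed against its source.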