
prims.copy_ with NVFuser: sometimes it has to be a kernel. #791

Closed
nikitaved opened this issue Jul 17, 2024 · 7 comments · Fixed by #788 or #806

Labels: bug, in-place, nvfuser

Comments

nikitaved (Contributor) commented Jul 17, 2024

Have a look at the implementation of copy_ in the nvfuser executor (nvfuserex):

def copy_(
    copy_from: TensorProxy,
    copy_to: TensorProxy,
    *,
    fd: FusionDefinition,
    lc_to_nv_map: dict,
) -> Any:
    nvcopy_from = getnv(copy_from, fd, lc_to_nv_map)
    nvcopy_to = getnv(copy_to, fd, lc_to_nv_map)
    fd.add_output(nvcopy_from, alias_input=nvcopy_to)
    return nvcopy_to

So we can see that, apparently, no kernel is actually launched unless some other op gets fused together with the copy op: the copy is expressed purely as an output alias. In the context of in-place ops, take a look at the in-place ops in our torch/__init__.py and note that they are represented in the nvfuserex.

Sometimes we might need an in-place op which does not have an equivalent in NVFuser.
This might lead to silent no-op issues like #789, where copy_ ends up being the only op in the fusion and, as a result, does nothing.
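
For illustration, a minimal sketch of that failure mode. It assumes that Tensor.copy_ traces down to prims.copy_ and that the copy ends up as the sole op handed to the nvfuser executor; the function name and shapes are made up here, and the actual reproducer lives in #789:

import thunder
import torch

def f(x, y):
    # The in-place copy is the only operation; if it lowers to a fusion whose
    # sole op is prims.copy_, the whole fusion reduces to an output alias.
    x.copy_(y)
    return x

jf = thunder.jit(f)

x = torch.zeros(3, device="cuda")
y = torch.ones(3, device="cuda")
jf(x, y)
print(x)  # expected: all ones; with the bug, x would stay all zeros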

cc @tfogal

nikitaved (Contributor, Author) commented Jul 17, 2024

Most likely it is possible to fix this with a slightly more sophisticated checker that tests whether copy_ is the only op in the FusionDefinition, or something similar along these lines, or we simply do not fuse such cases at all...
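
For illustration only, a hedged sketch of what such a checker could look like when deciding whether to claim a candidate fusion region; should_fuse and the list-of-op-ids interface are assumptions made for this sketch, not the actual thunder/nvfuserex API:

def should_fuse(op_ids: list[str]) -> bool:
    # Hypothetical guard: only claim the region if it contains at least one op
    # besides prims.copy_; a region consisting solely of copy_ would lower to a
    # pure output alias in the FusionDefinition and launch no kernel.
    return any(op_id != "prims.copy_" for op_id in op_ids)

# Example: a lone copy_ is rejected, copy_ fused with an elementwise op is kept.
assert not should_fuse(["prims.copy_"])
assert should_fuse(["prims.add", "prims.copy_"])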

nikitaved self-assigned this Jul 17, 2024
tfogal (Collaborator) commented Jul 17, 2024

cc @crcrpar @jjsjann123

nikitaved (Contributor, Author) commented Jul 17, 2024

Potential fix in #788, although it might be temporary unless we keep prims.copy_ as a helper symbol.

t-vi closed this as completed in #788 Jul 17, 2024
jjsjann123 (Collaborator) commented

Sorry for jumping in late... but this looks like an nvfuser issue, i.e., if we are copying from a fusion input, we should have called a set on copy_from.

Actually, we could always call a set on copy_from. But it looks like even with that, nvfuser isn't running a kernel, and that's wrong; we need to patch it.
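
For context, a hedged sketch of what "calling a set on copy_from" could look like in the thunder-side copy_ translation quoted above; fd.ops.set is assumed here to be nvfuser's python-frontend identity/copy op, and this is only an illustration of the suggestion, not the actual patch:

def copy_(
    copy_from: TensorProxy,
    copy_to: TensorProxy,
    *,
    fd: FusionDefinition,
    lc_to_nv_map: dict,
) -> Any:
    nvcopy_from = getnv(copy_from, fd, lc_to_nv_map)
    nvcopy_to = getnv(copy_to, fd, lc_to_nv_map)
    # Insert an explicit set (identity copy) so the fusion contains a real op
    # even when copy_ would otherwise be the only thing in the region.
    nvcopy = fd.ops.set(nvcopy_from)
    fd.add_output(nvcopy, alias_input=nvcopy_to)
    return nvcopy_to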

jjsjann123 reopened this Jul 18, 2024
jjsjann123 assigned jjsjann123 and unassigned nikitaved Jul 18, 2024
jjsjann123 (Collaborator) commented

Thanks a lot for the repro and quick patch, @nikitaved. 🙇

I'll follow up and fix nvfuser side properly.

jjsjann123 (Collaborator) commented

A toy repro for myself

import thunder
import torch

def foo(x, y, idx, src):
    x.index_copy_(0, idx, src)
    z = x + 1
    y.index_copy_(0, idx, src)  # y is updated in place but not consumed by any later op
    return x, y, z
    # return x, y

jfoo = thunder.jit(foo)

x = torch.randn(5, 5, device="cuda")
y = torch.randn(5, 5, device="cuda")
idx = torch.arange(5).cuda()
src = torch.ones(5, 5, device="cuda")

xx = x.clone()
yy = y.clone()
foo(x, y, idx, src)     # eager reference run
jfoo(xx, yy, idx, src)  # thunder-jitted run on cloned inputs

print(thunder.last_traces(jfoo)[-1])
assert x.allclose(xx)
assert y.allclose(yy)

jjsjann123 (Collaborator) commented

nvfuser behavior is being patched in NVIDIA/Fuser#2638

Meanwhile, I'm reverting the thunder logic in #806.
