
prims.copy_ with NVFuser: sometimes it has to be a kernel. #791

Closed
nikitaved opened this issue Jul 17, 2024 · 7 comments · Fixed by #788 or #806

Labels: bug, in-place, nvfuser

Comments

nikitaved (Contributor) commented Jul 17, 2024

Have a look at the implementation of copy_ in the nvfuser executor (nvfuserex):

def copy_(
    copy_from: TensorProxy,
    copy_to: TensorProxy,
    *,
    fd: FusionDefinition,
    lc_to_nv_map: dict,
) -> Any:
    nvcopy_from = getnv(copy_from, fd, lc_to_nv_map)
    nvcopy_to = getnv(copy_to, fd, lc_to_nv_map)
    fd.add_output(nvcopy_from, alias_input=nvcopy_to)
    return nvcopy_to

So we can see that, apparently, no kernel is actually launched unless some other op gets fused together with the copy op: the copy is expressed purely as an output alias. In the context of in-place ops, take a look at the in-place ops in our torch/__init__.py and note that they are represented in the nvfuserex.

Sometimes we might need an in-place op which does not have an equivalent in NVFuser.
This might lead to silent no-op issues like #789, where copy_ ends up being the only op in the fusion and, as a result, does nothing.
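
For illustration, a minimal sketch of that failure mode. It assumes that Tensor.copy_ traces down to prims.copy_ and that the copy ends up as the sole op handed to the nvfuser executor; the function name and shapes are made up here, and the actual reproducer lives in #789:

import thunder
import torch

def f(x, y):
    # The in-place copy is the only operation; if it lowers to a fusion whose
    # sole op is prims.copy_, the whole fusion reduces to an output alias.
    x.copy_(y)
    return x

jf = thunder.jit(f)

x = torch.zeros(3, device="cuda")
y = torch.ones(3, device="cuda")
jf(x, y)
print(x)  # expected: all ones; with the bug, x would stay all zeros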

cc @tfogal

nikitaved (Contributor, Author) commented Jul 17, 2024

Most likely it is possible to fix this with a slightly more sophisticated checker that tests whether copy_ is the only op in the FusionDefinition, or something similar along these lines, or we simply do not fuse such cases at all...
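
For illustration only, a hedged sketch of what such a checker could look like when deciding whether to claim a candidate fusion region; should_fuse and the list-of-op-ids interface are assumptions made for this sketch, not the actual thunder/nvfuserex API:

def should_fuse(op_ids: list[str]) -> bool:
    # Hypothetical guard: only claim the region if it contains at least one op
    # besides prims.copy_; a region consisting solely of copy_ would lower to a
    # pure output alias in the FusionDefinition and launch no kernel.
    return any(op_id != "prims.copy_" for op_id in op_ids)

# Example: a lone copy_ is rejected, copy_ fused with an elementwise op is kept.
assert not should_fuse(["prims.copy_"])
assert should_fuse(["prims.add", "prims.copy_"])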

nikitaved self-assigned this Jul 17, 2024
tfogal (Collaborator) commented Jul 17, 2024

cc @crcrpar @jjsjann123

nikitaved (Contributor, Author) commented Jul 17, 2024

Potential fix in #788, although it might be temporary unless we keep prims.copy_ as a helper symbol.

t-vi closed this as completed in #788 Jul 17, 2024
jjsjann123 (Collaborator) commented

Sorry for jumping in late... but this looks like an nvfuser issue, i.e., if we are copying from a fusion input, we should have called a set on copy_from.

Actually, we could always call a set on copy_from. But it looks like even with that, nvfuser isn't running a kernel, and that's wrong; we need to patch it.
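
For context, a hedged sketch of what "calling a set on copy_from" could look like in the thunder-side copy_ translation quoted above; fd.ops.set is assumed here to be nvfuser's python-frontend identity/copy op, and this is only an illustration of the suggestion, not the actual patch:

def copy_(
    copy_from: TensorProxy,
    copy_to: TensorProxy,
    *,
    fd: FusionDefinition,
    lc_to_nv_map: dict,
) -> Any:
    nvcopy_from = getnv(copy_from, fd, lc_to_nv_map)
    nvcopy_to = getnv(copy_to, fd, lc_to_nv_map)
    # Insert an explicit set (identity copy) so the fusion contains a real op
    # even when copy_ would otherwise be the only thing in the region.
    nvcopy = fd.ops.set(nvcopy_from)
    fd.add_output(nvcopy, alias_input=nvcopy_to)
    return nvcopy_to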

jjsjann123 reopened this Jul 18, 2024
jjsjann123 assigned jjsjann123 and unassigned nikitaved Jul 18, 2024
jjsjann123 (Collaborator) commented

Thanks a lot for the repro and quick patch, @nikitaved. 🙇

I'll follow up and fix nvfuser side properly.

jjsjann123 (Collaborator) commented

A toy repro for myself

import thunder
import torch

def foo(x, y, idx, src):
    x.index_copy_(0, idx, src)
    z = x + 1
    y.index_copy_(0, idx, src)  # y is updated in place but not consumed by any later op
    return x, y, z
    # return x, y

jfoo = thunder.jit(foo)

x = torch.randn(5, 5, device="cuda")
y = torch.randn(5, 5, device="cuda")
idx = torch.arange(5).cuda()
src = torch.ones(5, 5, device="cuda")

xx = x.clone()
yy = y.clone()
foo(x, y, idx, src)     # eager reference run
jfoo(xx, yy, idx, src)  # thunder-jitted run on cloned inputs

print(thunder.last_traces(jfoo)[-1])
assert x.allclose(xx)
assert y.allclose(yy)

jjsjann123 (Collaborator) commented

nvfuser behavior is being patched in NVIDIA/Fuser#2638

Meanwhile, I'm reverting the thunder logic in #806.
