prims.copy_ with NVFuser: sometimes it has to be a kernel. #791
Comments
Most likely it is possible to fix that with a slightly more sophisticated checker that tests whether |
Potential fix in #788, albeit it might be temporary unless we keep |
Sorry for jumping in late, but this looks like an nvfuser issue: if we are copying from a fusion input, we should have called a set on the copy_from. Actually, we could have always called a set on copy_from. But it looks like even with that, nvfuser isn't running a kernel. That's wrong and we need to patch it.
Thanks a lot for the repro and quick patch @nikitaved. 🙇 I'll follow up and fix the nvfuser side properly.
A toy repro for myself
|
The nvfuser behavior is being patched in NVIDIA/Fuser#2638. Meanwhile, I'm reverting the thunder logic in #806.
Have a look at the impl of copy_ in the nvfuserex: lightning-thunder/thunder/executors/nvfuserex_impl.py, lines 2036 to 2046 in 29379ec
So we can see that, apparently, no kernels are actually launched unless an op and a copy op are fused together. In the context of in-place ops, take a look at our torch/__init__.py in-place ops and note that they are represented in the nvfuserex. Sometimes we might need an in-place op which does not have an equivalent in NVFuser.
This might lead to silent no-op issues like #789, where copy_ ends up being the only op in the fusion and, as a result, does nothing.

cc @tfogal
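For illustration, here is a minimal sketch of the situation described above, written as a toy model in plain Python. It does not use thunder or nvfuser APIs: `Op`, `copy_reads_fusion_input`, and the tensor names are all hypothetical, and the rule it encodes (a copy_ whose source comes straight from a fusion input is the case that risks becoming a silent no-op) is only an assumption drawn from the comments in this thread, not the actual checker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Toy stand-in for one op in a fusion region (hypothetical, not a thunder type)."""
    name: str
    inputs: tuple  # names of the tensors the op reads
    output: str    # name of the tensor the op writes


def copy_reads_fusion_input(ops, fusion_inputs):
    """Return True if any copy_ in the region copies directly from a fusion input.

    Assumption for this sketch: a copy_ op has inputs (copy_from, copy_to).
    If copy_from is not produced by another op in the region, nothing is
    fused with the copy, which is the case reported here as doing nothing.
    """
    produced = {op.output for op in ops if op.name != "copy_"}
    for op in ops:
        if op.name != "copy_":
            continue
        copy_from, _copy_to = op.inputs
        if copy_from in fusion_inputs and copy_from not in produced:
            return True
    return False


# copy_ is the only op in the region: it reads the fusion input "a" directly.
lone_copy = [Op("copy_", ("a", "b"), "b")]

# copy_ is fused with an add: it reads the intermediate "t0" instead.
fused_copy = [
    Op("add", ("a", "a"), "t0"),
    Op("copy_", ("t0", "b"), "b"),
]

print(copy_reads_fusion_input(lone_copy, {"a", "b"}))
print(copy_reads_fusion_input(fused_copy, {"a", "b"}))
```

In this toy model the first region is flagged and the second is not, which matches the observation that a kernel is launched only when another op is fused with the copy.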