try removing ltorch.copy_ #1209
Changes from commits: f574939, b952e2a, c25a908, f0eac73, 82dc7a7, b0458bd
```diff
@@ -5,6 +5,7 @@
 from torch.testing import assert_close, make_tensor

 import thunder
+from thunder.core import prims
 import thunder.core.dtypes as datatypes
 import thunder.torch as ttorch
 from thunder.tests.framework import instantiate, nvFuserExecutor
@@ -158,9 +159,9 @@ def func3(x, y):
 def test_inplace_copy_dst_copy_returned_issue_1109(executor, device, dtype):
     def func(T0):
         T1 = torch.sin(T0)
-        T0.copy_(T1)  # destination.copy_(source)
+        prims.copy_(T1, T0)
         T2 = torch.cos(T1)
-        T0.copy_(T2)
+        prims.copy_(T2, T0)
         # T1 & T2 should be returned as separate buffer, instead of sharing
         # storage with T0
         return T1, T2
@@ -169,12 +170,21 @@ def func(T0):
     # This pattern is unsafe in general. Disabling sanity check to silence
     # exception for testing
     traced_foo = executor.make_callable(func, disable_inplace_copy_check=True)
-    a = make_tensor((4, 4), device=device, dtype=tdtype)
-    a_ref = a.clone()
-
-    o_thunder = traced_foo(a)
-    o_eager = func(a_ref)
-
-    assert_close(a_ref, a)
-    for o, o_ref in zip(o_thunder, o_eager):
-        assert_close(o, o_ref)
+    t0 = make_tensor((4, 4), device=device, dtype=tdtype)
+    t0_ref = t0.clone()
+
+    actual_t1, actual_t2 = traced_foo(t0)
+
+    expected_t1 = t0_ref.sin()
+    expected_t2 = expected_t1.cos()
+    expected = expected_t2
+
+    assert_close(t0, expected)
+    assert_close(actual_t2, expected_t2)
+    # FIXME(crcrpar): Since there's no `ltorch.Tensor.copy_`, functions like `func` would not
+    # be observed and executed with pytorch eager mode. Though there should be either an audit of
+    # `prims.copy_` in a nvfuser region and/or what #1110 did.
+    assert actual_t1.data_ptr() == actual_t2.data_ptr()
+
+    with pytest.raises(AssertionError):
+        assert_close(actual_t1, expected_t1)
```

Review thread on the FIXME comment:

Thanks for putting a FIXME there, but this is a case where nvfuser is producing a wrong result: one of the outputs will still be aliasing the input, and that's not how the program should be translated. I.e., I don't think an audit that removes redundant …

For the record, I'm all in for having a …

If a callable has multiple in-place ops, then the redundant copies would be cleaned up. We just don't have an equivalent for multiple raw …
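The rewritten test pins down the eager-mode semantics it compares against: `destination.copy_(source)` fills the destination in place while the source remains an independent buffer, so `T1` and `T2` must not share storage with `T0`. A stdlib-only sketch of that contract (plain `array` objects stand in for tensors; this is an illustration, not the test itself):

```python
# Stdlib-only sketch (no PyTorch; `array` stands in for a tensor) of the
# eager-mode contract the test checks: destination.copy_(source) writes
# into the destination's storage, and the source stays a separate buffer.
import math
from array import array

T0 = array("d", [0.0] * 4)
T1 = array("d", (math.sin(x) for x in T0))  # fresh buffer, like torch.sin
T0[:] = T1                                  # in-place write, like T0.copy_(T1)
T2 = array("d", (math.cos(x) for x in T1))  # another fresh buffer
T0[:] = T2                                  # like T0.copy_(T2)

assert T1 is not T0 and T2 is not T0  # T1/T2 keep their own storage
assert list(T0) == list(T2)           # the last copy wins
```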
Are we supposed to do this? I'm worried that this will give us a wrong program.
This PR is composed of the revert of #1209 and the test update. I thought `fd.ops.set` is something we ideally want to avoid, as per #1173.
Sorry about not being more clear in #1173. We still wanted to have the set for `copy_`. I think the root cause of issue #1173 is that we are returning `nvcopy_from` in the fusion region.
why isn't there a copy op?
There is a copy op; the difference is whether we are returning a different buffer or the aliased source. I think an example would help. The thunder program in the test behaves differently depending on what the return value is, and it is translated into a corresponding nvfuser program, with the `copy_` part translated accordingly. When we return T2 as the output of the fusion, since T2 is marked as an alias of T0 (sharing storage), we are not returning an extra buffer there.
However, the story changes when we return T1 from the fusion, since `T1` is a new buffer in the program. You can play with this yourself; you'll notice that without the extra `fd.ops.set` in `copy_`, nvfuser won't be able to faithfully represent the intended thunder behavior.
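The alias-vs-fresh-buffer distinction the comment describes can be approximated without nvfuser. In this stdlib-only sketch (plain lists stand in for tensors, and an explicit `list()` copy plays the role the comment assigns to `fd.ops.set`), a "fusion" that returns the aliased destination hands back the input's own storage, while one that returns an explicit copy does not:

```python
# Stdlib-only sketch of returning the aliased destination vs. a fresh copy.
import math

def fusion_returning_alias(t0):
    # copy_ without an explicit set: t1's values land in t0's storage...
    t1 = [math.sin(x) for x in t0]
    t0[:] = t1
    return t0  # ...and the output aliases the input

def fusion_returning_copy(t0):
    t1 = [math.sin(x) for x in t0]
    out = list(t1)  # explicit copy, the role fd.ops.set plays here
    t0[:] = t1
    return out  # fresh buffer, independent of t0

t0 = [0.0, 0.0]
aliased = fusion_returning_alias(t0)
t0[0] = 123.0               # a later in-place write to the input...
assert aliased[0] == 123.0  # ...shows through the aliased output

t0 = [0.0, 0.0]
copied = fusion_returning_copy(t0)
t0[0] = 123.0
assert copied[0] == 0.0     # the copied output is unaffected
```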
okay, then I don't think this PR deserves a merge.
f0eac73: compile 54s, 11.07ms
82dc7a7: compile 97s, 14.28ms