Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sanity check for inplace copy #285

Merged
merged 13 commits into from
May 16, 2024
Merged

Add sanity check for inplace copy #285

merged 13 commits into from
May 16, 2024

Conversation

kiya00
Copy link
Collaborator

@kiya00 kiya00 commented Apr 26, 2024

Before submitting
  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?

What does this PR do?

Fixes (#265).
Because it's not safe to use nvfuser's inplace update API. A check is added to ensure that the copy_to argument (and output) of prims.copy_ is not used as input for any of its subsequent operators in a nvFusion.

cc @apaz-cli

thunder/tests/test_inplace_copy.py Outdated Show resolved Hide resolved
thunder/core/transform_common.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@jjsjann123 jjsjann123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting the effort into writing the check, it's great to have those for a codegen block!

I'm a bit unclear about our expectation from this check function though.
Feels almost like we are hoping that running this check globally would alway ensure that we have a correct function. But I don't think it's sufficient.

A user script could be using an argument after it has been inplace updated (see example below). That's a legit program, our check will reject it. A false negative check is not ideal but I think it's acceptable.

def fn(a, b):
  c = a + 1
  d = copy_(c, a)
  e = b + a
  return e

The worse scenario is that, our optimization/transformation could re-order things. For the program above, e = b+a doesn't depend on anything and we could move it before the inplace copy. Which alters the semantics of the function, but it would appear as legit program in our check.
False positive check and silent wrong result is bad.

thunder/core/transform_common.py Outdated Show resolved Hide resolved
thunder/core/transform_common.py Outdated Show resolved Hide resolved
thunder/tests/test_inplace_copy.py Show resolved Hide resolved
Copy link
Collaborator

@jjsjann123 jjsjann123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I'm not blocking this PR. It's an improvement to what we have.

thunder/__init__.py Outdated Show resolved Hide resolved
thunder/core/transform_common.py Outdated Show resolved Hide resolved
thunder/core/transform_common.py Outdated Show resolved Hide resolved
@kiya00 kiya00 marked this pull request as draft May 8, 2024 19:38
@kiya00
Copy link
Collaborator Author

kiya00 commented May 8, 2024

False positive check and silent wrong result is bad.

yes, I met this problem too

def func1(x, y):
        z = x * y
        o = thunder.core.prims.copy_(z, x)
        return x + y

generate the trace:

@torch.no_grad()
@no_autocast
def computation(x, y):
  # x: "cuda:0 f32[4, 4]"
  # y: "cuda:0 f32[4, 4]"
  [t2] = nvFusion0(x, y)
    # t2 = prims.add(x, y)  # t2: "cuda:0 f32[4, 4]"
    # result = prims.mul(x, y)  # result: "cuda:0 f32[4, 4]"
    # prims.copy_(result, x)
  del x, y
  return t2

which is wrong result.
I think this sanity check is for checking the final execution trace, not the original function. The reorder problem should be solved by the general inplace and aliasing plan(maybe?) cc: @IvanYashchuk

@kiya00
Copy link
Collaborator Author

kiya00 commented May 8, 2024

BTW, I'm not blocking this PR. It's an improvement to what we have.

Not at all, some of the logic is not right, thanks for pointing out @jjsjann123 , I convert it to draft and I'll fix it on Monday :)

@jjsjann123
Copy link
Collaborator

False positive check and silent wrong result is bad.

yes, I met this problem too

def func1(x, y):
        z = x * y
        o = thunder.core.prims.copy_(z, x)
        return x + y

generate the trace:

@torch.no_grad()
@no_autocast
def computation(x, y):
  # x: "cuda:0 f32[4, 4]"
  # y: "cuda:0 f32[4, 4]"
  [t2] = nvFusion0(x, y)
    # t2 = prims.add(x, y)  # t2: "cuda:0 f32[4, 4]"
    # result = prims.mul(x, y)  # result: "cuda:0 f32[4, 4]"
    # prims.copy_(result, x)
  del x, y
  return t2

which is wrong result. I think this sanity check is for checking the final execution trace, not the original function. The reorder problem should be solved by the general inplace and aliasing plan(maybe?) cc: @IvanYashchuk

Got'ya. Sounds like our understanding is aligned and that's all that I wanted to confirm. Feel free to ping me when it's ready for another round of review.

@kiya00
Copy link
Collaborator Author

kiya00 commented May 13, 2024

followed the comments, the modifications are:

  • check the sharp edge for each nvFusion

  • use the utils.consumers and check the index for the relative position

  • added a disable_inplace_copy_check in the thunder.jit

@jjsjann123 @IvanYashchuk , could you take a look again

Copy link
Collaborator

@jjsjann123 jjsjann123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for updating this!

thunder/__init__.py Outdated Show resolved Hide resolved
thunder/tests/test_inplace_copy.py Outdated Show resolved Hide resolved
thunder/core/transform_common.py Outdated Show resolved Hide resolved
thunder/core/transform_common.py Outdated Show resolved Hide resolved
@kiya00
Copy link
Collaborator Author

kiya00 commented May 16, 2024

Hi @t-vi @mruberry , this PR is a simple sanity check on the final trace for nvfuser copy_, could you take a look?

Copy link
Collaborator

@t-vi t-vi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@t-vi t-vi enabled auto-merge (squash) May 16, 2024 15:19
@t-vi t-vi merged commit 332dcc1 into main May 16, 2024
37 checks passed
@t-vi t-vi deleted the inplace_check branch May 16, 2024 15:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants