add autocast for conv1/2/3d #797

Merged
t-vi merged 3 commits into main from tom/conv_autocast on Jul 18, 2024

Conversation

t-vi (Collaborator) commented Jul 18, 2024:

Fixes: #796

t-vi requested review from mruberry and lantiga as code owners on July 18, 2024 at 06:33.

Comment on lines +3881 to +3882:

    res = conv_function(
        a,

nikitaved (Contributor):

nit: It looks like conv_function == _convolution_autocast_impl accepts kwargs here, but _convolution_autocast_impl is missing kwargs in its definition.

t-vi (Collaborator, Author):

Yes, but this is to be able to route the dtype through without burdening the conv helper with knowing about dtypes.
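
For context, a minimal sketch of the routing being described, using the names visible in the diff; the signature and body are illustrative guesses rather than the actual implementation:

    import torch

    def _convolution_autocast_impl(conv_fn, a, weight, bias, **kwargs):
        # Illustrative only: the autocast dtype rides along in kwargs, so it
        # can be popped here and the underlying conv helper never needs to
        # know anything about dtypes.
        dtype = kwargs.pop("dtype")
        a, weight = a.to(dtype), weight.to(dtype)
        if bias is not None:
            bias = bias.to(dtype)
        return conv_fn(a, weight, bias, **kwargs)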

Comment on lines 193 to 195:

    with torch.autocast("cpu", torch.bfloat16):
        eager_out = foo(x, w)
        jit_out = jfoo(x, w)

nikitaved (Contributor) commented Jul 18, 2024:

I wonder, will autograd be affected? Shall we test it as well?

t-vi (Collaborator, Author):

You mean the backward?

nikitaved (Contributor):

Yeah :)

t-vi (Collaborator, Author):

Added it with terrible bounds...
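
A sketch of what such a backward check could look like; foo and jfoo mirror the snippet above, while the shapes and the deliberately loose tolerances are made up for illustration:

    import torch
    import torch.nn.functional as F
    import thunder

    def foo(x, w):
        return F.conv1d(x, w)

    jfoo = thunder.jit(foo)

    x = torch.randn(2, 3, 16, requires_grad=True)
    w = torch.randn(4, 3, 5, requires_grad=True)

    with torch.autocast("cpu", torch.bfloat16):
        eager_out = foo(x, w)
        jit_out = jfoo(x, w)

    go = torch.randn_like(eager_out)
    eager_grads = torch.autograd.grad(eager_out, (x, w), go)
    jit_grads = torch.autograd.grad(jit_out, (x, w), go)

    # randn inputs push the bf16 comparison to very loose ("terrible") bounds.
    for eg, jg in zip(eager_grads, jit_grads):
        torch.testing.assert_close(eg, jg, rtol=1e-1, atol=1e-1)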

nikitaved (Contributor):

Hmmm, maybe rand vs. randn could do better?

t-vi (Collaborator, Author):

Well, I would suspect that there is still some dragon lingering somewhere, because I cannot really think of a reason for thunder + autocast to return wildly different results than torch + autocast, even for wild inputs. But TBH, I would postpone investigating that.

t-vi (Collaborator, Author):

But you are right, switching to rand buys me an order of magnitude. Cool, thank you!
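
A quick illustration of the likely mechanism, assuming the gap comes from cancellation: zero-mean randn inputs make the convolution's reductions sum terms of both signs, so the result sits near zero and the bf16 rounding error is large relative to it, whereas rand in [0, 1) keeps every summand positive:

    import torch

    def bf16_rel_err(t):
        # Relative error of summing after a bf16 round-trip vs. float64.
        return ((t.bfloat16().double().sum() - t.sum()) / t.sum()).abs().item()

    print(bf16_rel_err(torch.randn(10_000, dtype=torch.float64)))  # typically ~1e-3
    print(bf16_rel_err(torch.rand(10_000, dtype=torch.float64)))   # typically ~1e-5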

t-vi (Collaborator, Author):

Turns out that Thunder does funny, slow things for the conv backward.

t-vi (Collaborator, Author):

Filed an issue: #799

t-vi mentioned this pull request on Jul 18, 2024.

nikitaved (Contributor) left a comment:

LGTM! Thank you, @t-vi!

t-vi merged commit 10a4efb into main on Jul 18, 2024; 36 checks passed.
t-vi deleted the tom/conv_autocast branch on July 18, 2024 at 12:02.

Successfully merging this pull request may close these issues:

Support for torch autocasting (#796)