
Max pool #163

Merged · 23 commits merged into Lightning-AI:main on Apr 12, 2024

Conversation

@jjsjann123 (Collaborator) commented on Apr 11, 2024:

What does this PR do?

Fixes #164.

We have restored Thunder's performance by having torchex run max_pool2d/3d as a single aten call, instead of the decomposed reference implementation that goes through convolution.
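
Conceptually, the change looks something like the following sketch (hypothetical helper names, not thunder's actual registration API):

```python
# A minimal sketch of the idea: route the max_pool2d symbol to the single
# fused torch/aten kernel instead of its convolution-based decomposition.
import torch


def torch_max_pool2d_impl(a, kernel_size, stride=None, padding=0,
                          dilation=1, ceil_mode=False):
    # One aten call; its backward is handled by aten's dedicated
    # max_pool2d backward kernel rather than being re-derived through
    # the decomposed primitives.
    return torch.nn.functional.max_pool2d(
        a, kernel_size, stride=stride, padding=padding,
        dilation=dilation, ceil_mode=ceil_mode,
    )


# Hypothetical executor hook: prefer this implementation whenever
# max_pool2d appears in a trace.
# torchex.register_implementation("max_pool2d", torch_max_pool2d_impl)
```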

A quick performance comparison is shown below.
Before the PR:

```
jit_model elapsed time:  0.015625953674316406
torch eager elapsed time:  0.0018506050109863281
```

After the PR:

```
jit_model elapsed time:  0.0022668838500976562
torch eager elapsed time:  0.001873016357421875
```
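
For reference, a timing comparison like the one above could be collected with a harness along these lines (the model, shapes, and script are assumptions, not the exact benchmark from this PR):

```python
import time

import torch
import thunder

# Assumed workload: a bare pooling layer and an input shape typical of
# vision models.
model = torch.nn.MaxPool2d(kernel_size=3, stride=2)
x = torch.randn(32, 64, 112, 112)

jit_model = thunder.jit(model)
jit_model(x)  # warm-up call so compilation cost is excluded from the timing

start = time.time()
jit_model(x)
print("jit_model elapsed time: ", time.time() - start)

start = time.time()
model(x)
print("torch eager elapsed time: ", time.time() - start)
```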

Note that this is only done for max_pool2d/3d: max_pool1d is implicitly differentiable in PyTorch, so there is no dedicated backward entry for it in aten.

@jjsjann123 marked this pull request as ready for review on April 12, 2024 at 05:35.
@jjsjann123 (Collaborator, Author) commented:

I'm only changing the executor implementation, so I don't think extra tests are needed beyond what's already in CI.

@tfogal (Collaborator) left a comment:

Having an explicit op instead of decomposing feels reasonable.

Do we want the subsymbols of the poolXd ops to be the original verbose decomposition? My gut says no, i.e. just about every backend would explicitly implement a pooling operator anyway. But I wanted to throw it out there.

In general I'd recommend more """doc comments""" on functions, but I won't withhold a +1 over that. A comment I'd like to see somewhere is something to the effect of "we tried decomposing this as conv + X + Y + Z, but it leads to really bad perf, and systems like nvFuser implement pooling directly anyway", i.e. explaining why this op exists and why the alternative isn't great.
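
For illustration, such a comment might read something like this (wording is a suggestion, not from the PR):

```python
# NOTE: max_pool2d/3d are executed as single aten calls on purpose. We
# tried decomposing pooling into conv + comparison + reduction
# primitives, but it leads to really bad perf, and systems like nvFuser
# implement pooling directly anyway.
```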

@jjsjann123 added the visionmodel label (issues related to supporting vision model) on Apr 12, 2024.
@t-vi enabled auto-merge (squash) on April 12, 2024 at 20:25.
@t-vi (Collaborator) left a comment:

Thank you @jjsjann123 @tfogal

@t-vi merged commit 709a062 into Lightning-AI:main on Apr 12, 2024.
39 checks passed
@IvanYashchuk removed their request for review on April 15, 2024 at 09:32.
@jjsjann123 deleted the max_pool branch on April 17, 2024 at 17:24.
Labels: visionmodel (issues related to supporting vision model)
Projects: None yet
Development: merging this pull request may close #164, "torchex running pooling without decomposition".
4 participants