torch.transpose seems to be mapped to different ops depending on requires_grad #1487
Comments
I think what's happening here is that there's no grad rule for this case; see lightning-thunder/thunder/core/transforms.py, line 947 in ea0159c.

One way to address this oddity might be to add a comment with the provenance of the final operation. Another idea, which we've discussed for a while, would be to add more custom grad rules for torch operations.

fyi @IvanYashchuk; maybe @beverlylytle would be interested in identifying speed or memory differences between the autograds that Thunder generates and those of PyTorch eager (and torch.compile)? Maybe instead of identifying the differences it's more interesting to just add some more autograd formulas. An automated mechanism to identify speed and memory differences would probably help us identify the most important operations to cover, however, and make it easy to talk about the improvement after a custom formula was added. It would also be important to verify that a custom handwritten formula actually has better performance.

Edit: Forgot to mention, if @beverlylytle and @IvanYashchuk decide to prioritize this, then I'm happy to talk about autograd concepts, autograd in Thunder, and how we might systematically measure performance.
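For reference, the math a dedicated transpose grad rule would encode is simple; here is a minimal sketch of the augmented-forward/backward pattern in plain PyTorch (the function names are illustrative, not Thunder's registration API):

```python
# Sketch of the augmented-forward/backward pattern a transpose grad rule would
# encode. Function names are illustrative, not Thunder's actual registration API;
# the check at the bottom compares the result against eager autograd.
import torch

def transpose_aug_fwd(x, dim0, dim1):
    # Forward result plus the residuals the backward needs (just the dims).
    return torch.transpose(x, dim0, dim1), (dim0, dim1)

def transpose_bwd(residuals, grad_out):
    # Transposing the incoming gradient with the same dims undoes the swap.
    dim0, dim1 = residuals
    return torch.transpose(grad_out, dim0, dim1)

x = torch.randn(3, 4, requires_grad=True)
y = torch.transpose(x, 0, 1)          # shape (4, 3)
cotangent = torch.randn_like(y)
y.backward(cotangent)

out, residuals = transpose_aug_fwd(x.detach(), 0, 1)
assert torch.equal(transpose_bwd(residuals, cotangent), x.grad)
```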
We have a simple way to measure speed and memory differences using pytest-benchmark (speed is printed in the terminal output; memory is not, but it's saved in the benchmark results file).
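A rough sketch of what such a pytest-benchmark test could look like (the operation and shapes here are illustrative placeholders, not taken from an existing benchmark):

```python
# Sketch of a pytest-benchmark test timing the eager backward of torch.transpose.
# Shapes are illustrative; a real comparison would also cover the thunder.jit path.
import pytest
import torch


@pytest.mark.parametrize("shape", [(256, 512), (1024, 1024)])
def test_transpose_backward_eager(benchmark, shape):
    x = torch.randn(*shape, requires_grad=True)

    def step():
        x.grad = None  # reset so every round measures a fresh backward
        y = torch.transpose(x, 0, 1)
        y.sum().backward()

    benchmark(step)
```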
The challenge is determining the appropriate input shape for comparison. This is easy for elementwise operations but more difficult for operations like scaled dot product attention. When we have concrete shapes to analyze (for example, from logs of ThunderFX recorded in SubgraphInfo), we can create benchmarks for each PyTorch operation from the input graph, similar to the per-graph benchmarking in lightning-thunder/thunder/dynamo/compiler_graph_benchmark.py, lines 33 to 76 in fef423b.
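As a generic illustration of deriving per-operation shapes from a traced graph (this uses plain torch.fx shape propagation rather than ThunderFX's SubgraphInfo, and the helper name is made up):

```python
# Generic sketch: use torch.fx shape propagation to record the input shapes seen
# by each call_function node, which could then seed per-operation benchmarks.
# collect_op_shapes is a made-up helper, not part of Thunder.
import torch
import torch.fx
from torch.fx.passes.shape_prop import ShapeProp

def collect_op_shapes(fn, *example_inputs):
    gm = torch.fx.symbolic_trace(fn)
    ShapeProp(gm).propagate(*example_inputs)  # fills node.meta["tensor_meta"]
    shapes = {}
    for node in gm.graph.nodes:
        if node.op == "call_function":
            arg_shapes = [
                tuple(a.meta["tensor_meta"].shape)
                for a in node.args
                if isinstance(a, torch.fx.Node) and "tensor_meta" in a.meta
            ]
            name = getattr(node.target, "__name__", str(node.target))
            shapes.setdefault(name, []).append(arg_shapes)
    return shapes

def f(x):
    return torch.transpose(x, 0, 1).relu()

print(collect_op_shapes(f, torch.randn(8, 16)))
# e.g. {'transpose': [[(8, 16)]]}
```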
Note: If you have a model or program that is not supported yet but should be, please use the program coverage template.
🐛 Bug
I admit I'm not sure if this is a bug or expected behavior, but `torch.transpose` is mapped to `torch.transpose` if `requires_grad=False`, and to `torch.permute` otherwise.

To Reproduce
Code sample
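A hypothetical minimal repro, assuming thunder.jit and thunder.last_traces behave as in current lightning-thunder, would be:

```python
# Hypothetical minimal repro: print the final trace for both requires_grad
# settings so the transpose-vs-permute mapping is visible.
import torch
import thunder

def fn(x):
    return torch.transpose(x, 0, 1)

jfn = thunder.jit(fn)

for requires_grad in (False, True):
    x = torch.randn(3, 4, requires_grad=requires_grad)
    jfn(x)
    print(f"requires_grad={requires_grad}")
    print(thunder.last_traces(jfn)[-1])
```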
output
Expected behavior
Environment
How you installed PyTorch (`conda`, `pip`, source):

Additional context