-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Modeling of shape queries #1133
Comments
Logging offline suggestions by @mruberry:
I think this justifies #1113 . cc'ing @t-vi
I think this just requires a decent DCE to aggregate the shape query. Shape query on python is not trivial and takes ~ us.
I agree that this should be the principle of how constraints are inserted. But if provenance can be used to simplify such constraints (e.g. equivalence, non-negative), I think those should still be leveraged. |
So currently, the prologue performs exactly two things:
I wonder if it would be good to keep things this way, in particular not having the prologue compute things for the computation trace. I'm not saying it should be this way or not, but just that if we change it, it should be a very deliberate choice. |
Some computations are so closely related to "checks" that it would reduce the total CPU work to compute them in the prologue. In general I still like thinking about computing everything possible in the prologue. We have a lot of opportunities to optimize the performance of prologues. @jjsjann123 is completely correct that computing everything in the prologue does not take advantage of overlapping CPU and GPU computation, and it's interesting to see if we can do that effectively. |
I absolutely agree. The key property we need to keep here is "thinness" of Thunder. By this I mean that if the user of thunder knows they're calling the same computation 50x with controlled inputs (e.g. in the training loop), they can run the prologue once and then rely on the compute function working for them. Note that if we anticipate power users of Thunder to do this, we also save them computation by putting it in the prologue, which might be even more attractive than overlapping with GPU. |
Want to highlight the issue we see here as well. |
I think our current shape modeling is a bit too explicit. Any shape query leaves its own prim in the trace and we don't have a clean up path to handle that. Taking the example of broadcast.
The trace right after the interpreter, there's lots of duplicated
There's also another duplication happening in prologue trace
We have lifted up the prims.shape logic here
there's also these pieces doing the same thing in prologue trace
which might be slightly harder to detect and clean up. |
Let me know if you want to chat about some ideas! Splatting the shape information once could be a fix. Another option could be to update the CSE pass. I'm a little surprised it doesn't work already. |
CSE only works at the top level, so the repetitive pattern inside the lower hierarchy isn't getting cleaned up, which is an eyesore. |
Note that #1500 added |
🚀 Feature
Modeling of accessing shape attribute of Tensor/TensorProxy has raising up to discussion in separate PRs & discussions.
We are trying to debate on whether we would want to lift shape inference logic into prologue trace in Thunder in general.
i.e. For a program
If we leave all shape logic in the computation, it should be simplified as below:
One alternative is to lift all shape logic in the prologue trace, so we'll have
I think the first version where we see how the shape operation is defined in the operation would simplify generated kernel, since we do not need to resolve/validate reshape concretization.
The text was updated successfully, but these errors were encountered: