CUDAGraphs in Thunder #981
Labels
cudagraphs
design
This is a largish feature / design
enhancement
New feature or request
thunderfx
for things that could be applicable to the dynamo+thunder frontend
PR #977 proposes to make CUDAGraphs a Transform, with the cudagraphs being created in the `post_optimization` phase of creating the executable. In particular, this enables people to subclass or copy the CUDAGraphs transform to have their own, bespoke thing. We do have some ideas (and this is just a list from the top of my head to get things started), but this issue is to collect discussion and ideas about what you want CUDAGraphs to look like.
static input detection
Currently we auto-mark parameters, but might be able to do more, e.g. buffers.
If we knew more about the operations that run between fusions, we might be able to tell whether input buffers can be re-used.
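To make the trade-off concrete, here is a rough sketch of what marking more inputs as static buys us. It assumes that every non-static input has to be copied into a fixed placeholder before each replay, and that buffers are updated in place rather than reassigned. `GraphInput`, `static_input_mask` and the `kind` labels are made up for this sketch and are not Thunder's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GraphInput:
    name: str
    kind: str  # "parameter", "buffer" or "activation" (labels invented for this sketch)


def static_input_mask(inputs, static_kinds=frozenset({"parameter"})):
    """One bool per input: True means we assume its address is stable across replays."""
    return tuple(inp.kind in static_kinds for inp in inputs)


def copies_per_replay(mask):
    # Every non-static input needs a copy into its placeholder before replay.
    return sum(not is_static for is_static in mask)


inputs = [
    GraphInput("linear.weight", "parameter"),
    GraphInput("bn.running_mean", "buffer"),
    GraphInput("x", "activation"),
]

# Current behaviour as described above: only parameters are auto-marked.
params_only = static_input_mask(inputs)
# Possible extension: also treat buffers as static.
with_buffers = static_input_mask(inputs, frozenset({"parameter", "buffer"}))
```

With only parameters marked, two of the three inputs would be copied on every replay; also marking buffers brings that down to one.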
caching
In particular we would not want to use a global cache, but a per-transform one, to let things go out of scope.
We also want to let users override the caching (we found that it can be interesting to re-use cudagraph fusions in case we have some non-graphable bit in a repeated block of a model).
A step here is PR #1001
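A minimal sketch of the per-transform cache idea, in plain Python so it runs without Thunder or a GPU. All names here (`GraphCachingTransform`, `cache_key`, `get_or_build`) are hypothetical and only illustrate the shape: the cache lives on the transform instance, so it is released together with the transform, and a subclass can change the key to share entries across repeated blocks. Whether sharing a captured graph across blocks is actually valid depends on details this sketch ignores.

```python
class GraphCachingTransform:
    """Hypothetical stand-in for a CUDAGraphs-like transform, not Thunder's class."""

    def __init__(self):
        # Per-instance cache: entries go away when the transform goes out of scope.
        self._cache = {}
        self.builds = 0

    def cache_key(self, region_name, input_shapes):
        # Default policy: every region gets its own cache entry.
        return (region_name, input_shapes)

    def build(self, region_name, input_shapes):
        # Placeholder for the expensive capture step.
        self.builds += 1
        return f"graph<{region_name}:{input_shapes}>"

    def get_or_build(self, region_name, input_shapes):
        key = self.cache_key(region_name, input_shapes)
        if key not in self._cache:
            self._cache[key] = self.build(region_name, input_shapes)
        return self._cache[key]


class SharedBlockTransform(GraphCachingTransform):
    def cache_key(self, region_name, input_shapes):
        # User override: drop the block prefix so "block0.attn" and
        # "block1.attn" map to the same cache entry.
        return (region_name.split(".", 1)[-1], input_shapes)


default = GraphCachingTransform()
shared = SharedBlockTransform()
for t in (default, shared):
    t.get_or_build("block0.attn", ((8, 128),))
    t.get_or_build("block1.attn", ((8, 128),))
```

Here `default` builds two graphs while `shared` builds one, and neither instance sees the other's entries.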
operator selection for what is cuda-graphable
I think it would be cool if symbols in general had a tag indicating whether they are suitable for CUDAGraph inclusion.
After #977 we have the advantage that if people have particular ideas about this, they can just subclass and override can_fuse.
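To illustrate how a tag and a `can_fuse` override could play together, here is a toy version in plain Python. Only the method name `can_fuse` comes from #977; its signature here, the `Symbol` class, the `cudagraph_safe` tag and the `partition` helper are assumptions made for the sketch, not the actual Thunder interfaces.

```python
from dataclasses import dataclass

CUDAGRAPH_SAFE = "cudagraph_safe"  # hypothetical tag name


@dataclass(frozen=True)
class Symbol:
    name: str
    tags: frozenset = frozenset()


class GraphTransform:
    def can_fuse(self, sym):
        # Default policy: trust the tag on the symbol.
        return CUDAGRAPH_SAFE in sym.tags


class NoCommsGraphTransform(GraphTransform):
    # Example user policy: keep communication ops out of graphs even if tagged.
    denied = frozenset({"all_gather", "reduce_scatter"})

    def can_fuse(self, sym):
        return super().can_fuse(sym) and sym.name not in self.denied


def partition(transform, symbols):
    """Group consecutive fusible symbols into regions; anything else ends a region."""
    regions, current = [], []
    for sym in symbols:
        if transform.can_fuse(sym):
            current.append(sym.name)
        elif current:
            regions.append(current)
            current = []
    if current:
        regions.append(current)
    return regions


safe = frozenset({CUDAGRAPH_SAFE})
trace = [
    Symbol("add", safe),
    Symbol("all_gather", safe),
    Symbol("mul", safe),
    Symbol("item"),  # untagged, so never included
    Symbol("relu", safe),
]
base_regions = partition(GraphTransform(), trace)
no_comms_regions = partition(NoCommsGraphTransform(), trace)
```

The base policy yields two regions, while the subclass splits the first one around `all_gather`.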
check composability
Test e.g. FSDP + CUDAGraphs.
I'm guessing the communication prims don't mix well with cudagraphs? But that might be something operator tagging can handle...
(not immediate) future: memory information about ops
CUDAGraphs could benefit enormously from more information about the memory effects of operators (similar to meta info, but about what allocations a function makes, where the output is allocated, etc.), but that is in the future.
Obviously other people (@nikitaved, @IvanYashchuk, @tfogal and more) will be much more knowledgeable, but here is a start.