
make cudagraphs a transform #977

Merged: 5 commits merged into main from tom/cudagraphs_transform on Aug 20, 2024
Conversation

@t-vi (Collaborator) commented Aug 16, 2024

This introduces thunder.transform.cudagraph.CUDAGraphTransform to replace use_cudagraphs=True.

For more detailed control, users can subclass the transform and e.g. override can_fuse.
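
A minimal usage sketch (mine, not from the PR): it assumes thunder.jit takes a transforms list and that the import path matches the description above; the module may actually live under thunder.transforms.cudagraph.

import torch, thunder
# import path follows the PR description; adjust if the module is thunder.transforms.cudagraph
from thunder.transform.cudagraph import CUDAGraphTransform

with torch.device("cuda"):
    m = torch.nn.Linear(2, 3)
    inp = torch.randn(1, 2)

# replaces the old thunder.jit(m, use_cudagraphs=True)
jm = thunder.jit(m, transforms=[CUDAGraphTransform()])
res = jm(inp)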

@t-vi requested review from mruberry and lantiga as code owners, August 16, 2024 11:52
Review comment on lines 233 to 237:

-    fusion_bsym: BoundSymbol = self.fuse(region, fusion_counter, num_static_inputs)
+    fusion_bsym: BoundSymbol = self.fuse(region, fusion_counter)
@nikitaved (Contributor) commented Aug 16, 2024

Are we missing num_static_inputs here for a reason? If we get several non-isomorphic graphs, we should probably think about how to handle this parameter better, maybe through a callback. But that's not relevant right now...

@t-vi (Collaborator, Author) commented Aug 16, 2024

I do think that this is one of the things where the default transform is not ideal, but it is not something that was used before: the parameter was there with a default, but there was no way of providing it.
(And I am not sure that it is correct to have a nontrivial trace-global parameter for it either.)

@nikitaved (Contributor) commented Aug 16, 2024

It is used in the backward pass. We are not losing it there, are we?

@t-vi (Collaborator, Author):

Ouch, right.

@nikitaved (Contributor) commented Aug 16, 2024

But yes, the design of this parameter is so-so, since it was not expected that there would be graph breaks...

@t-vi (Collaborator, Author):

But which args were covered by it?

@nikitaved (Contributor):

So, this is the removed code:

            if cd.use_cudagraphs:
                from thunder.executors.cudagraphex import cudagraphex

                computation_trc = cudagraphex.fusion_pass(computation_trc)
                computation_traces.append(computation_trc)

                if backward_trc is not None:
                    backward_trc = cudagraphex.fusion_pass(backward_trc, num_static_inputs=len(backward_trc.args[0][0]))
                    backward_traces.append(backward_trc)

@t-vi (Collaborator, Author):

So it seems that these were the saved-for-backward tensors of (???). Is the assumption here that they were either static in the forward (parameters) or copied to the input area of the forward CUDA graph?

But with the old code:

import torch, thunder
with torch.device("cuda"):
    m = torch.nn.Linear(2, 3)
    inp = torch.randn(1, 2, requires_grad=True)
jm = thunder.jit(m, use_cudagraphs=True)

res = jm(inp)
grads = torch.autograd.grad(res.sum(), (inp, *m.parameters()))

the forward has no CUDA graph, so in that case, treating the input as static is not really correct (admittedly, a corner case).

@t-vi (Collaborator, Author):

I guess the happy part is that we will get the parameters as static anyway, and we will have to look into the buffers for our own good...

@crcrpar (Collaborator) left a comment:

would there be a plan to add a test which demonstrates the composability with e.g. fsdp?

@t-vi (Collaborator, Author) commented Aug 16, 2024

> would there be a plan to add a test which demonstrates the composability with e.g. fsdp?

That would be awesome; I really need to get the env for running distributed tests back. :(

@t-vi (Collaborator, Author) commented Aug 16, 2024

So as we talk about plans:

  • refine the static inputs a bit,
  • refine the caching: in particular, don't use a global cache but a per-transform one, so things can go out of scope (see the sketch after this comment),
  • refine the operator selection for what is CUDA-graphable,
  • check composability - I'm guessing the communication prims don't mix with CUDA graphs?

To my mind, this greatly improves our starting position (similar to but smaller than Nik's work to make this a (then hardcoded) transform instead of a callable wrapper).
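
To illustrate the per-transform-cache bullet above: a minimal sketch (mine, not Thunder's implementation) of holding captured CUDA graphs as instance state, so they go out of scope with the transform instead of living in a global dict. All names here are hypothetical.

import torch

class CUDAGraphRunner:
    # Instance-level cache of captured graphs, keyed by input shape/dtype.
    def __init__(self):
        self._cache = {}

    def run(self, fn, x):
        key = (tuple(x.shape), x.dtype)
        if key not in self._cache:
            static_in = x.clone()
            # warm up on a side stream before capture, as PyTorch recommends
            s = torch.cuda.Stream()
            s.wait_stream(torch.cuda.current_stream())
            with torch.cuda.stream(s):
                fn(static_in)
            torch.cuda.current_stream().wait_stream(s)
            graph = torch.cuda.CUDAGraph()
            with torch.cuda.graph(graph):
                static_out = fn(static_in)
            self._cache[key] = (graph, static_in, static_out)
        graph, static_in, static_out = self._cache[key]
        static_in.copy_(x)  # refresh the captured input buffer
        graph.replay()
        return static_out.clone()

When the runner (or a transform owning it) is garbage collected, the cached graphs and their static buffers are released with it, which is the "let things go out of scope" part.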

@mruberry (Collaborator) commented:
This is cool; how do you see cuda graphs as a transform? If we conceptually separate the transforms like this:

  • pre-execution transforms
  • execution transform
  • post-execution transforms
  • "destructive" transforms

Where would the cuda graphs transform go? If it's a post-execution (or destructive) transform, should we think about how executors might label their operations as being cuda graphs compatible (or not)? If it's a pre-execution transform, then does it preclude using certain executors in the execution transform?

@t-vi (Collaborator, Author) commented Aug 16, 2024

So it goes after the transform for execution ("post optimization"), but our experimentation (which informs some of the CUDAGraph work) caused us to go before the "del" pass.
I think it would be cool if symbols in general had a tag indicating whether they are suitable for CUDAGraph inclusion. The current approach has the advantage that if people have particular ideas about this, they can just subclass and override can_fuse (see the sketch below).
Also, CUDAGraphs could benefit enormously from more information about the memory effects of operators, but that is in the future.
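
As an illustration of the subclass-and-override escape hatch (my sketch, not code from this PR): the can_fuse signature and the bsym.sym.id attribute are assumptions and may not match the actual API.

from thunder.transform.cudagraph import CUDAGraphTransform  # path per the PR description

class SelectiveCUDAGraphTransform(CUDAGraphTransform):
    # hypothetical deny-list standing in for a per-symbol "CUDA-graphable" tag
    DENYLIST = {"torch.nn.functional.embedding"}

    def can_fuse(self, bsym):
        if str(bsym.sym.id) in self.DENYLIST:
            return False
        return super().can_fuse(bsym)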

@t-vi mentioned this pull request Aug 16, 2024
@lantiga (Collaborator) left a comment:

👍

@t-vi enabled auto-merge (squash) August 20, 2024 11:42
@t-vi merged commit e45e4b4 into main Aug 20, 2024
38 checks passed
@t-vi deleted the tom/cudagraphs_transform branch August 20, 2024 11:42
@t-vi (Collaborator, Author) commented Aug 20, 2024

I have merged this after coordination with @IvanYashchuk; we will address further review comments in a follow-up.
