CUDAGraphs as executor/transform/fusion pass #656
Conversation
Looks great overall, thank you @nikitaved.
I have one question in the comments and then I'd be all for merging it.
if bsym.sym.id in do_not_fuse_sym_set:
    return False
...
return True
Would we need to check for fixed sizes of the input and output proxies, or is this handled?
I wonder whether the future API is agreed upon. Does it mean that dynamic-shape tensors will contain integer proxies in their meta-data?
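For illustration, here is a hedged sketch of the kind of check being discussed. The helper name and the assumption that a proxy exposes a `shape` attribute are mine, not the project's API: CUDA Graphs replay with fixed shapes, so a capture pass could reject proxies whose dimensions are symbolic rather than concrete integers.

```python
def has_static_shape(proxy) -> bool:
    # Hypothetical helper: every dimension must be a concrete integer,
    # not a proxy standing in for a runtime value.
    return all(isinstance(dim, int) for dim in proxy.shape)
```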
Nice! This change is needed to make my PR #214 work with CUDA Graphs correctly: there I try to put torch.autograd.Function.apply into the forward trace, but it should be executed outside of the CUDA Graph-captured region.
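A minimal sketch of how such an exclusion could look, based on the snippet quoted above. The set contents and the symbol id are assumptions for illustration only; the point is that wrappers around torch.autograd.Function.apply must run eagerly, outside the fused / CUDA Graph-captured region.

```python
DO_NOT_FUSE_SYM_IDS = {"autograd_function_apply"}  # assumed, illustrative id

def can_fuse(bsym) -> bool:
    # Keep explicitly blacklisted symbols out of the captured/fused region.
    if bsym.sym.id in DO_NOT_FUSE_SYM_IDS:
        return False
    # ... further eligibility checks on the bound symbol ...
    return True
```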
Looks great @nikitaved!
I'm imagining that in the future we could consider not special-casing `use_cudagraphs` but keeping it as a transform, but maybe that's overkill, and it's certainly not for now.
Let's merge and fix anything that needs fixing later.
As per title. Fixes #635.
Also, it fixes the following subtle bugs:
- `CUDAGraphExecutor` does not properly update static buffers when the same graph is invoked on inputs whose meta-data allows fetching a cached graph but whose storage data differs. The area of concern is training and the backward pass (see the replay sketch below).
- `horizontal_merge` in the fusion logic, when grouping bound symbols, does not consider the precedence between ops horizontally. This is not an issue with nvFusions, but it could cause issues when deciding whether to place something like `del x` after `op(x)` in a custom `FusionExecutor`. The fix sorts bsyms in each group with respect to their trace position (the trace is expected to be toposorted prior to any fusions) and hence restores the inter-/intra-group topological order (see the ordering sketch below).
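To illustrate the first bullet, here is a minimal sketch of the standard PyTorch CUDA Graphs replay pattern, not Thunder's `CUDAGraphExecutor` code; `model` is just a stand-in for the captured computation. The `copy_` into the static input buffer is the step whose absence produces the "same meta-data, different storage data" bug.

```python
import torch

model = torch.nn.Linear(8, 8).cuda()
static_input = torch.randn(2, 8, device="cuda")

# Warm up on a side stream before capture (as recommended by the PyTorch docs).
s = torch.cuda.Stream()
s.wait_stream(torch.cuda.current_stream())
with torch.cuda.stream(s):
    static_output = model(static_input)
torch.cuda.current_stream().wait_stream(s)

# Capture: the graph records operations on the *static* tensors.
graph = torch.cuda.CUDAGraph()
with torch.cuda.graph(graph):
    static_output = model(static_input)

def run(new_input: torch.Tensor) -> torch.Tensor:
    # Refresh the static input buffer before replaying; skipping this copy
    # makes the replay silently reuse stale input data.
    static_input.copy_(new_input)
    graph.replay()
    return static_output.clone()
```

For the second bullet, a hypothetical sketch of the ordering fix (names are illustrative, not the project's API): within each fused group, bound symbols are reordered by their position in the original toposorted trace, so that e.g. `del x` can never be emitted before `op(x)`.

```python
def restore_group_order(groups, trace_bsyms):
    # Map each bound symbol to its index in the (toposorted) trace,
    # then sort every group by that index.
    position = {id(bsym): i for i, bsym in enumerate(trace_bsyms)}
    return [sorted(group, key=lambda bsym: position[id(bsym)]) for group in groups]
```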