
Complete torch.compile executor #140


Merged · merged 39 commits into main from carmocca/complete-torch-compile on May 3, 2024

Conversation

carmocca
Contributor

@carmocca carmocca commented Apr 5, 2024

What does this PR do?

Adds an instance of TorchCompileExecutor named torch_compile_ex that registers all of pytorch_executor's and sdpa_ex's operators.

Renaming the existing executor torch_compile_executor to torch_compile_cat_ex breaks backward compatibility.
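
Downstream code that imports the old name will need updating. A minimal compatibility shim, purely illustrative and not part of this PR, could look like:

# Hypothetical shim: keeps code written against the old name working after the rename
from thunder.executors.torch_compile import torch_compile_cat_ex as torch_compile_executor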

Playground script with litgpt:

from litgpt import GPT
import thunder
import torch

from thunder import pytorch_executor
from thunder.executors.torch_compile import torch_compile_ex

with torch.device("cuda"):
    model = GPT.from_name("Llama-2-7b-hf", n_layer=1)

model = thunder.jit(model, executors=[torch_compile_ex])
x = torch.randint(model.max_seq_length, (2, 5), device="cuda")
y = model(x)

forward_trace = thunder.last_traces(model)[-1].python()
print(forward_trace)
assert "TorchCompile" in str(forward_trace)

Output:

@torch.no_grad()
@no_autocast
def augmented_forward_fn(idx, t_lm_head_weight, t_transformer_h_0_attn_attn_weight, t_transformer_h_0_attn_proj_weight, t_transformer_h_0_mlp_fc_1_weight, t_transformer_h_0_mlp_fc_2_weight, t_transformer_h_0_mlp_proj_weight, t_transformer_h_0_norm_1_weight, t_transformer_h_0_norm_2_weight, t_transformer_ln_f_weight, t_transformer_wte_weight, tos1, t_sin):
  # idx: "cuda:0 i64[2, 5]"
  # t_lm_head_weight: "cuda:0 f32[32000, 4096]"
  # t_transformer_h_0_attn_attn_weight: "cuda:0 f32[12288, 4096]"
  # t_transformer_h_0_attn_proj_weight: "cuda:0 f32[4096, 4096]"
  # t_transformer_h_0_mlp_fc_1_weight: "cuda:0 f32[11008, 4096]"
  # t_transformer_h_0_mlp_fc_2_weight: "cuda:0 f32[11008, 4096]"
  # t_transformer_h_0_mlp_proj_weight: "cuda:0 f32[4096, 11008]"
  # t_transformer_h_0_norm_1_weight: "cuda:0 f32[4096]"
  # t_transformer_h_0_norm_2_weight: "cuda:0 f32[4096]"
  # t_transformer_ln_f_weight: "cuda:0 f32[4096]"
  # t_transformer_wte_weight: "cuda:0 f32[32000, 4096]"
  # tos1: "cuda:0 f32[4096, 128]"
  # t_sin: "cuda:0 f32[4096, 128]"
  [t10, t101, t106, t70, t81] = TorchCompile0(idx, t_lm_head_weight, t_sin, t_transformer_h_0_attn_attn_weight, t_transformer_h_0_attn_proj_weight, t_transformer_h_0_mlp_fc_1_weight, t_transformer_h_0_mlp_fc_2_weight, t_transformer_h_0_mlp_proj_weight, t_transformer_h_0_norm_1_weight, t_transformer_h_0_norm_2_weight, t_transformer_ln_f_weight, t_transformer_wte_weight, tos1)
  return {'output': t106, 'flat_args': [idx, t_lm_head_weight, t_transformer_h_0_attn_attn_weight, t_transformer_h_0_attn_proj_weight, t_transformer_h_0_mlp_fc_1_weight, t_transformer_h_0_mlp_fc_2_weight, t_transformer_h_0_mlp_proj_weight, t_transformer_h_0_norm_1_weight, t_transformer_h_0_norm_2_weight, t_transformer_ln_f_weight, t_transformer_wte_weight, tos1, t_sin], 'flat_output': (t106,)}, ((idx, t10, t101, t70, t81, t_lm_head_weight, t_sin, t_transformer_h_0_attn_attn_weight, t_transformer_h_0_attn_proj_weight, t_transformer_h_0_mlp_fc_1_weight, t_transformer_h_0_mlp_fc_2_weight, t_transformer_h_0_mlp_proj_weight, t_transformer_h_0_norm_1_weight, t_transformer_h_0_norm_2_weight, t_transformer_ln_f_weight, t_transformer_wte_weight, tos1), (False, False, 0.29730177875068026, 0.29730177875068026, 4096.0, 4096.0, 4096.0, 32000, 2, -1))
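
The backward trace can be inspected the same way. A minimal sketch, assuming thunder.last_backward_traces as the backward counterpart of thunder.last_traces:

# Print the final, executable backward trace of the same jitted model
backward_trace = thunder.last_backward_traces(model)[-1].python()
print(backward_trace)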

Fixes https://github.com/Lightning-AI/lit-thunder-LEGACY/issues/2141

cc @Borda @apaz-cli

@carmocca carmocca self-assigned this Apr 5, 2024

@IvanYashchuk
Collaborator

> Rename the existing torch_compile executor to torch_compile_partial

I suggest renaming the existing executor to "concat_inductor". That is what it does: it uses Inductor to fuse the concatenation and its surrounding operations.
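
To illustrate the difference between the two executors, a minimal sketch using only names this PR confirms (pytorch_executor picks up whatever the cat executor does not claim):

import thunder
from thunder import pytorch_executor
from thunder.executors.torch_compile import torch_compile_cat_ex, torch_compile_ex

# torch_compile_cat_ex claims only the concatenation and its surrounding
# operations, leaving everything else to later executors in the list;
# torch_compile_ex claims every operator it supports.
model_cat = thunder.jit(model, executors=[torch_compile_cat_ex, pytorch_executor])
model_full = thunder.jit(model, executors=[torch_compile_ex])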

@carmocca carmocca force-pushed the carmocca/complete-torch-compile branch from f2e355b to 75bf7ea on April 10, 2024 12:09

@github-actions github-actions bot added the documentation label and removed the has conflicts label on Apr 23, 2024
@carmocca carmocca marked this pull request as ready for review April 29, 2024 17:05
@carmocca carmocca force-pushed the carmocca/complete-torch-compile branch from 58077e9 to ba75ad8 on April 29, 2024 17:45
@carmocca
Contributor Author

Are these known flakes on Windows?

FAILED thunder/tests/test_grad.py::test_vjp_correctness_getitem_torch_cpu_float64 - AssertionError: Scalars are not close!

Expected 75.60970520045251 but got 81.13586235571096.
Absolute difference: 5.526157155258446 (up to 1e-05 allowed)
Relative difference: 0.07308793415617461 (up to 1.3e-06 allowed)
FAILED thunder/tests/test_grad.py::test_phantom_grad_vs_torch_consistency_getitem_torch_cpu_bfloat16 - AssertionError: Tensor-likes are not close!

Mismatched elements: 112 / 140 (80.0%)
Greatest absolute difference: 4.0 at index (0, 0, 0) (up to 1e-05 allowed)
Greatest relative difference: 2.0 at index (0, 0, 0) (up to 0.016 allowed)
FAILED thunder/tests/test_grad.py::test_phantom_grad_vs_torch_consistency_getitem_torch_cpu_float64 - AssertionError: Tensor-likes are not close!

Mismatched elements: 112 / 140 (80.0%)
Greatest absolute difference: 7.0 at index (0, 0, 0) (up to 1e-07 allowed)
Greatest relative difference: 3.5 at index (0, 0, 0) (up to 1e-07 allowed)
FAILED thunder/tests/test_grad.py::test_phantom_grad_vs_torch_consistency_getitem_torch_cpu_float32 - AssertionError: Tensor-likes are not close!

Mismatched elements: 112 / 140 (80.0%)
Greatest absolute difference: 5.0 at index (0, 0, 0) (up to 1e-05 allowed)
Greatest relative difference: 2.5 at index (0, 0, 0) (up to 1.3e-06 allowed)

@apaz-cli
Contributor

@carmocca I usually assume that consistency tests are flakes, yeah. Re-run, and it should go away. If it doesn't, then it wasn't a flake :)

Collaborator

@t-vi t-vi left a comment


LGTM. Awesome work. Thank you @carmocca @IvanYashchuk @apaz-cli

@t-vi t-vi merged commit 7ac5684 into main May 3, 2024
36 of 39 checks passed
@t-vi t-vi deleted the carmocca/complete-torch-compile branch May 3, 2024 12:31
Comment on lines +269 to +274
if "inductor_cat" in self.compile:
from thunder.executors.torch_compile import torch_compile_cat_ex as torch_compile_ex

executors.insert(0, torch_compile_ex)
elif "inductor" in self.compile:
from thunder.executors.torch_compile import torch_compile_ex
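
Note that the order of the checks matters: "inductor" is a substring of "inductor_cat", so the more specific option must be tested first. As a rough sketch of what the selection amounts to (compile_option and default_executors are stand-ins for the benchmark's real attributes, and the elif branch presumably ends in the same insert):

import thunder

executors = list(default_executors)
if "inductor_cat" in compile_option:
    from thunder.executors.torch_compile import torch_compile_cat_ex as torch_compile_ex
    executors.insert(0, torch_compile_ex)  # cat-focused Inductor fusion gets top priority
elif "inductor" in compile_option:
    from thunder.executors.torch_compile import torch_compile_ex
    executors.insert(0, torch_compile_ex)  # full Inductor executor gets top priority
model = thunder.jit(model, executors=executors)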
Collaborator

I want to highlight that our nightly scripts and reporting depend on the current naming convention to monitor performance history. There was no real need to change these benchmark option names in this PR in a breaking way. In the future, I suggest we explore alternatives before finalizing a merge that alters existing behavior. Additionally, after merging, it's important to communicate such changes through various channels, not just GitHub, so that everyone is informed.

Labels: documentation, enhancement, executors, torch.compile
5 participants