
Rename FusionExecutor to KernelExecutor, fe to ke, fec to executor_cache #3349

Merged

naoyam merged 7 commits into main from rename on Nov 7, 2024

Conversation

naoyam (Collaborator) commented on Nov 5, 2024

This is just a mechanical name change, intended to simplify #3263.
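
For illustration only, here is a hypothetical call-site sketch of what the mechanical rename looks like; the argument lists and variable names are placeholders rather than actual nvFuser signatures, and only the `fec` variable spelling changes, not the `FusionExecutorCache` class name:

```cpp
// Hypothetical call site, for illustration only; argument lists are
// placeholders, not actual nvFuser signatures.

// Before this PR:
//   FusionExecutor fe;
//   fe.compileFusion(fusion, inputs);
//   auto outputs = fe.runFusion(inputs);
//   FusionExecutorCache fec(std::move(fusion_ptr));

// After this PR (same behavior, new names):
//   KernelExecutor ke;
//   ke.compileFusion(fusion, inputs);
//   auto outputs = ke.runFusion(inputs);
//   FusionExecutorCache executor_cache(std::move(fusion_ptr));
```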

naoyam (Collaborator, Author) commented on Nov 5, 2024

!test

naoyam requested a review from csarofeen on November 5, 2024, 20:36
naoyam (Collaborator, Author) commented on Nov 5, 2024

!test

naoyam (Collaborator, Author) commented on Nov 5, 2024

!test

naoyam (Collaborator, Author) commented on Nov 5, 2024

Changed lines after this PR:

+2,614 −2,216

https://github.com/NVIDIA/Fuser/pull/3351/files

naoyam (Collaborator, Author) commented on Nov 5, 2024

!test

naoyam (Collaborator, Author) commented on Nov 7, 2024

!build

naoyam merged commit ba4f7d4 into main on Nov 7, 2024 (15 checks passed)
naoyam deleted the rename branch on November 7, 2024, 01:19
naoyam added a commit that referenced this pull request Nov 7, 2024
Follow-up to #3349 

`KernelExecutor::compileFusion` -> `KernelExecutor::compile`
`KernelExecutor::runFusion` -> `KernelExecutor::run`
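
Continuing the same hypothetical sketch (placeholders, not actual signatures), the follow-up commit shortens the call site further:

```cpp
// After the follow-up commit (hypothetical call site, placeholder arguments):
//   KernelExecutor ke;
//   ke.compile(fusion, inputs);    // was ke.compileFusion(...)
//   auto outputs = ke.run(inputs); // was ke.runFusion(...)
```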
wujingyue added a commit that referenced this pull request Nov 7, 2024
The change was accidentally reverted in #3349.
wujingyue mentioned this pull request Nov 7, 2024
wujingyue added a commit that referenced this pull request Nov 7, 2024
The change was accidentally reverted in #3349.
jjsjann123 added a commit that referenced this pull request Jan 24, 2025
#3349 removed grad accumulation, but the rope benchmark implementation needs an update to get it working.

Reference implementation:
```
                             Model  Batch-Size  Sequence-Length  ... Forward-Time(ms)  Backward-Kernels  Backward-Time(ms)
0  Llama-2-7b-hf                      2             4096  ...            0.166                 5              0.857
0  Llama-3-8B                         2             8192  ...            0.567                 5              1.433
0  mistralai/Mistral-Nemo-Base-2407   1             4096  ...            0.138                 6              0.166
0  Qwen/Qwen2.5-7B-Instruct           1             4096  ...            0.072                 8              0.397
0  microsoft/Phi-3.5-mini-instruct    1             8192  ...            0.236                 6              0.494
```
After clearing l2_cache:
```
                             Model  Batch-Size  Sequence-Length  ... Forward-Time(ms)  Backward-Kernels  Backward-Time(ms)
0  Llama-2-7b-hf                      2             4096  ...            0.166                 5              0.870
0  Llama-3-8B                         2             8192  ...            0.567                 5              1.444
0  mistralai/Mistral-Nemo-Base-2407   1             4096  ...            0.138                 6              0.192
0  Qwen/Qwen2.5-7B-Instruct           1             4096  ...            0.072                 8              0.417
0  microsoft/Phi-3.5-mini-instruct    1             8192  ...            0.234                 6              0.516
```
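
As a generic sketch only (not code from this repository, and not necessarily how the benchmark harness clears the cache), one common way to flush the GPU L2 cache between timed runs is to overwrite a device buffer assumed to be larger than L2:

```cpp
#include <cuda_runtime.h>
#include <cstddef>

// Overwriting a buffer larger than L2 evicts previously cached lines, so the
// next timed kernel launch starts from a cold L2. The 256 MiB size and the
// use of the default stream are illustrative assumptions.
void flushL2(void* buf, size_t bytes, cudaStream_t stream) {
  cudaMemsetAsync(buf, 0, bytes, stream);
}

int main() {
  constexpr size_t kFlushBytes = 256ull << 20; // assumed larger than L2
  void* flush_buf = nullptr;
  cudaMalloc(&flush_buf, kFlushBytes);

  for (int i = 0; i < 10; ++i) {
    flushL2(flush_buf, kFlushBytes, /*stream=*/0);
    cudaDeviceSynchronize();
    // ... launch and time the benchmarked kernel here ...
  }

  cudaFree(flush_buf);
  return 0;
}
```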

Before this PR:
```
Name (time in us)                                                                       Mean                    Median
---------------------------------------------------------------------------------------------------------------------------------
test_rope_bwd_benchmark[executor='thunder'-variation='llama_2_7b_hf_rope']        1,192.8558 (14.56)        1,191.9040 (14.53)
test_rope_bwd_benchmark[executor='thunder'-variation='llama_3_8B_rope']           1,767.5348 (21.58)        1,766.8410 (21.54)
test_rope_bwd_benchmark[executor='thunder'-variation='hf_mistral_nemo_rope']        275.4680 (3.36)           275.7265 (3.36)
test_rope_bwd_benchmark[executor='thunder'-variation='hf_qwen2_rope']               488.4243 (5.96)           488.3105 (5.95)
test_rope_bwd_benchmark[executor='thunder'-variation='hf_phi3_rope']                757.9140 (9.25)           757.6910 (9.24)
---------------------------------------------------------------------------------------------------------------------------------
```

With this PR:
```
Name (time in us)                                                                    Mean                Median
-----------------------------------------------------------------------------------------------------------------------
test_rope_bwd_benchmark[executor='thunder'-variation='llama_2_7b_hf_rope']       871.5996 (5.23)       871.6050 (5.24)
test_rope_bwd_benchmark[executor='thunder'-variation='llama_3_8B_rope']        1,443.0095 (8.66)     1,442.9955 (8.67)
test_rope_bwd_benchmark[executor='thunder'-variation='hf_mistral_nemo_rope']     166.5515 (1.0)        166.4480 (1.0)
test_rope_bwd_benchmark[executor='thunder'-variation='hf_qwen2_rope']            386.4463 (2.32)       386.5565 (2.32)
test_rope_bwd_benchmark[executor='thunder'-variation='hf_phi3_rope']             452.3351 (2.72)       452.0685 (2.72)
-----------------------------------------------------------------------------------------------------------------------
```

Given the existing issue with pytest/torch.profiler, if I instead run each benchmark separately:
```
test_rope_bwd_benchmark[executor='thunder'-variation='llama_2_7b_hf_rope']     871.1912  871.2465
test_rope_bwd_benchmark[executor='thunder'-variation='llama_3_8B_rope']        1.4427  1.4427
test_rope_bwd_benchmark[executor='thunder'-variation='hf_mistral_nemo_rope']   191.6567  191.6795
test_rope_bwd_benchmark[executor='thunder'-variation='hf_qwen2_rope']          416.8007  416.8935
test_rope_bwd_benchmark[executor='thunder'-variation='hf_phi3_rope']           514.7512  514.4900
```

So these numbers do match the manual benchmark with l2_cache cleared, which I think justifies this PR.