TransformerEngine - Intermediate tensor sharding #695

Merged
10 commits merged into Lightning-AI:main on Jul 5, 2024

Conversation

@kshitij12345 (Collaborator) commented Jul 2, 2024

TransformerEngine added the ability to shard intermediate activation tensors in v1.8. Currently, we save global/world-sized activations for the backward pass. With this option, we can lower peak memory usage at the cost of added communication: the intermediate tensors are sharded after the forward pass and gathered again before the backward computation.

TE PR: NVIDIA/TransformerEngine#687

In this PR, we make this option opt-in via the thunder.jit compile argument fp8_shard_intermediate_activation.

Example usage: `model = thunder.jit(model, executors=executors, fp8_shard_intermediate_activation=True)`
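To make the trade-off concrete, here is a small self-contained sketch of the shard/gather pattern behind this option (a hypothetical simulation in plain Python, not Thunder's or TransformerEngine's actual implementation): each rank retains only its 1/world_size slice of the saved activation after the forward pass, and an all-gather reconstructs the full tensor just before the backward computation needs it.

```python
# Hypothetical sketch of the shard/gather trade-off behind
# fp8_shard_intermediate_activation (not the actual Thunder/TE code).
# Each "rank" keeps only a 1/world_size slice of the saved activation;
# the full tensor is re-assembled (an all-gather) before backward.

def shard(activation, world_size):
    """Split a saved activation into world_size contiguous shards."""
    chunk = len(activation) // world_size
    return [activation[i * chunk:(i + 1) * chunk] for i in range(world_size)]

def all_gather(shards):
    """Reconstruct the full activation from the per-rank shards."""
    full = []
    for s in shards:
        full.extend(s)
    return full

activation = list(range(16))            # stand-in for a global-sized tensor
shards = shard(activation, world_size=4)

# Peak per-rank storage drops from 16 elements to 4 ...
assert all(len(s) == 4 for s in shards)
# ... at the cost of one extra gather before the backward pass.
assert all_gather(shards) == activation
```

The real implementation operates on FP8 activation tensors saved for backward inside TransformerEngine; the sketch only illustrates why memory goes down while communication goes up.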

Testing

Updated the distributed test to use this option. Tested with the existing tests in test_transformer_engine_executor.py and test_ddp.py -k transformer against TE v1.7 (current stable), v1.8, and v1.9 (current main).

Benchmark
Command -

torchrun --nproc_per_node=8 --nnodes=1 thunder/benchmarks/benchmark_litgpt.py --return_metrics_as_json=True --json_path=/tmp/benchmark_litgpt_data.json --distributed_mode=fsdp --shard_mode=zero3 --model_name=Llama-2-7b-hf --micro_batch_size=1 --compile=thunder_inductor_cat_transformerengine_cudnn --nsys_enabled=False --dynamic=False

Without FP8 Intermediate Sharding

Average iter time: 282.47 ms
Memory used: 52.92 GB

With FP8 Intermediate Sharding

Average iter time: 341.67 ms
Memory used: 44.05 GB
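For reference, the numbers above work out to roughly a 17% reduction in peak memory for roughly a 21% increase in iteration time:

```python
# Trade-off implied by the benchmark numbers above (Llama-2-7b-hf,
# 8 GPUs, FSDP zero3, thunder_inductor_cat_transformerengine_cudnn).
iter_ms_base, iter_ms_sharded = 282.47, 341.67
mem_gb_base, mem_gb_sharded = 52.92, 44.05

time_overhead = (iter_ms_sharded - iter_ms_base) / iter_ms_base
mem_saving = (mem_gb_base - mem_gb_sharded) / mem_gb_base

print(f"iter time overhead: {time_overhead:.1%}")  # ~21.0%
print(f"peak memory saved:  {mem_saving:.1%}")     # ~16.8%
```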
Patch to enable sharding in `benchmark_litgpt.py`
diff --git a/thunder/benchmarks/benchmark_litgpt.py b/thunder/benchmarks/benchmark_litgpt.py
index bad6ef74..947295fc 100644
--- a/thunder/benchmarks/benchmark_litgpt.py
+++ b/thunder/benchmarks/benchmark_litgpt.py
@@ -341,7 +341,7 @@ class Benchmark_litGPT:
 
                 executors.insert(0, transformer_engine_ex)
 
-            model = thunder.jit(model, executors=executors)
+            model = thunder.jit(model, executors=executors, fp8_shard_intermediate_activation=True)
 
         elif self.compile != "eager":
             raise ValueError(f"Invalid compile option: {self.compile}")

@kshitij12345 kshitij12345 marked this pull request as ready for review July 4, 2024 14:06
@kshitij12345 kshitij12345 changed the title [WIP] TransformerEngine - Intermediate Sharding TransformerEngine - Intermediate Sharding Jul 4, 2024
@kshitij12345 kshitij12345 requested a review from IvanYashchuk July 4, 2024 14:06
@kshitij12345 kshitij12345 changed the title TransformerEngine - Intermediate Sharding TransformerEngine - Intermediate tensor sharding Jul 4, 2024
@t-vi (Collaborator) left a comment
@t-vi t-vi merged commit 0ac3b6d into Lightning-AI:main Jul 5, 2024
40 checks passed
@github-actions github-actions bot deleted the te-intermediate-sharding branch October 4, 2024 00:46
3 participants