[benchmark] add option to use torchao's float8 (dynamic scaling) with fsdp2 #997

Merged
9 commits merged into main on Aug 23, 2024

Conversation

@crcrpar (Collaborator) commented on Aug 19, 2024

This adds an option to use torchao's float8 with dynamic scaling.
An example command to use float8 with FSDP2:

torchrun --nproc-per-node 8 --local-ranks-filter 0 --role rank --tee 3 thunder/benchmarks/benchmark_litgpt.py --model_name Llama-2-7b-hf --compile inductor --distributed_mode fsdp2 --shard_mode zero2 --use_torchao_fp8_linear true --use_torchao_fp8_allgather true --use_torchao_fp8_precompute_scale_for_fsdp true

Note that --use_torchao_fp8_precompute_scale_for_fsdp true keeps the model in fp32.
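For context, below is a minimal sketch of how these three flags roughly map onto torchao's float8 training API together with FSDP2, assuming torchao around pytorch/ao@a0376bf; it is not the actual benchmark_litgpt.py code, and the model constructor, block attribute, and dataloader names are hypothetical placeholders.

```python
import torch
from torch.distributed._composable.fsdp import fully_shard  # FSDP2
from torchao.float8 import (
    Float8LinearConfig,
    convert_to_float8_training,
    precompute_float8_dynamic_scale_for_fsdp,
)

model = build_model("Llama-2-7b-hf")  # hypothetical helper, not part of this PR

# --use_torchao_fp8_linear: swap nn.Linear modules for float8 linears (dynamic scaling).
# --use_torchao_fp8_allgather: all-gather FSDP2-sharded weights in fp8.
config = Float8LinearConfig(enable_fsdp_float8_all_gather=True)
convert_to_float8_training(model, config=config)

# FSDP2 with zero2-style sharding: keep gathered params after forward.
for block in model.blocks:  # hypothetical attribute
    fully_shard(block, reshard_after_forward=False)
fully_shard(model, reshard_after_forward=False)

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

for batch in dataloader:  # hypothetical dataloader
    loss = model(batch).mean()  # placeholder loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    # --use_torchao_fp8_precompute_scale_for_fsdp: recompute the dynamic fp8 scales
    # for all float8 parameters in one fused pass after each optimizer step.
    precompute_float8_dynamic_scale_for_fsdp(model)
```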

Llama-2-7b-hf on 8x H100, using pjnl-20240821, as of ff66203, with torchao at pytorch/ao@a0376bf.

When the compiler is torch, FSDP2 is used.

| compiler | executors | bs | Tokens/s/GPU | Memory Used (GB) |
| --- | --- | --- | --- | --- |
| torch | fp8: linear, all-gather, & precompute | 1 | 14640.35 | 34.26 |
| torch | fp8: linear, all-gather, & precompute | 2 | 17993.14 | 49.69 |
| torch | fp8: linear & all-gather | 1 | 14587.93 | 34.26 |
| torch | fp8: linear & all-gather | 2 | 17912.73 | 49.70 |
| torch | fp8: linear | 1 | 13897.13 | 40.64 |
| torch | fp8: linear | 2 | 17219.84 | 56.12 |
| torch | thunder_inductor_cat_cudnn_dynamo | 1 | 12506.03 | 40.21 |
| torch | thunder_inductor_cat_cudnn_dynamo | 2 | 13890.66 | 61.82 |
| thunder | inductor_cat_cudnn_transformerengine | 1 | 14274.20 | 52.88 |
| thunder | inductor_cat_cudnn_transformerengine | 2 | 16329.36 | 74.01 |

@crcrpar requested a review from kshitij12345 on August 19, 2024 13:35
@crcrpar force-pushed the crpa/fsdp2-torchao_float8 branch 2 times, most recently from a88b136 to df54b94, on August 21, 2024 05:36
@crcrpar marked this pull request as ready for review on August 21, 2024 05:36
@crcrpar force-pushed the crpa/fsdp2-torchao_float8 branch 4 times, most recently from e01a625 to b487ddf, on August 22, 2024 18:14
@crcrpar force-pushed the crpa/fsdp2-torchao_float8 branch from b487ddf to 3a747cc on August 23, 2024 01:15

@lantiga (Collaborator) left a comment

Stamped!

@IvanYashchuk merged commit bdc1e8a into main on Aug 23, 2024
37 checks passed
@IvanYashchuk deleted the crpa/fsdp2-torchao_float8 branch on August 23, 2024 07:28