
LitGPT benchmarking: Use native PyTorch checkpointing in the dynamo+thunder path #1370

Merged
merged 1 commit into main on Oct 31, 2024

Conversation

@kiya00 (Collaborator) commented on Oct 30, 2024

Use the native PyTorch checkpoint option in the litgpt benchmark for the Thunder Dynamo path.

Ref #1298.

H100*8 ZeRO3 with checkpointing
torchrun --nproc_per_node=8 --nnodes=1 thunder/benchmarks/benchmark_litgpt.py --model_name CodeLlama-34b-hf --micro_batch_size 1 --compile thunder-dynamo --checkpoint_activations=True --distributed_mode=fsdp --shard_mode zero3 --max_iters=4 --warmup_iters=1

Model                micro batch size   peak mem
longchat-13b-16k     3                  50.40 GB
CodeLlama-34b-hf     1                  48.07 GB
Gemma-2-27b          1                  OOM
Llama-3-70B          1                  78.77 GB
Mistral-7B-v0.2      3                  67.44 GB
vicuna-7b-v1.5-16k   5                  54.81 GB

Note:
This PR enables ThunderFX + native PyTorch checkpointing.
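
For context, "native PyTorch checkpointing" here refers to torch.utils.checkpoint.checkpoint, which recomputes a wrapped region's activations during backward instead of storing them, trading extra compute for lower peak memory. A minimal sketch of the idea (the Block module below is illustrative; the actual wrapping in benchmark_litgpt.py may differ):

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Block(nn.Module):
    # Illustrative block; benchmark_litgpt.py wraps LitGPT modules, not this one.
    def __init__(self, dim: int = 1024):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):
        # Activations inside self.mlp are recomputed during backward instead of stored.
        return x + checkpoint(self.mlp, x, use_reentrant=False)

x = torch.randn(2, 1024, requires_grad=True)
Block()(x).sum().backward()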

Single GPU:
The splitter creates the module as follows:

GraphModule(
  (thunder_1): ThunderModule(
    (_model): GraphModule()
  )
  (inductor_2): OptimizedModule(
    (_orig_mod): GraphModule(
      (wrap_body_0): GraphModule()
    )
  )
  (thunder_3): ThunderModule(
    (_model): GraphModule()
  )
  (inductor_4): OptimizedModule(
    (_orig_mod): GraphModule(
      (wrap_body_1): GraphModule()
    )
  )
  (thunder_5): ThunderModule(
    (_model): GraphModule()
  )
)

The checkpoint operator is not supported by Thunder, so it falls back to running with Inductor (the converter PR #1261 can fix this).
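
A rough sketch of how this split can be reproduced on a small model, assuming ThunderCompiler from thunder.dynamo as the torch.compile backend (the TinyModel and the subgraph_infos inspection are illustrative and may differ across thunder versions):

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint
from thunder.dynamo import ThunderCompiler

class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(16, 16)
        self.fc2 = nn.Linear(16, 16)

    def forward(self, x):
        x = torch.relu(self.fc1(x))
        # The checkpointed region is what ends up in an inductor_* submodule;
        # the surrounding code stays in thunder_* submodules.
        x = checkpoint(self.fc2, x, use_reentrant=False)
        return torch.tanh(x)

backend = ThunderCompiler()
model = torch.compile(TinyModel(), backend=backend)
model(torch.randn(4, 16, requires_grad=True)).sum().backward()
# backend.subgraph_infos records how each Dynamo graph was split between
# Thunder and Inductor (attribute name assumed; it may differ across versions).
print(backend.subgraph_infos)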

ZeRO3:
When --bucketing_mode=none is used, Dynamo passes to the backend (the gm in ThunderCompiler.__call__) only those parts of the original model that do not contain a checkpoint operator.

@IvanYashchuk changed the title from "Add native PyTorch checkpoint in litgpt benchmark (#1298)" to "LitGPT benchmarking: Use native PyTorch checkpointing in the dynamo+thunder path" on Oct 30, 2024
@IvanYashchuk added the memory use and thunderfx (for things that could be applicable to the dynamo+thunder frontend) labels on Oct 30, 2024
@IvanYashchuk removed the request for review from crcrpar on October 30, 2024 18:17
@IvanYashchuk (Collaborator) commented:
When --bucketing_mode=block is used then Dynamo starts sending the graphs with a torch.ops.higher_order.tag_activation_checkpoint operator inside for which we need special processing added in #1261.
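
To see what such a graph looks like, a custom Dynamo backend that just prints what it receives is enough; with a non-reentrant checkpoint inside the compiled function, the printed graph contains a call to torch.ops.higher_order.tag_activation_checkpoint (a minimal sketch, independent of the benchmark script):

import torch
from torch.utils.checkpoint import checkpoint

def printing_backend(gm, example_inputs):
    # The graph handed to the backend contains the
    # torch.ops.higher_order.tag_activation_checkpoint node.
    print(gm.graph)
    return gm.forward  # run the graph eagerly after printing

def fn(x, w):
    return checkpoint(lambda t: torch.relu(t @ w), x, use_reentrant=False)

compiled = torch.compile(fn, backend=printing_backend)
compiled(torch.randn(4, 4, requires_grad=True), torch.randn(4, 4))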

@IvanYashchuk (Collaborator) commented:
@t-vi, can you please merge this pull request?

@IvanYashchuk enabled auto-merge (squash) on October 30, 2024 19:35
@t-vi (Collaborator) left a comment


@IvanYashchuk merged commit 7b52be0 into main on Oct 31, 2024
43 checks passed
@IvanYashchuk deleted the ckp-benchmark branch on October 31, 2024 08:28