use tagging checkpointing #1616

Merged
t-vi merged 1 commit into main from tom/tagging-checkpointing
Jan 8, 2025

Conversation

Collaborator

@t-vi t-vi commented Jan 8, 2025

This enables checkpointing through tags and switches the litgpt benchmarks to this pattern.

Builds on top of #1615.

python thunder/benchmarks/benchmark_litgpt.py --model_name Qwen2.5-7B --compile thunder --checkpoint_activations True --low_precision_mode none --micro_batch_size 1 --n_layer 4 --max_iters 3 --warmup_iters 2 --block_size 4096
  • Eager instead of thunder: Average iter time: 803.95 ms, Memory used: 17.55 GB
  • Thunder with this PR: Average iter time: 779.80 ms, Memory used: 18.34 GB
  • Main: Average iter time: 788.88 ms, Memory used: 18.76 GB
  • refactor recomputation to work with tags #1615: Average iter time: 686.02 ms, Memory used: 20.77 GB
python thunder/benchmarks/benchmark_litgpt.py --model_name stablecode-completion-alpha-3b --compile thunder --checkpoint_activations True --low_precision_mode none --micro_batch_size 1 --n_layer 4 --max_iters 3 --warmup_iters 2 
  • Eager instead of thunder: Average iter time: 1256.91 ms, Memory used: 10.37 GB
  • Thunder with this PR: Average iter time: 1229.80 ms, Memory used: 10.27 GB
  • Main: Average iter time: 1279.73 ms, Memory used: 13.27 GB
  • refactor recomputation to work with tags #1615: Average iter time: 1146.84 ms, Memory used: 13.29 GB
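
To make the numbers above easier to compare, here is a small sketch that computes the relative change in iteration time and memory against the Main run. The figures are copied from the two lists above. The `RESULTS` layout and the helper names `rel_change` and `compare` are made up for illustration and are not part of the benchmark script, and the Main memory value for stablecode is assumed to be in GB.

```python
# Figures copied from the benchmark lists above: (avg iter time in ms, memory in GB).
# Assumption: the stablecode "main" memory value (13.27) is in GB like the others.
RESULTS = {
    "Qwen2.5-7B": {
        "eager": (803.95, 17.55),
        "this PR": (779.80, 18.34),
        "main": (788.88, 18.76),
        "#1615": (686.02, 20.77),
    },
    "stablecode-completion-alpha-3b": {
        "eager": (1256.91, 10.37),
        "this PR": (1229.80, 10.27),
        "main": (1279.73, 13.27),
        "#1615": (1146.84, 13.29),
    },
}


def rel_change(baseline, value):
    """Relative change of value against baseline, in percent (negative = lower)."""
    return (value - baseline) / baseline * 100.0


def compare(results, baseline="main"):
    """Map (model, run) to (time change %, memory change %) against the baseline run."""
    rows = {}
    for model, runs in results.items():
        base_time, base_mem = runs[baseline]
        for name, (iter_time, memory) in runs.items():
            if name == baseline:
                continue
            rows[(model, name)] = (
                rel_change(base_time, iter_time),
                rel_change(base_mem, memory),
            )
    return rows


if __name__ == "__main__":
    for (model, name), (d_time, d_mem) in compare(RESULTS).items():
        print(f"{model:32s} {name:8s} time {d_time:+6.1f}%  memory {d_mem:+6.1f}%")
```

Against Main, this PR comes out lower in both iteration time and memory for both models, while #1615 alone is faster but does not use less memory. These are single short runs (3 iterations after 2 warmup iterations), so small differences should be read with care.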

@riccardofelluga

@t-vi t-vi requested review from mruberry and lantiga as code owners January 8, 2025 10:16
@riccardofelluga riccardofelluga self-requested a review January 8, 2025 10:23
@t-vi t-vi marked this pull request as draft January 8, 2025 10:24
Collaborator Author

t-vi commented Jan 8, 2025

Marking this as draft because it is based on the #1615 branch.

To my mind, both would be good to have soonish.

There is more investigation to do for Qwen and rematerialization.

Base automatically changed from tom/recomputation-refactor to main January 8, 2025 15:35
@t-vi t-vi marked this pull request as ready for review January 8, 2025 15:35
@t-vi t-vi force-pushed the tom/tagging-checkpointing branch from 1953f60 to e065077 Compare January 8, 2025 15:43
@t-vi t-vi enabled auto-merge (squash) January 8, 2025 15:43
Collaborator

@lantiga lantiga left a comment

Stamped!

@t-vi t-vi merged commit d19e063 into main Jan 8, 2025
41 checks passed
@t-vi t-vi deleted the tom/tagging-checkpointing branch January 8, 2025 18:46
riccardofelluga pushed a commit that referenced this pull request Jan 27, 2025