use tagging checkpointing #1616

t-vi · 2025-01-08T10:16:00Z

This enables the checkpointing through tags and switches the litgpt benchmarks to this pattern.
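The PR text does not spell out the mechanism. The plain-Python toy model below is a rough illustration only. It assumes that "checkpointing through tags" means marking intermediate values so that the backward pass recomputes them instead of keeping them in memory. `Tag`, `Op`, `forward` and `fetch` are hypothetical names for this sketch, not thunder's API.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, Dict, List, Set


class Tag(Enum):
    # Hypothetical tag for this toy model; not taken from thunder.
    RECOMPUTE_IN_BACKWARD = auto()


@dataclass
class Op:
    out: str                   # name of the value this op produces
    fn: Callable[..., float]   # computes the value from its inputs
    inputs: List[str]          # names of the input values
    tags: Set[Tag] = field(default_factory=set)


def forward(ops: List[Op], inputs: Dict[str, float]) -> Dict[str, float]:
    """Run the ops in order and return only the values kept for backward.

    Outputs of ops tagged RECOMPUTE_IN_BACKWARD are not kept, trading
    memory for recomputation later.
    """
    env = dict(inputs)
    saved = dict(inputs)  # graph inputs are always kept
    for op in ops:
        env[op.out] = op.fn(*(env[name] for name in op.inputs))
        if Tag.RECOMPUTE_IN_BACKWARD not in op.tags:
            saved[op.out] = env[op.out]
    return saved


def fetch(name: str, producers: Dict[str, Op], saved: Dict[str, float],
          stats: Dict[str, int]) -> float:
    """Get a value needed in backward, recomputing it if it was not kept."""
    if name in saved:
        return saved[name]
    op = producers[name]
    stats["recomputed"] += 1
    value = op.fn(*(fetch(i, producers, saved, stats) for i in op.inputs))
    saved[name] = value  # cache, so each dropped value is recomputed at most once
    return value


# Usage: h = 2x and a = h*h are tagged, so only x and y survive the forward pass.
ops = [
    Op("h", lambda x: 2.0 * x, ["x"], {Tag.RECOMPUTE_IN_BACKWARD}),
    Op("a", lambda h: h * h, ["h"], {Tag.RECOMPUTE_IN_BACKWARD}),
    Op("y", lambda a: a + 1.0, ["a"]),
]
saved = forward(ops, {"x": 3.0})
kept = sorted(saved)
print(kept)  # ['x', 'y']
stats = {"recomputed": 0}
a = fetch("a", {op.out: op for op in ops}, saved, stats)
print(a, stats["recomputed"])  # 36.0 2
```

In this toy model, the keep-or-recompute decision is attached to individual values rather than to a wrapped region, so a later pass can make it per value. Whether thunder's rematerialization works exactly this way is not shown in the PR.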

Follows on top of #1615

python thunder/benchmarks/benchmark_litgpt.py --model_name Qwen2.5-7B --compile thunder --checkpoint_activations True --low_precision_mode none --micro_batch_size 1 --n_layer 4 --max_iters 3 --warmup_iters 2 --block_size 4096

Eager instead of thunder: Average iter time: 803.95 ms, Memory used: 17.55 GB
Thunder with this PR: Average iter time: 779.80 ms, Memory used: 18.34 GB
Main: Average iter time: 788.88 ms, Memory used: 18.76 GB
refactor recomputation to work with tags #1615: Average iter time: 686.02 ms, Memory used: 20.77 GB

python thunder/benchmarks/benchmark_litgpt.py --model_name stablecode-completion-alpha-3b --compile thunder --checkpoint_activations True --low_precision_mode none --micro_batch_size 1 --n_layer 4 --max_iters 3 --warmup_iters 2

Eager instead of thunder: Average iter time: 1256.91 ms, Memory used: 10.37 GB
Thunder with this PR: Average iter time: 1229.80 ms, Memory used: 10.27 GB
Main: Average iter time: 1279.73 ms, Memory used: 13.27 GB
refactor recomputation to work with tags #1615: Average iter time: 1146.84 ms, Memory used: 13.29 GB
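To compare the runs at a glance, the reported numbers can be converted into relative changes. The snippet below only redoes arithmetic on the figures quoted above. `pct_change` is a helper introduced for this sketch.

```python
# Figures copied from the benchmark runs above: (average iter time in ms, memory in GB).
results = {
    "Qwen2.5-7B": {
        "eager": (803.95, 17.55),
        "this PR": (779.80, 18.34),
        "main": (788.88, 18.76),
        "#1615": (686.02, 20.77),
    },
    "stablecode-completion-alpha-3b": {
        "eager": (1256.91, 10.37),
        "this PR": (1229.80, 10.27),
        "main": (1279.73, 13.27),
        "#1615": (1146.84, 13.29),
    },
}


def pct_change(new: float, old: float) -> float:
    """Relative change of new versus old, in percent (negative means lower)."""
    return 100.0 * (new - old) / old


lines = []
for model, runs in results.items():
    t_pr, m_pr = runs["this PR"]
    for baseline in ("main", "eager", "#1615"):
        t_b, m_b = runs[baseline]
        lines.append(
            f"{model}: this PR vs {baseline}: "
            f"time {pct_change(t_pr, t_b):+.1f}%, memory {pct_change(m_pr, m_b):+.1f}%"
        )

for line in lines:
    print(line)
```

On these numbers, this PR is faster and uses less memory than main for both models. The #1615 branch on its own was faster still but used more memory.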

@riccardofelluga

t-vi · 2025-01-08T10:25:04Z

Marking this as a draft because it is based on the #1615 branch.

To my mind, both would be good to have soonish.

There is more investigation to do for Qwen and rematerialization.

lantiga

Stamped!

t-vi requested review from mruberry and lantiga as code owners January 8, 2025 10:16

t-vi mentioned this pull request Jan 8, 2025

refactor recomputation to work with tags #1615

Merged

riccardofelluga self-requested a review January 8, 2025 10:23

t-vi marked this pull request as draft January 8, 2025 10:24

Base automatically changed from tom/recomputation-refactor to main January 8, 2025 15:35

t-vi marked this pull request as ready for review January 8, 2025 15:35

use tagging checkpointing

e065077

t-vi force-pushed the tom/tagging-checkpointing branch from 1953f60 to e065077 January 8, 2025 15:43

t-vi enabled auto-merge (squash) January 8, 2025 15:43

lantiga approved these changes Jan 8, 2025


t-vi merged commit d19e063 into main Jan 8, 2025
41 checks passed

t-vi deleted the tom/tagging-checkpointing branch January 8, 2025 18:46

riccardofelluga pushed a commit that referenced this pull request Jan 27, 2025

use tagging checkpointing (#1616)

3814a53
