update checkpointing support for jit #1560
There are a number of things still to be fixed.
eb43de8 to af68c02
To my mind, the remaining missing bit is the handling of
e103484 to 891bf84
I was trying to check the memory savings, but it looks like the following hangs:
python thunder/benchmarks/benchmark_litgpt.py --model_name stablecode-completion-alpha-3b --compile thunder --checkpoint_activations True --low_precision_mode none --micro_batch_size 1
:(
So one needs to enable checkpointing of layers with compiler==thunder for this. Even then, the memory profile of the backward is still terrible: I think we need to look more closely at memory over time.
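For readers less familiar with what checkpointing layers buys, here is a minimal toy sketch in plain Python. It is not thunder's or PyTorch's implementation: `layer`, `grad_plain`, `grad_checkpointed`, and the `segment` parameter are hypothetical names, and a scalar `tanh` stands in for a transformer block. It only illustrates the trade: save one input per segment in forward, then recompute the segment's activations during backward.

```python
import math


def layer(x):
    # Toy "layer": a scalar tanh stands in for a transformer block (assumption).
    return math.tanh(x)


def layer_backward(x, grad_out):
    # d/dx tanh(x) = 1 - tanh(x)**2, evaluated at the saved input x.
    return grad_out * (1.0 - math.tanh(x) ** 2)


def grad_plain(x, n_layers):
    """Save every layer input in forward; return (gradient, peak saved count)."""
    saved = []
    for _ in range(n_layers):
        saved.append(x)
        x = layer(x)
    peak = len(saved)
    grad = 1.0
    while saved:
        grad = layer_backward(saved.pop(), grad)
    return grad, peak


def grad_checkpointed(x, n_layers, segment):
    """Save only each segment's input; recompute the rest during backward."""
    checkpoints = []
    for i in range(n_layers):
        if i % segment == 0:
            checkpoints.append((i, x))
        x = layer(x)
    peak = len(checkpoints)
    grad = 1.0
    while checkpoints:
        start, h = checkpoints.pop()
        local = []  # activations recomputed for this segment only
        for _ in range(start, min(start + segment, n_layers)):
            local.append(h)
            h = layer(h)
        peak = max(peak, len(checkpoints) + len(local))
        while local:
            grad = layer_backward(local.pop(), grad)
    return grad, peak


g_plain, peak_plain = grad_plain(0.3, 16)
g_ckpt, peak_ckpt = grad_checkpointed(0.3, 16, 4)
```

With 16 layers and segments of 4, this toy holds 16 saved inputs at peak without checkpointing and 7 with it, while both paths produce the same gradient. Real memory behaviour also depends on when the recomputation is scheduled, which is what the memory-over-time question is about.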
Doing the following:
We see that we are doing much better before the transform for execution than after it (this is for 4 layers), so we still have reordering that hurts us.
Note that the difference of 2.55 GB is smaller than the difference between thunder and eager (5.82 GB).
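To make the reordering effect concrete, here is a small liveness sketch in plain Python. It is a toy model under stated assumptions, not thunder's trace machinery: `peak_memory`, the op names (`r1`, `g1`, ...), and the sizes are all hypothetical. Each op allocates its output, and an output is freed right after its last consumer runs. The two schedules contain the same ops; only the order differs.

```python
def peak_memory(schedule, sizes, consumers):
    """Return (peak, timeline) of live memory for ops run in `schedule` order.

    sizes[op]     -- size of the op's output (arbitrary units)
    consumers[op] -- ops that read the op's output; an output with no
                     consumers is freed right after the op itself runs
    """
    position = {op: i for i, op in enumerate(schedule)}
    last_use = {
        op: max((position[c] for c in consumers.get(op, [])), default=position[op])
        for op in schedule
    }
    live, peak, timeline = 0, 0, []
    for i, op in enumerate(schedule):
        live += sizes[op]  # allocate this op's output
        peak = max(peak, live)
        timeline.append(live)
        for prev in schedule[: i + 1]:
            if last_use[prev] == i:  # free outputs whose last reader just ran
                live -= sizes[prev]
    return peak, timeline


# r1/r2: recomputed activations (large); g1/g2: gradient ops that consume them.
sizes = {"r1": 4, "r2": 4, "g1": 1, "g2": 1, "acc": 1}
consumers = {"r1": ["g1"], "r2": ["g2"], "g1": ["acc"], "g2": ["acc"]}

interleaved = ["r1", "g1", "r2", "g2", "acc"]  # recompute just before use
hoisted = ["r1", "r2", "g1", "g2", "acc"]      # recomputations moved up front

peak_interleaved, timeline_interleaved = peak_memory(interleaved, sizes, consumers)
peak_hoisted, timeline_hoisted = peak_memory(hoisted, sizes, consumers)
```

In this toy, the interleaved order peaks at 6 units and the hoisted order at 9, even though both schedules run the same ops. That is the kind of gap a pass that moves recomputation earlier can introduce.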
ebf420c to b1604ee
…sing of the backward compiled function
With the two latest bits
I have that
is on par (even a little below) the same with
I'll add switches and tests for checkpointing, but here are the material code changes.