Add a benchmark for portions of LitGPT model other than SDPA #148
Conversation
```python
BATCH_SIZE = 2
CONFIG_NAMES = list(sorted(c["name"] for c in configs))
# CONFIG_NAMES = ["Llama-2-7b-hf",]
```
Uncommenting this line would restrict benchmark case generation to just this Llama 2 7B config.
`thunder/benchmarks/__init__.py` (outdated)
```python
inductor_cutlass_executor = partial(inductor_gemm_executor, gemm_backend="ATEN,CUTLASS")
inductor_triton_executor = partial(inductor_gemm_executor, gemm_backend="ATEN,TRITON")
```
- Maybe `"ATEN"` should be removed here, forcing Inductor to use CUTLASS or Triton for GEMMs. I should check whether that works without any errors.
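For context, a minimal sketch of how such a backend-restricted executor might look; `max_autotune_gemm_backends` is a real Inductor config knob, while the helper function itself is an assumption rather than the PR's actual implementation:

```python
from functools import partial

import torch
from torch._inductor import config as inductor_config

def inductor_gemm_executor(fn, gemm_backend: str):
    # Assumed helper (not the PR's actual code): restrict which backends
    # Inductor's max-autotune may choose for GEMMs.
    inductor_config.max_autotune_gemm_backends = gemm_backend
    return torch.compile(fn, mode="max-autotune")

# Dropping "ATEN" from the backend string removes the eager GEMM fallback,
# so Inductor must pick a CUTLASS or Triton kernel (or fail):
inductor_cutlass_only = partial(inductor_gemm_executor, gemm_backend="CUTLASS")
inductor_triton_only = partial(inductor_gemm_executor, gemm_backend="TRITON")
```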
```python
# litgpt_traces = [
#     TraceInfo(name, i, trace) for name in CONFIG_NAMES for i, trace in enumerate(make_torch_traces_for_config(name))
# ]
```
Is this still needed?
List comprehensions are easier for me to read than the for-loop below. I'll remove this, of course.
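For reference, the commented-out comprehension above is equivalent to a nested for-loop along these lines (a sketch; `TraceInfo`, `CONFIG_NAMES`, and `make_torch_traces_for_config` are the PR's own names):

```python
# Equivalent for-loop form of the list comprehension above.
litgpt_traces = []
for name in CONFIG_NAMES:
    for i, trace in enumerate(make_torch_traces_for_config(name)):
        litgpt_traces.append(TraceInfo(name, i, trace))
```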
Looks good to me! I've played a bit with `make_torch_traces_for_config` and it does the job as long as the part we're interested in comes after SDPA.

Try with:

```
pytest thunder/benchmarks/litgpt_chunks.py --benchmark-group-by='group,param:info' --benchmark-columns='min,max,mean,stddev,median'
```
Hey @IvanYashchuk, should we revive this or close it for now? We could add a label for closed PRs that might be of interest in the future.

I've put it in draft to prevent merging because I need more time to think about it and convince myself again that it's something we need in the project.

"Draft" is the new browser tab, haha.
This PR adds a new benchmark; run it with:
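```
pytest thunder/benchmarks/litgpt_chunks.py --benchmark-group-by='group,param:info' --benchmark-columns='min,max,mean,stddev,median'
```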
The intent is to be able to compare performance on sections of the GPT network that are not covered by a FlashAttention kernel.
Constructing benchmark cases is slow; passing `-s` to pytest shows a progress bar, and it takes 6 minutes to generate the test cases.
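That is, the same invocation as above with `-s` added:

```
pytest thunder/benchmarks/litgpt_chunks.py -s --benchmark-group-by='group,param:info' --benchmark-columns='min,max,mean,stddev,median'
```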
It's possible to control the batch size by modifying `BATCH_SIZE` in `litgpt_chunks.py`, and which configs to benchmark can be controlled by modifying the `CONFIG_NAMES` list.

Thunder is used for tracing the LitGPT code, and then the trace is split into chunks with the SDPA call as a delimiter (see the sketch after the chunk list below). Since the GPT model has a for-loop structure, it's enough to trace a model with just two transformer blocks. It gives the following program chunks:
- `ln_f` + `lm_head` (https://github.com/Lightning-AI/litgpt/blob/78bd4cae1e655359e92e7c7b830fcbaa4c15c152/litgpt/model.py#L95-L96)

We could save the result of generating test cases to disk, but that's left as an exercise for the future.
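To illustrate the splitting step, here is a minimal sketch assuming the trace exposes a flat list of bound symbols; the attribute names (`bsym.sym.name`) are assumptions about Thunder's trace representation, not its exact API:

```python
def split_trace_at_sdpa(bound_symbols):
    # Walk the flat list of traced operations and start a new chunk each
    # time an SDPA call is encountered. SDPA itself is excluded, since it
    # is covered by a separate (e.g. FlashAttention) benchmark.
    chunks, current = [], []
    for bsym in bound_symbols:
        if "scaled_dot_product_attention" in bsym.sym.name:
            if current:
                chunks.append(current)
            current = []
        else:
            current.append(bsym)
    if current:
        chunks.append(current)
    return chunks
```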
TODO:
cc @crcrpar @kevinstephano