Fix Microbenchmark Profiling Memory Issues #597
Conversation
Force-pushed from 8253dbb to 3f28ee3.
Overall LGTM, but I don't think we need the inference_utils.py file.
def main(config):
  engine = maxengine.MaxEngine(config)
  params = engine.load_params()
-  prefill_lengths = [64, 128, 256, 512, 1024]
-  benchmark_loop_iters = 10
+  prefill_lengths = [int(l) for l in config.inference_microbenchmark_prefill_lengths.split(",")]
Does this work if you pass in a command line param like --inference-microbenchmark-prefill-lengths="512,1024"
or something similar?
Yeah, example commands:
- to run a single prefill length: `inference_microbenchmark_prefill_lengths=1024`
- to run a single stage: `inference_microbenchmark_stages=generate`
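For context, a rough sketch of how such overrides can be consumed, mirroring the diff above; the helper name and the stage handling here are assumptions, not the PR's exact code.

```python
# Rough sketch, not the PR's exact code: parse_microbenchmark_config and the
# stage handling below are assumptions based on the option names above.
def parse_microbenchmark_config(config):
  # "1024" -> [1024]; "64,128,256,512,1024" -> [64, 128, 256, 512, 1024]
  prefill_lengths = [int(l) for l in config.inference_microbenchmark_prefill_lengths.split(",")]
  # "generate" -> {"generate"}; "prefill,generate" -> {"prefill", "generate"}
  stages = set(config.inference_microbenchmark_stages.split(","))
  return prefill_lengths, stages
```

With overrides like the ones above, passing `inference_microbenchmark_stages=generate` would presumably skip the prefill benchmarks entirely.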
MaxText/inference_utils.py
I don't think we need to make a file solely for inference_utils at this time, especially since these functions are not unique to inference. I would add them to max_utils.py, since they cover fairly generic use cases.
inference_utils.py is an existing file, but I can move these functions to max_utils.py. I am working on batch inference, which needs some common utility functions.
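As an illustration of the kind of generic helpers in question (size accounting that could feed the cache_size/model_size arguments below, and freeing a prefill result as mentioned in the commit message), here is a hypothetical sketch; the names and signatures are assumptions, not the functions actually added to max_utils.py.

```python
# Hypothetical helpers of the "fairly generic" kind discussed above; names and
# exact behavior are illustrative assumptions, not the PR's code.
import jax

def summarize_pytree_bytes(pytree, name="pytree"):
  """Print and return the total element count and bytes held by a pytree of arrays."""
  leaves = jax.tree_util.tree_leaves(pytree)
  num_elements = sum(x.size for x in leaves)
  num_bytes = sum(x.size * x.dtype.itemsize for x in leaves)
  print(f"{name}: {num_elements} elements, {num_bytes / 1e9:.3f} GB")
  return num_elements, num_bytes

def delete_pytree(pytree):
  """Free device buffers held by a pytree, e.g. a prefill result that is no longer needed."""
  for leaf in jax.tree_util.tree_leaves(pytree):
    if hasattr(leaf, "delete"):
      leaf.delete()
```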
benchmark_results["AutoRegressive"], decode_state = ar_benchmark( | ||
config, engine, params, decode_state, iters=benchmark_loop_iters, cache_size=cache_size, model_size=model_size) |
For running just the generate benchmark, do you still need to populate the KV cache to produce proper perf numbers?
You will still need to initialize a decode_state for the generate step calculation.
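For concreteness, a minimal sketch of that initialization, assuming MaxEngine exposes an init_decode_state() entry point as in the JetStream engine API (the exact method name is an assumption):

```python
# Minimal sketch, assuming MaxEngine follows the JetStream engine API and
# exposes init_decode_state(); the exact method name is an assumption.
import maxengine  # as imported by the benchmark script

def make_initial_decode_state(config):
  engine = maxengine.MaxEngine(config)
  params = engine.load_params()
  decode_state = engine.init_decode_state()  # allocates decode buffers / KV cache
  return engine, params, decode_state
```

The generate-only benchmark can then step this decode_state repeatedly without running a prefill first.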
@@ -261,3 +261,8 @@ vertex_tensorboard_project: ""
# Region to create Vertex AI Tensorboard in for GCE, blank if running via XPK
# Vertex AI supported regions: https://cloud.google.com/vertex-ai/docs/general/locations#available-regions
vertex_tensorboard_region: ""
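The added lines themselves are collapsed in this view; based on the option names used in this conversation, the new base.yml entries presumably look something like the following (keys and defaults are assumptions):

```yaml
# Inference microbenchmark settings (sketch; exact keys and defaults are assumptions)
inference_microbenchmark_prefill_lengths: "64,128,256,512,1024"
inference_microbenchmark_stages: "prefill,generate"
inference_microbenchmark_loop_iters: 10
```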
Not in this PR, but I see a need for separate inference-specific config files in the future -- both base.yml and model-specific config.
Force-pushed from 210d7b2 to 38831be.
looks good
Force-pushed from f522aba to 50ae199.
Please remember to squash your commits.
Force-pushed from 3a3a1c5 to c8ed884.
Commit message:
- allow running specified stages
- allow running specific prefill length(s)
- delete prefill result
- print out prefill result
- added funcs in max_utils
Force-pushed from c8ed884 to b46783c.
Below are the changes in this PR.