Add Pathways Recipe Support for Scale Testing #1220
Thanks Sujeeth!
Thanks Sujeeth!
Thanks for the review!
Looks good, thanks so much!
Enable metrics collection
Revert "Enable metrics collection"
This reverts commit b3d2fc0.
Closing as this PR has been merged in as part of another commit.
Description
This PR contains the following changes:
Adds the following 2 recipes:
a. Benchmarking Pathways and McJAX code.
b. Long-running test on Pathways.
This improves code and recipe sharing between developers during testing and simplifies benchmarking by
sharing a single, well-defined config. The recipes will be used for testing and verification on v6e capacity.
By default, allows all configs to run on Pathways (enable_single_controller, disable zarr3, enable_pathways_goodput, etc.); users can add further configs with pathways_tuning_params.
Allows removing problematic XLA flags for Pathways with pathways_xla_flag_options, or subsequently adding XLA flags to different pods.
Cleans up many old Pathways-specific configs and adds some new ones.
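The interplay between the Pathways defaults and the two user-facing options can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: only the names pathways_tuning_params, pathways_xla_flag_options, enable_single_controller, and enable_pathways_goodput come from the description above. The helper functions, the "remove"/"add" keys, and the example flag names are hypothetical, and per-pod flag targeting is not modeled.

```python
# Hypothetical sketch: helper names, the "remove"/"add" option keys, and the
# example flags are assumptions made for illustration, not the PR's actual API.

# Defaults named in the PR description that let a config run on Pathways.
DEFAULT_PATHWAYS_PARAMS = {
    "enable_single_controller": True,
    "enable_pathways_goodput": True,
}


def build_tuning_params(pathways_tuning_params=None):
    """Return the Pathways defaults overlaid with user-supplied overrides."""
    params = dict(DEFAULT_PATHWAYS_PARAMS)
    params.update(pathways_tuning_params or {})
    return params


def apply_xla_flag_options(xla_flags, pathways_xla_flag_options=None):
    """Drop flags named under 'remove', then append flags listed under 'add'."""
    options = pathways_xla_flag_options or {}
    to_remove = set(options.get("remove", []))
    # Compare on the flag name only, ignoring any "=value" suffix.
    kept = [f for f in xla_flags.split() if f.split("=", 1)[0] not in to_remove]
    kept.extend(options.get("add", []))
    return " ".join(kept)


if __name__ == "__main__":
    print(build_tuning_params({"extra_option": 1}))
    print(
        apply_xla_flag_options(
            "--example_flag_a=true --example_flag_b=2",
            {"remove": ["--example_flag_a"], "add": ["--example_flag_c=3"]},
        )
    )
```

Keeping the defaults in one place and treating user input as an overlay means every benchmark starts from the same Pathways-ready baseline, which is the config-sharing goal stated above.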
Tests
Ran multiple rounds on multiple v6e clusters.
Checklist
Before submitting this PR, please make sure (put X in square brackets):