Fixed hyperopt trial syncing to remote filesystems for Ray 2.0 #2617
"executor": {
    TYPE: "ray",
    "num_samples": 1 if search_space == "grid" else RANDOM_SEARCH_SIZE,
    "max_concurrent_trials": 1,
@tgaddair Is there a reason to set max_concurrent_trials to 1?
Yeah, when you run multiple trials on these runners / locally it often ends up making them compete for resources, which slows everything down. Limiting to 1 trial at a time helps to avoid this resource contention.
Got it, that makes sense! It's similar to the problem I was seeing as well. Personally, I noticed this happening on my local machine at times, even when not using a RayBackend.
It's a good indicator that 1 CPU per trial is probably too low. In practice, we may want to bump this up.
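As a rough illustration of the two knobs discussed above (one trial at a time, and more CPU per trial), here is a minimal sketch of building the executor section as a plain dictionary. The `TYPE` and `RANDOM_SEARCH_SIZE` values below are local stand-ins for the constants in the diff, `make_executor_config` is a hypothetical helper, and the `cpu_resources_per_trial` key is an assumption about the executor's configuration that should be checked against the Ludwig version in use.

```python
# Stand-ins for the constants referenced in the diff (assumptions for this sketch).
TYPE = "type"            # assumed to be the key that selects the executor type
RANDOM_SEARCH_SIZE = 2   # illustrative value only; the real value is not shown in the diff


def make_executor_config(search_space: str, cpus_per_trial: int = 1) -> dict:
    """Build a Ray executor section that runs a single trial at a time."""
    return {
        TYPE: "ray",
        # Grid search enumerates its own points, so one sample is enough.
        "num_samples": 1 if search_space == "grid" else RANDOM_SEARCH_SIZE,
        # Run trials sequentially to avoid resource contention on small runners.
        "max_concurrent_trials": 1,
        # Assumed key name: CPUs reserved for each trial.
        "cpu_resources_per_trial": cpus_per_trial,
    }


if __name__ == "__main__":
    print(make_executor_config("random", cpus_per_trial=2))
```

Raising `cpus_per_trial` while keeping `max_concurrent_trials` at 1 gives each trial more headroom without reintroducing contention between trials.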
Maybe this is not in the scope of this PR, but one of the other things we'll need to do is wrap writing some of the
It might be worth either modifying an existing test (or adding a new one, although we should probably avoid that) to make sure we can also retrieve hyperopt_statistics.json from the same remote location that we sync to when running hyperopt E2E.
@arnavgarg1 we do have a test for the existence of this file in the remote here. It's true that we'll need to do the
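The kind of check being discussed, confirming that hyperopt_statistics.json is present at the remote location, could be sketched roughly as follows. This is not the test in the PR: `stats_file_url` and `check_stats_synced` are hypothetical helpers, the assumption that the file sits directly under the synced output directory is mine, and the `exists` callable is left pluggable so that any filesystem client can supply it.

```python
import posixpath
from typing import Callable

STATS_FILE = "hyperopt_statistics.json"


def stats_file_url(remote_output_dir: str) -> str:
    """Join the remote output directory and the statistics file name.

    Assumes the file is written directly under the output directory.
    """
    return posixpath.join(remote_output_dir.rstrip("/"), STATS_FILE)


def check_stats_synced(remote_output_dir: str, exists: Callable[[str], bool]) -> bool:
    """Return True if the statistics file is visible at the remote location.

    `exists` is any callable that reports whether a URL exists, for example
    a thin wrapper around a remote filesystem client.
    """
    return exists(stats_file_url(remote_output_dir))


if __name__ == "__main__":
    # Fake "remote" listing used only to exercise the helper.
    fake_remote = {"s3://bucket/run/hyperopt_statistics.json"}
    print(check_stats_synced("s3://bucket/run/", fake_remote.__contains__))
```

Keeping the existence check injectable means the same assertion can run against an in-memory fake in unit tests and against a real remote filesystem in an E2E test.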
Sounds good! This looks great, thanks @tgaddair!
Approving this for now, but we should probably land this once the plumbing PR is in as well!
Will avoid cherry-picking into the release branch until we have the credential PR in.
This also revealed that there are issues with remote syncing for Ray 1.13, so we will only be supporting this feature with Ray 2.0 and above.
This PR uses the RemoteSyncer from #2386. The changes to support injecting credentials will come in a follow-up PR.