[Bug] Sidecar mode shouldn't restart head pod when head pod is deleted #4141
Signed-off-by: 400Ping <fourhundredping@gmail.com>
@400Ping, the change should be made in the raycluster_controller. We need to make the raycluster_controller not recreate the head pod if the cluster belongs to a RayJob, so that we can avoid races where the raycluster_controller recreates the head before the rayjob_controller checks it.
ok, thanks
Signed-off-by: 400Ping <fourhundredping@gmail.com>
originatedFrom := utils.GetCRDType(instance.Labels[utils.RayOriginatedFromCRDLabelKey])
if originatedFrom == utils.RayJobCRD {
	logger.Info(
		"reconcilePods: Found 0 head Pods for a RayJob-managed RayCluster; skipping head creation to let RayJob controller handle the failure",
Won't this cause no head pod to be created at all? We still need to create the first head pod. I think you can check the RayClusterProvisioned condition to decide whether to create one or not.
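A minimal sketch of the check suggested above, not the actual controller code: when zero head pods are found, a RayJob-owned cluster still gets its first head pod, but a head pod is not recreated once the cluster has been provisioned. The `rayCluster` and `condition` types and the `shouldCreateHeadPod` helper are hypothetical stand-ins for the real KubeRay types, and the sketch assumes the `RayClusterProvisioned` condition stays true once it has been set.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the KubeRay API types.
type condition struct {
	Type   string
	Status bool
}

type rayCluster struct {
	OriginatedFrom string // "RayJob" when the cluster was created by a RayJob, "" otherwise
	Conditions     []condition
}

const (
	rayJobCRD             = "RayJob"
	rayClusterProvisioned = "RayClusterProvisioned"
)

// isProvisioned reports whether the RayClusterProvisioned condition is true.
func isProvisioned(c rayCluster) bool {
	for _, cond := range c.Conditions {
		if cond.Type == rayClusterProvisioned && cond.Status {
			return true
		}
	}
	return false
}

// shouldCreateHeadPod decides what to do when zero head pods are found.
// Clusters not owned by a RayJob always get a head pod. RayJob-owned
// clusters get one only before they have been provisioned, so a deleted
// head pod is left for the RayJob controller to observe.
func shouldCreateHeadPod(c rayCluster) bool {
	if c.OriginatedFrom != rayJobCRD {
		return true
	}
	return !isProvisioned(c)
}

func main() {
	fresh := rayCluster{OriginatedFrom: rayJobCRD}
	provisioned := rayCluster{
		OriginatedFrom: rayJobCRD,
		Conditions:     []condition{{Type: rayClusterProvisioned, Status: true}},
	}
	fmt.Println("fresh RayJob cluster:", shouldCreateHeadPod(fresh))
	fmt.Println("provisioned RayJob cluster:", shouldCreateHeadPod(provisioned))
}
```

Gating on the condition rather than on the current pod count is what separates "the head pod has never existed" from "the head pod existed and was deleted".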
I think this is related to the flaky test in CI now.
https://buildkite.com/ray-project/ray-ecosystem-ci-kuberay-ci/builds/11427#019a2849-e063-47f4-8aef-9143855d8976
let's try to fix this, super important.
Why are these changes needed?
When using sidecar mode, the head pod should not be recreated after it is deleted. The RayJob should be marked as Failed.
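The intended RayJob-side behavior can be sketched as a small state transition. This is an illustration under assumptions, not the controller's implementation: the `submissionMode` and `deploymentStatus` types, their values, and the `nextStatus` helper are hypothetical, and the real controller may track more states and perform additional checks.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the RayJob submission mode and status.
type submissionMode string
type deploymentStatus string

const (
	sidecarMode submissionMode = "SidecarMode"
	k8sJobMode  submissionMode = "K8sJobMode"

	statusRunning deploymentStatus = "Running"
	statusFailed  deploymentStatus = "Failed"
)

// nextStatus returns the status a running RayJob should move to.
// In sidecar mode a missing head pod is treated as a terminal failure;
// every other combination leaves the status unchanged in this sketch.
func nextStatus(mode submissionMode, current deploymentStatus, headPodExists bool) deploymentStatus {
	if mode == sidecarMode && current == statusRunning && !headPodExists {
		return statusFailed
	}
	return current
}

func main() {
	fmt.Println(nextStatus(sidecarMode, statusRunning, false))
	fmt.Println(nextStatus(sidecarMode, statusRunning, true))
}
```

This transition only works if the RayCluster controller does not recreate the head pod first; otherwise the RayJob controller may never observe the missing pod.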
Related issue number
Closes #4130
Checks