-
Notifications
You must be signed in to change notification settings - Fork 4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Re-write parts of AEP-4016 (VPA with In-Place Updates) to align with the new In-Place Updates KEP changes #7813
Conversation
Skipping CI for Draft Pull Request. |
/assign maxcao13 |
@raywainman: GitHub didn't allow me to assign the following users: maxcao13. Note that only kubernetes members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
cc @maxcao13 |
@maxcao13 We can consider cutting the "disruption-less" updates as part of our initial MVP. This essentially means that we have the current Auto mode + in-place updates. Would love your thoughts :) |
/assign @laoj2 |
@raywainman: GitHub didn't allow me to assign the following users: laoj2. Note that only kubernetes members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
cc @jpawelczak |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for writing this up! Some comments and questions.
vertical-pod-autoscaler/enhancements/4016-in-place-updates-support/README.md
Outdated
Show resolved
Hide resolved
vertical-pod-autoscaler/enhancements/4016-in-place-updates-support/README.md
Show resolved
Hide resolved
vertical-pod-autoscaler/enhancements/4016-in-place-updates-support/README.md
Outdated
Show resolved
Hide resolved
vertical-pod-autoscaler/enhancements/4016-in-place-updates-support/README.md
Outdated
Show resolved
Hide resolved
vertical-pod-autoscaler/enhancements/4016-in-place-updates-support/README.md
Outdated
Show resolved
Hide resolved
In the `InPlaceOrRecreate` modes VPA updater will attempt to apply updates that require container | ||
restart in place. It will update them under the same conditions that would trigger update with | ||
`Recreate` mode. That is it will apply disruptive updates if: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say the updater should only attempt to apply these "disruptive" updates if the container restart resize policy is RestartContainer
. VPA shouldn't actively try to do this otherwise. At least that's what I understood from the previous AEP description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean: if the container restart resize policy is RestartContainer
, instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! My mistake, I've edited the original comment,
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed on Slack, I don't think VPA needs to care about the resize policy, I think it should simply attempt the resize operation if the user set it that way and let the rest of the Kubernetes machinery take care of actually applying the update.
Does that sound OK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a small section about the interaction between VPA and the ResizePolicy
setting, hopefully that clarifies things a bit?
vertical-pod-autoscaler/enhancements/4016-in-place-updates-support/README.md
Show resolved
Hide resolved
vertical-pod-autoscaler/enhancements/4016-in-place-updates-support/README.md
Outdated
Show resolved
Hide resolved
vertical-pod-autoscaler/enhancements/4016-in-place-updates-support/README.md
Show resolved
Hide resolved
In the `InPlaceOrRecreate` modes VPA updater will attempt to apply updates that require container | ||
restart in place. It will update them under the same conditions that would trigger update with | ||
`Recreate` mode. That is it will apply disruptive updates if: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean: if the container restart resize policy is RestartContainer
, instead?
vertical-pod-autoscaler/enhancements/4016-in-place-updates-support/README.md
Outdated
Show resolved
Hide resolved
vertical-pod-autoscaler/enhancements/4016-in-place-updates-support/README.md
Outdated
Show resolved
Hide resolved
vertical-pod-autoscaler/enhancements/4016-in-place-updates-support/README.md
Outdated
Show resolved
Hide resolved
vertical-pod-autoscaler/enhancements/4016-in-place-updates-support/README.md
Outdated
Show resolved
Hide resolved
### 1. Applying Updates During Pod Admission | ||
|
||
For VPAs using the new `InPlaceOrRecreate` mode, the VPA Admission Controller will apply updates to | ||
starting pods just as it does for VPAs in `Initial`, `Auto`, and `Recreate` modes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not entirely sure if this AEP is the correct place to make this comment, but the promise of Auto
was that it would one day support in-place, see https://github.com/kubernetes/autoscaler/blob/78c8173b979316f892327022d53369760b000210/vertical-pod-autoscaler/docs/api.md#updatemode
We've been promising users that if they use Auto today, one day in the future the VPA will start resizing their pods for them without recreating them.
It seems that this may not be the case since the container runtime can't guarantee it.
Do we need to start thinking about what to do with Auto in the future?
Again, I'm unsure if this really fits into the context of this AEP, but I thought I'd bring it up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My 2 cents on this are that, switching Auto
to be InPlace
or some variation of InPlace
would be a breaking change that should be in something like VPA 2.0. It may not "break" someone's configuration, but the change probably won't be the same expected behaviour in VPA 0.x and 1.x when the updater was freely evicting pods in Auto
mode. Although, what I'm talking about is probably a discussion for the future timeline of VPA in general.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the behavior of Auto, good idea. Major version bump to accompany that, also a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the discussion around changing the behavior of auto
, we're talking about a future where in-place vertical scaling is GA and not under a feature gate, right? IMHO it would feel a bit weird to make the default behavior of VPA depend on a beta feature gate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the discussion around changing the behavior of
auto
, we're talking about a future where in-place vertical scaling is GA and not under a feature gate, right?
Correct. Since the work on in-place is starting to happen now, and we are potentially diverging on the promise given to users, it seems appropriate to discuss Auto now, just so it's considered in our discussions as we move forward with the in-place rollout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specified in the AEP that the intention is that this eventually graduates into the Auto
mode.
Thanks all! I've applied changes based on the feedback and some discussions we've on Slack. Would love another look when folks have some time, thanks again! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Not instantly approving to give others a chance for a second look.
/hold EDIT: Oh, hold is already on this PR |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Thanks @raywainman!
vertical-pod-autoscaler/enhancements/4016-in-place-updates-support/README.md
Outdated
Show resolved
Hide resolved
vertical-pod-autoscaler/enhancements/4016-in-place-updates-support/README.md
Outdated
Show resolved
Hide resolved
vertical-pod-autoscaler/enhancements/4016-in-place-updates-support/README.md
Outdated
Show resolved
Hide resolved
vertical-pod-autoscaler/enhancements/4016-in-place-updates-support/README.md
Show resolved
Hide resolved
* If `CanEvict` is false. | ||
* If any of the `EvictionRequirements` on the VPA are not true. | ||
|
||
These additional resizes can be attempted because the eviction fallback would fail anyway. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to clarify this...
So an in-place update will be attempted, if that in-place fails, the VPA will not attempt an eviction?
Almost as if it was InPlaceOrNotAtAll
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It will attempt an eviction but the eviction won't go through because the actual condition will prevent it from happening. Let me clarify that in the text.
These additional resizes can be attempted because the eviction fallback would fail anyway. | ||
|
||
The VPA updater will evict a pod to actuate a recommendation if it attempted to apply the | ||
recommendation in place and failed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two sentences seem to conflict with each other
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarified.
The idea here is that we can be slightly more relaxed with in-place resizes and still allow VPA to proceed with eviction when the operation fails. The eviction would get blocked anyway since these conditions themselves prevent eviction from happening. Hopefully that makes sense.
The content /lgtm I left some mostly minor comments, and one clarification on wording |
/lgtm Looks great, thanks @raywainman! |
@laoj2: changing LGTM is restricted to collaborators In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great and clears up much of my concerns with the previous iteration. Thanks so much @raywainman. I think we'll be able to close #7722 from this too.
cc @tallclair |
/lgtm Thank you! |
/lgtm for me too |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: omerap12, raywainman, voelzmo The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Thanks all! Will merge this now. We can make further updates in new PRs as needed. |
/unhold |
What type of PR is this?
/kind documentation
/kind feature
What this PR does / why we need it:
Many changes have happened to the In-Place Updates KEP since 2023, we need to refresh AEP-4016 to align with the new changes so that we can proceed with implementation.
Which issue(s) this PR fixes:
Part of #7673
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:
All information linked in the AEP.