Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configurable taint effect #273

Closed
hamishforbes opened this issue Nov 2, 2020 · 9 comments · Fixed by #539
Closed

Configurable taint effect #273

hamishforbes opened this issue Nov 2, 2020 · 9 comments · Fixed by #539
Assignees
Labels
Priority: Low This issue will not be seen by most users. The issue is a very specific use case or corner case Status: Work in Progress Type: Enhancement New feature or request

Comments

@hamishforbes
Copy link
Contributor

It would be great if the taint applied to nodes when they are due to terminate could be configured as NoExecute.

Currently the taint is only really useful for informational purposes as cordoning the node already applies a NoSchedule taint.

I'd like to be able to set terminating nodes as NoExecute in order to stop daemonset pods gracefully (specifically to allow Consul agents to leave the cluster)
Similar to kubernetes/kubernetes#75482

@bwagner5
Copy link
Contributor

bwagner5 commented Nov 2, 2020

I think this makes sense to add. We're getting a lot of configuration, so we probably need to refactor some of the config codebase to expose it in a better way than what is currently there.

@bwagner5 bwagner5 added the Type: Enhancement New feature or request label Nov 2, 2020
@hoelzro
Copy link

hoelzro commented Jan 6, 2021

@bwagner5 I'm interested in putting in the work to add this - did that config refactor work you mentioned already happen? Also, how does specifying the effect (eg. NoExecute) via an environment variable TAINT_EFFECT sound to you?

@bwagner5
Copy link
Contributor

bwagner5 commented Jan 6, 2021

@hoelzro It would be great if you could work on this! The configuration refactor has not happened yet... We broke out the configuration docs into common config, IMDS, and Queue-Processor sections which helps a bit, but in the code, it's all one giant config.

Adding the TAINT_EFFECT env var and corresponding --taint-effect flag sounds simple enough and would provide the necessary functionality.


Mainly thinking out loud here to gather all of the context around configuration that may happen in the future so that we don't have to rework how this taint config works

There is a request in #254 about configuring specific nodes via annotations as to what NTH should do with it. I'm not sure if the TAINT_EFFECT should just be global or if it should be one of the values that a node can be configured to drain with. If it's global and the annotation should be simple like @olemarkus mentioned in the issue of two possible options DRAIN or CORDON, then the implementation you suggested will work well with no rework required when we implement the configuration by node annotations.

Even if one of the two options in a node annotated configuration is TAINT_NOEXECUTE, the global config still makes sense as a default for nodes not annotated. It may even be advantageous to allow the node annotations to store a list of actions so that a user could say: node1: CORDON, DRAIN, TAINT_NOEXECUTE

Related to NTH configuration by node annotations, we've discussed another option for configuring actions based on event type. The Queue-processor mode of NTH supports a lot more event types than IMDS mode and it might be nice to say things like:

If a Rebalance Recommendation event is received,
Cordon the node,
but if a Spot Interruption Notice event is received,
drain it

This type of configuration would likely be global and will probably add a lot of config key-values to further bloat the configuration. But I think it's safe to say TAINT_EFFECT would live outside of this config anyways, so it's probably okay to ignore this entirely.


@hoelzro
Copy link

hoelzro commented Jan 7, 2021

@bwagner5 Thanks for the extra context! So if I just added TAINT_EFFECT and --taint-effect for now, that wouldn't impede the broader work you'd like to see in the future, right?

Also, I saw that the CLA was recently removed - how does contribution work with that in mind? Do I just need to confirm that licensing my commits' changes under Apache is ok?

@bwagner5
Copy link
Contributor

bwagner5 commented Jan 7, 2021

So if I just added TAINT_EFFECT and --taint-effect for now, that wouldn't impede the broader work you'd like to see in the future, right?

Yep, TAINT_EFFECT and --taint-effect should be good 👍

Also, I saw that the CLA was recently removed - how does contribution work with that in mind? Do I just need to confirm that licensing my commits' changes under Apache is ok?

Yes, there's a line in the PR template: "By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license."

@bwagner5
Copy link
Contributor

bwagner5 commented Feb 8, 2021

Hey @hoelzro, just wanted to check in and see if you were still planning on contributing the taint-effect?

@hoelzro
Copy link

hoelzro commented Feb 8, 2021

Hi @bwagner5! It's still on my list - other stuff just came up and knocked it onto my backburner!

@jillmon jillmon added Priority: High This issue will be seen by most users Status: Work in Progress labels Feb 11, 2021
@jillmon
Copy link
Contributor

jillmon commented Oct 19, 2021

Hello @hoelzro, is this still on your list?

@hoelzro
Copy link

hoelzro commented Oct 19, 2021

I'm still interested in working on it, but I just got back from leave, and the issue still hasn't bubbled up to the top of my backlog, I'm afraid :(

@jillmon jillmon added Priority: Low This issue will not be seen by most users. The issue is a very specific use case or corner case and removed Priority: High This issue will be seen by most users labels Nov 17, 2021
hamishforbes added a commit to hamishforbes/aws-node-termination-handler that referenced this issue Nov 24, 2021
Add TAINT_EFFECT env var and --taint-effect flag to configure the node taint effect
Defaults to NoSchedule, garbage values will log a warning and default to NoSchedule.

closes  aws#273
bwagner5 pushed a commit that referenced this issue Dec 15, 2021
Add TAINT_EFFECT env var and --taint-effect flag to configure the node taint effect
Defaults to NoSchedule, garbage values will log a warning and default to NoSchedule.

closes  #273
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Priority: Low This issue will not be seen by most users. The issue is a very specific use case or corner case Status: Work in Progress Type: Enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants