Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added default tolerations. #41

Merged
merged 9 commits into from
May 22, 2024
Merged

Added default tolerations. #41

merged 9 commits into from
May 22, 2024

Conversation

musa-asad
Copy link
Collaborator

@musa-asad musa-asad commented May 15, 2024

Description of changes:
As indicated in aws/containers-roadmap#2195, Amazon CloudWatch Observability EKS add-on currently does not have default tolerations for cloudwatch-agent and fluent-bit daemonsets, which means tainted nodes won't run cloudwatch-agent and fluent-bit. I simply updated the deployments and daemonsets to have default tolerations and the ability for customers to override this.

Test output:

Nodes:

% kubectl get nodes                                     
NAME                             STATUS   ROLES    AGE   VERSION
ip-192-168-33-152.ec2.internal   Ready    <none>   8h    v1.29.3-eks-ae9a62a

Taint:

% kubectl taint nodes ip-192-168-33-152.ec2.internal key=value:NoSchedule
node/ip-192-168-33-152.ec2.internal tainted

When running helm upgrade --install amazon-cloudwatch-observability helm-charts/charts/amazon-cloudwatch-observability --values helm-charts/charts/amazon-cloudwatch-observability/values.yaml --set clusterName=my-cluster --set region=us-east-1 --set 'tolerations[0].operator=Exists' --set 'tolerations[0].effect=NoExecute':

% kubectl get pods -o wide
NAME                                                              READY   STATUS    RESTARTS   AGE   IP              NODE                             NOMINATED NODE   READINESS GATES
amazon-cloudwatch-observability-controller-manager-6df65767gwnt   1/1     Running   0          48m   192.168.38.37   ip-192-168-33-152.ec2.internal   <none>           <none>

When running helm upgrade --install amazon-cloudwatch-observability helm-charts/charts/amazon-cloudwatch-observability --values helm-charts/charts/amazon-cloudwatch-observability/values.yaml --set clusterName=my-cluster --set region=us-east-1:

% kubectl get pods -o wide
NAME                                                              READY   STATUS    RESTARTS   AGE   IP               NODE                             NOMINATED NODE   READINESS GATES
amazon-cloudwatch-observability-controller-manager-6df65767gwnt   1/1     Running   0          50m   192.168.38.37    ip-192-168-33-152.ec2.internal   <none>           <none>
cloudwatch-agent-2s4td                                            1/1     Running   0          56s   192.168.49.133   ip-192-168-33-152.ec2.internal   <none>           <none>
dcgm-exporter-47gpz                                               1/1     Running   0          56s   192.168.46.197   ip-192-168-33-152.ec2.internal   <none>           <none>
fluent-bit-gdtkn                                                  1/1     Running   0          56s   192.168.33.152   ip-192-168-33-152.ec2.internal   <none>           <none>

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@musa-asad musa-asad self-assigned this May 15, 2024
Copy link
Contributor

@mitali-salvi mitali-salvi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add steps in the PR overview on how was this changes tested ?

@musa-asad
Copy link
Collaborator Author

Could you add steps in the PR overview on how was this changes tested ?

Adding.

@sky333999 sky333999 dismissed their stale review May 16, 2024 22:01

Accidentally approved

Copy link
Contributor

@mitali-salvi mitali-salvi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the indentation different for every yaml ? For Neuron monitor its 2 but for the daemon-sets its 6 ?

@musa-asad
Copy link
Collaborator Author

musa-asad commented May 22, 2024

Why is the indentation different for every yaml ? For Neuron monitor its 2 but for the daemon-sets its 6 ?

This was because the indentation of the relevant spec was great for the other daemon-sets as opposed to neuron monitor. For instance, volumes:

  volumes:

and

      volumes:

@musa-asad musa-asad merged commit 32e8402 into main May 22, 2024
3 checks passed
@musa-asad musa-asad deleted the default-tolerations branch May 22, 2024 21:44
@wonko
Copy link

wonko commented Jun 7, 2024

I believe this resulted in the daemonsets trying to schedule onto fargate nodes, which will never work. This breaks the addon upgrade, as the daemonset never rolls out completely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants