Added default tolerations. #41

musa-asad · 2024-05-15T16:31:36Z

Description of changes:
As indicated in aws/containers-roadmap#2195, the Amazon CloudWatch Observability EKS add-on currently has no default tolerations for the cloudwatch-agent and fluent-bit daemonsets, so tainted nodes won't run cloudwatch-agent or fluent-bit. This change adds default tolerations to the deployments and daemonsets and lets customers override them.

Test output:

Nodes:

% kubectl get nodes                                     
NAME                             STATUS   ROLES    AGE   VERSION
ip-192-168-33-152.ec2.internal   Ready    <none>   8h    v1.29.3-eks-ae9a62a

Taint:

% kubectl taint nodes ip-192-168-33-152.ec2.internal key=value:NoSchedule
node/ip-192-168-33-152.ec2.internal tainted
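For reference, a pod is only scheduled onto this node if it carries a toleration that matches the taint. The fragment below is a minimal sketch of two such tolerations in a pod spec. It is illustrative only and is not copied from this chart's templates.

```yaml
# Pod-spec fragment (illustrative, not taken from this chart).
tolerations:
  # Matches exactly the taint applied above: key=value:NoSchedule.
  - key: key
    operator: Equal
    value: value
    effect: NoSchedule
  # Blanket toleration: with no key and operator Exists, every taint
  # is tolerated, whatever its key, value, or effect.
  - operator: Exists
```

Either entry on its own is enough to tolerate the test taint; the second one also tolerates any other taint a node may have.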

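When the default tolerations are overridden so that only NoExecute taints are tolerated (the NoSchedule taint above is then not tolerated), by running helm upgrade --install amazon-cloudwatch-observability helm-charts/charts/amazon-cloudwatch-observability --values helm-charts/charts/amazon-cloudwatch-observability/values.yaml --set clusterName=my-cluster --set region=us-east-1 --set 'tolerations[0].operator=Exists' --set 'tolerations[0].effect=NoExecute', the daemonset pods are not scheduled: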

% kubectl get pods -o wide
NAME                                                              READY   STATUS    RESTARTS   AGE   IP              NODE                             NOMINATED NODE   READINESS GATES
amazon-cloudwatch-observability-controller-manager-6df65767gwnt   1/1     Running   0          48m   192.168.38.37   ip-192-168-33-152.ec2.internal   <none>           <none>
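The two `--set 'tolerations[0]…'` flags in the helm command above can also be written as a values file. This is a sketch: the file name `override-values.yaml` is hypothetical, and the only key assumed is the top-level `tolerations` list that the `--set` flags already address.

```yaml
# override-values.yaml (hypothetical file name)
# Equivalent to:
#   --set 'tolerations[0].operator=Exists'
#   --set 'tolerations[0].effect=NoExecute'
tolerations:
  - operator: Exists
    effect: NoExecute
```

Passing the file with an additional `--values override-values.yaml` after the chart's own values.yaml should have the same effect as the two `--set` flags, since later values files take precedence over earlier ones.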

When running helm upgrade --install amazon-cloudwatch-observability helm-charts/charts/amazon-cloudwatch-observability --values helm-charts/charts/amazon-cloudwatch-observability/values.yaml --set clusterName=my-cluster --set region=us-east-1, which leaves the default tolerations in place, all pods are scheduled on the tainted node:

% kubectl get pods -o wide
NAME                                                              READY   STATUS    RESTARTS   AGE   IP               NODE                             NOMINATED NODE   READINESS GATES
amazon-cloudwatch-observability-controller-manager-6df65767gwnt   1/1     Running   0          50m   192.168.38.37    ip-192-168-33-152.ec2.internal   <none>           <none>
cloudwatch-agent-2s4td                                            1/1     Running   0          56s   192.168.49.133   ip-192-168-33-152.ec2.internal   <none>           <none>
dcgm-exporter-47gpz                                               1/1     Running   0          56s   192.168.46.197   ip-192-168-33-152.ec2.internal   <none>           <none>
fluent-bit-gdtkn                                                  1/1     Running   0          56s   192.168.33.152   ip-192-168-33-152.ec2.internal   <none>           <none>

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

mitali-salvi

Could you add steps to the PR overview describing how these changes were tested?

musa-asad · 2024-05-16T18:23:06Z

Could you add steps to the PR overview describing how these changes were tested?

Adding.

Accidentally approved

mitali-salvi

Why is the indentation different for every YAML file? For Neuron monitor it's 2, but for the daemon-sets it's 6?

musa-asad · 2024-05-22T13:46:52Z

Why is the indentation different for every YAML file? For Neuron monitor it's 2, but for the daemon-sets it's 6?

This is because the relevant spec is indented further in the other daemon-sets than in neuron monitor. For instance, compare volumes:

  volumes:

and

      volumes:

wonko · 2024-06-07T13:53:33Z

I believe this resulted in the daemonsets trying to schedule onto fargate nodes, which will never work. This breaks the addon upgrade, as the daemonset never rolls out completely.
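One common way to keep daemonset pods off Fargate nodes, even when a blanket toleration is present, is a node affinity rule. The fragment below is a sketch under the assumption that Fargate nodes carry the label `eks.amazonaws.com/compute-type=fargate`; it is not necessarily the fix that was later applied to this chart.

```yaml
# Pod-spec fragment (illustrative): require nodes that are NOT Fargate.
# Assumes Fargate nodes are labelled eks.amazonaws.com/compute-type=fargate.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: eks.amazonaws.com/compute-type
              operator: NotIn
              values:
                - fargate
```

Because `NotIn` also matches nodes that lack the label entirely, ordinary EC2 nodes remain eligible while Fargate nodes are excluded.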

Added default tolerations.

c20c524

musa-asad requested review from mitali-salvi and sky333999 May 15, 2024 16:31

musa-asad self-assigned this May 15, 2024

mitali-salvi reviewed May 16, 2024

musa-asad added 2 commits May 16, 2024 14:14

Fixed formatting.

e035546

Fixed formatting for fluent-bit.

7d5f8c3

musa-asad added 2 commits May 16, 2024 15:41

Added ability for customers to override default tolerations.

3727ea8

Consistent wording.

06c7108

sky333999 previously approved these changes May 16, 2024

charts/amazon-cloudwatch-observability/templates/linux/cloudwatch-agent-daemonset.yaml (outdated)

charts/amazon-cloudwatch-observability/values.yaml

musa-asad added 2 commits May 17, 2024 01:07

Addressed comments: added to all deployments and daemonsets & fixed s…

5a4d29b

…pacing.

Added tolerations to CRDs.

0aa75b8

musa-asad mentioned this pull request May 20, 2024

Added tolerations support for dcgmexporter and neuronmonitor. aws/amazon-cloudwatch-agent-operator#174

Merged

Fixed formatting.

24fd1ba

mitali-salvi reviewed May 21, 2024

sky333999 reviewed May 21, 2024

charts/amazon-cloudwatch-observability/crds/cloudwatch.aws.amazon.com_dcgmexporters.yaml (outdated)

Fixed comments.

31604ff

sky333999 approved these changes May 22, 2024

mitali-salvi approved these changes May 22, 2024

musa-asad merged commit 32e8402 into main May 22, 2024
3 checks passed

musa-asad deleted the default-tolerations branch May 22, 2024 21:44

bushong1 mentioned this pull request May 29, 2024

Breaking changes should be a major version bump #50

Open

wonko mentioned this pull request Jun 7, 2024

Tries to schedule on fargate nodes aws/amazon-cloudwatch-agent-operator#183

Closed

Paramadon mentioned this pull request Jul 2, 2024

EKS Addon Fargate Bug Fix #58

Merged
