
Add policy Collect Debug Information for Pods in CrashLoopBackOff #1086

Merged

Conversation

nsagark
Contributor

@nsagark nsagark commented Jul 26, 2024

Related Issue(s)

Description

Adds a new policy called "Collect Debug Information for Pods in CrashLoopBackOff" along with Chainsaw tests.

Checklist

  • I have read the policy contribution guidelines.
  • I have added test manifests and resources covering both positive and negative tests that prove this policy works as intended.
  • I have added the artifacthub-pkg.yml file and have verified it is complete and correct.

Signed-off-by: nsagark <sagar@nirmata.com>
Contributor

@chipzoller chipzoller left a comment


What is this policy? There are no annotations which tell us anything here.

@nsagark
Contributor Author

nsagark commented Jul 29, 2024

Hi @chipzoller I have added the annotations for this policy. Please take a look.

Contributor

@chipzoller chipzoller left a comment


I understand what you're trying to go for here, but offering a policy which creates Pods based on some unknown container image in a cluster (i.e., not a Kyverno project image which can be inspected) seems like a questionable approach. I can see how this might be valuable, but I think this needs some more transparency to be trustworthy.

@nsagark
Contributor Author

nsagark commented Jul 29, 2024

Hi @chipzoller, do you want me to push this image to ghcr.io? Or let me know if you have any other suggestions.

@chipzoller
Contributor

My suggestion would be to take your idea and create a public GitHub project repository from it and build/push the image there. Then, put this information in the description of the policy so users can inspect/evaluate what the proposed image does to determine if it's safe for them to consume this policy.

@nsagark
Contributor Author

nsagark commented Jul 30, 2024

Hi @chipzoller I have added the information on how the image was built by adding a link in the description of the policy. Please review and let me know if we need to add anything else.

@nsagark
Contributor Author

nsagark commented Aug 1, 2024

Hi @chipzoller, please let me know if anything else is needed here.

Contributor

@chipzoller chipzoller left a comment


It seems like there are missing permissions here to run your Job. You're mounting a ServiceAccount token but aren't declaring a ServiceAccount name. And that SA has to be granted permissions to get Pods basically anywhere in the cluster. I don't see how this could possibly work without the user taking additional steps on their end, which isn't mentioned in the description.
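To make the reviewer's point concrete, this is a minimal sketch of the kind of RBAC such a Job would depend on. All names here are illustrative assumptions, not the objects shipped with the policy; the actual permissions are documented in the repository linked later in this thread.

```yaml
# Hypothetical ServiceAccount and RBAC for the debug-collection Job.
# Without something like this, the Job's mounted token cannot read Pods.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: debug-collector          # illustrative name
  namespace: default
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: debug-collector
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log", "events"]
  verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: debug-collector
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: debug-collector
subjects:
- kind: ServiceAccount
  name: debug-collector
  namespace: default
```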

7 review comments on other/get-debug-information/generate-policy.yaml (outdated, resolved)
@chipzoller chipzoller changed the title Updated files for get-debug-information policy Add policy Collect Debug Information for Pods in CrashLoopBackOff Aug 2, 2024
nsagark and others added 3 commits August 2, 2024 13:20
Co-authored-by: Chip Zoller <chipzoller@gmail.com>
Signed-off-by: nsagark <90008930+nsagark@users.noreply.github.com>
Co-authored-by: Chip Zoller <chipzoller@gmail.com>
Signed-off-by: nsagark <90008930+nsagark@users.noreply.github.com>
Co-authored-by: Chip Zoller <chipzoller@gmail.com>
Signed-off-by: nsagark <90008930+nsagark@users.noreply.github.com>
@nsagark
Contributor Author

nsagark commented Aug 2, 2024

It seems like there are missing permissions here to run your Job. You're mounting a ServiceAccount token but aren't declaring a ServiceAccount name. And that SA has to be granted permissions to get Pods basically anywhere in the cluster. I don't see how this could possibly work without the user taking additional steps on their end, which isn't mentioned in the description.

For this comment, can I update the description as below? I have added the permissions needed in the README at the URL provided. Please let me know if that will suffice.

This policy generates a job which gathers troubleshooting data (including logs, kubectl describe output and events from the namespace) from pods that are in CrashLoopBackOff and have 3 restarts. This data can further be used to automatically create a Jira issue using some kind of automation or another Kyverno policy. For more information on the permissions needed to run this policy and the image used in this policy, see https://github.com/nirmata/SRE-Operational-Usecases/tree/main/get-troubleshooting-data/get-debug-data.
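The description above can be sketched as a Kyverno generate rule. This is only an illustrative outline of the policy's shape, assuming hypothetical names (the ServiceAccount, image reference, and exact precondition expressions are assumptions, not the merged policy):

```yaml
# Illustrative sketch only; see the linked repository for the real policy.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: collect-debug-information
spec:
  rules:
  - name: create-debug-job
    match:
      any:
      - resources:
          kinds:
          - Pod
    preconditions:
      all:
      # Trigger only for Pods waiting in CrashLoopBackOff with 3+ restarts.
      - key: "{{ request.object.status.containerStatuses[0].state.waiting.reason || '' }}"
        operator: Equals
        value: CrashLoopBackOff
      - key: "{{ request.object.status.containerStatuses[0].restartCount || `0` }}"
        operator: GreaterThanOrEquals
        value: 3
    generate:
      apiVersion: batch/v1
      kind: Job
      name: "debug-{{ request.object.metadata.name }}"
      namespace: "{{ request.object.metadata.namespace }}"
      data:
        spec:
          template:
            spec:
              serviceAccountName: debug-collector   # hypothetical SA; needs RBAC
              restartPolicy: Never
              containers:
              - name: collector
                image: ghcr.io/example/debug-collector:latest  # hypothetical image
```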

…hub-pkg.yml

Signed-off-by: nsagark <sagar@nirmata.com>
Signed-off-by: nsagark <sagar@nirmata.com>
Signed-off-by: nsagark <sagar@nirmata.com>
@nsagark
Contributor Author

nsagark commented Aug 3, 2024

Hi @chipzoller I have made the changes that were suggested. Please take a look.

Contributor

@chipzoller chipzoller left a comment


In addition to the suggestion, why don't you add the serviceAccountName field to the Job, along with a comment, so it's clearer that the Job does nothing without the RBAC.
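A minimal sketch of that suggestion, with an illustrative ServiceAccount name and image (both assumptions, not the merged manifest):

```yaml
# Job pod template fragment: declare the ServiceAccount explicitly.
spec:
  template:
    spec:
      # NOTE: this Job can only read Pods, logs, and events if this
      # ServiceAccount has been granted the RBAC described in the policy docs.
      serviceAccountName: debug-collector           # hypothetical name
      restartPolicy: Never
      containers:
      - name: collector
        image: ghcr.io/example/debug-collector:latest  # hypothetical image
```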

Review comments on other/get-debug-information/collect-debug-information.yaml and other/get-debug-information/artifacthub-pkg.yml (outdated, resolved)
nsagark and others added 2 commits August 6, 2024 23:31
Co-authored-by: Chip Zoller <chipzoller@gmail.com>
Signed-off-by: nsagark <90008930+nsagark@users.noreply.github.com>
@nsagark
Contributor Author

nsagark commented Aug 7, 2024

Hi @chipzoller I have updated the files as per your suggestion, please take a look.

Contributor

@chipzoller chipzoller left a comment


This looks good; we just need to fix the Artifact Hub linting error, which I think is due to a policy filename mismatch.

Review comments on other/get-debug-information/collect-debug-information.yaml and other/get-debug-information/artifacthub-pkg.yml (outdated, resolved)
Signed-off-by: nsagark <sagar@nirmata.com>
@nsagark
Contributor Author

nsagark commented Aug 9, 2024

Hi @chipzoller I have updated the files. Please take a look.

Signed-off-by: Chip Zoller <chipzoller@gmail.com>
@nsagark
Contributor Author

nsagark commented Aug 9, 2024

@chipzoller I have added the changes that were requested. Please take a look.

Contributor

@chipzoller chipzoller left a comment


Good to go now.

@chipzoller chipzoller merged commit d4c42e1 into kyverno:main Aug 9, 2024
279 of 280 checks passed