fix: Fix regression in Helm post-renderer error handling#9997
fix: Fix regression in Helm post-renderer error handling#9997psliwka wants to merge 1 commit intoGoogleContainerTools:mainfrom
Conversation
In Skaffold v1, user-provided Helm post-renderers (configured by adding an appropriate `--post-renderer` flag to `deploy.helm.flags` in skaffold.yaml) used to be processed by Helm itself. In case a post-renderer crashed and returned a non-zero exit code, Helm detected that and aborted deployment process. This was an important safety measure, since a failing post-renderer is very likely to output incorrect manifests, or no manifests at all. In Skaffold v2, things work differently: Skaffold injects itself as Helm's one and only post-renderer, and any custom post-renderer specified in skaffold.yaml is being executed from within `skaffold filter` subprocess, invoked by Helm. The Skaffold's implementation of handling post-renderers is mostly similar to the Helm's one, except one critical detail: the post-renderer's exit code is never checked, causing its output to be blindly consumed by the deployment pipeline, even in case the post-renderer crashes and outputs nothing to its stdout. In practice, this usually causes a seemingly successful deployment reported by Skaffold, while all Kubernetes objects deployed by a previous (successful) release are silently destroyed, causing unexpected service outage. This commit fixes the issue by by bringing `skaffold filter` behavior in-line to the Helm's (and thus also Skaffold v1's) one, i.e. a crash of post-renderer stops the deployment pipeline, and a meaningful error message is returned to the user to help with debugging the issue. NB While I'm not the biggest fan of the error message's format, I intentionally made it to be exactly the same as its Helm counterpart, which I think it's only fair given that we replicate Helm's native behavior here. Fixes GoogleContainerTools#9377.
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves a significant regression in Skaffold v2's handling of Helm post-renderers. Previously, a crashing post-renderer would not halt the deployment process, potentially leading to silent data destruction. The changes ensure that Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request correctly identifies and fixes a regression where the exit code of a Helm post-renderer was not being checked. The addition of cmd.Wait() and checking its error is the right approach. However, the current implementation has a potential deadlock issue if manifest parsing fails mid-stream. I've suggested a more robust implementation that buffers the output to prevent this. The added test case is good and covers the failure scenario well.
In Skaffold v1, user-provided Helm post-renderers (configured by adding an appropriate
--post-rendererflag todeploy.helm.flagsin skaffold.yaml) used to be processed by Helm itself. In case a post-renderer crashed and returned a non-zero exit code, Helm detected that and aborted deployment process. This was an important safety measure, since a failing post-renderer is very likely to output incorrect manifests, or no manifests at all.In Skaffold v2, things work differently: Skaffold injects itself as Helm's one and only post-renderer, and any custom post-renderer specified in skaffold.yaml is being executed from within
skaffold filtersubprocess, invoked by Helm. The Skaffold's implementation of handling post-renderers is mostly similar to the Helm's one, except one critical detail: the post-renderer's exit code is never checked, causing its output to be blindly consumed by the deployment pipeline, even in case the post-renderer crashes and outputs nothing to its stdout. In practice, this usually causes a seemingly successful deployment reported by Skaffold, while all Kubernetes objects deployed by a previous (successful) release are silently destroyed, causing unexpected service outage.This commit fixes the issue by by bringing
skaffold filterbehavior in-line to the Helm's (and thus also Skaffold v1's) one, i.e. a crash of post-renderer stops the deployment pipeline, and a meaningful error message is returned to the user to help with debugging the issue.NB While I'm not the biggest fan of the error message's format, I intentionally made it to be exactly the same as its Helm counterpart, which I think it's only fair given that we replicate Helm's native behavior here.
Fixes #9377.