Webhook dependant resources are deployed before the related deployments are available #2009

Open
skonto opened this issue Mar 6, 2025 · 0 comments · May be fixed by #2010
Labels
kind/bug Categorizes issue or PR as related to a bug.

skonto commented Mar 6, 2025

Describe the bug

Right now a fresh install prints:

{"severity":"INFO","timestamp":"2025-03-06T08:31:02.4824378Z","logger":"knative-operator.manifestival","caller":"manifestival@v0.7.2/manifestival.go:208","message":"Updating","commit":"71d7c8c-dirty","knative.dev/pod":"knative-operator-5797f68df9-lmw8c","name":"knative-serving/routing-serving-certs","type":"networking.internal.knative.dev/v1alpha1, Kind=Certificate"}
{"severity":"ERROR","timestamp":"2025-03-06T08:31:02.677900778Z","logger":"knative-operator","caller":"knativeserving/reconciler.go:295","message":"Returned an error","commit":"71d7c8c-dirty","knative.dev/pod":"knative-operator-5797f68df9-lmw8c","knative.dev/controller":"knative.dev.operator.pkg.reconciler.knativeserving.Reconciler","knative.dev/kind":"operator.knative.dev.KnativeServing","knative.dev/traceid":"0f01d0c0-8482-4980-9129-651122f20dc3","knative.dev/key":"knative-serving/knative-serving","targetMethod":"ReconcileKind","error":"failed to apply non rbac manifest: Internal error occurred: failed calling webhook \"webhook.serving.knative.dev\": failed to call webhook: Post \"https://webhook.knative-serving.svc:443/?timeout=10s\": dial tcp 10.109.239.115:443: connect: connection refused","stacktrace":"knative.dev/operator/pkg/client/injection/reconciler/operator/v1beta1/knativeserving.(*reconcilerImpl).Reconcile\n\tknative.dev/operator/pkg/client/injection/reconciler/operator/v1beta1/knativeserving/reconciler.go:295\nknative.dev/pkg/controller.(*Impl).processNextWorkItem\n\tknative.dev/pkg@v0.0.0-20250117084104-c43477f0052b/controller/controller.go:540\nknative.dev/pkg/controller.(*Impl).RunContext.func3\n\tknative.dev/pkg@v0.0.0-20250117084104-c43477f0052b/controller/controller.go:489"}

Downstream we see a similar issue:

{"severity":"ERROR","timestamp":"2025-03-05T14:51:27.11460294Z","logger":"knative-operator","caller":"knativeserving/reconciler.go:295","message":"Returned an error","commit":"437a902","knative.dev/pod":"knative-operator-webhook-7458bff575-l6gm9","knative.dev/controller":"knative.dev.operator.pkg.reconciler.knativeserving.Reconciler","knative.dev/kind":"operator.knative.dev.KnativeServing","knative.dev/traceid":"bcd23919-595c-40e0-9728-f6f89ff570db","knative.dev/key":"knative-serving/knative-serving","targetMethod":"ReconcileKind","error":"failed to apply non rbac manifest: Internal error occurred: failed calling webhook \"webhook.serving.knative.dev\": failed to call webhook: Post \"https://webhook.knative-serving.svc:443/?timeout=10s\": no endpoints available for service \"webhook\"","stacktrace":"knative.dev/operator/pkg/client/injection/reconciler/operator/v1beta1/knativeserving.(*reconcilerImpl).Reconcile\n\t/workspace/vendor/knative.dev/operator/pkg/client/injection/reconciler/operator/v1beta1/knativeserving/reconciler.go:295\nknative.dev/pkg/controller.(*Impl).processNextWorkItem\n\t/workspace/vendor/knative.dev/pkg/controller/controller.go:540\nknative.dev/pkg/controller.(*Impl).RunContext.func3\n\t/workspace/vendor/knative.dev/pkg/controller/controller.go:489"}

This is a transient error, but the install can be made more robust here; see Additional context as well.

Expected behavior

  • Manifest installation should enforce a proper order: webhook-dependent resources should be deployed only after the webhook services are ready.

To Reproduce

Install the operator as usual and check the logs.

Knative release version

Latest

Additional context

This has also caused issues in other scenarios where a previous uninstall was not clean (validation webhook configs were still around). With a new install on top of the old one, the webhook was called for the Serving Certificate resources because of the validation on deletion (Serving has that for legacy reasons), even though no webhook deployment was available. As a result, any new install was blocked. I think we can make this more robust.
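One possible pre-install check for the leftover-configuration scenario is sketched below. It is only an illustration under stated assumptions: `WebhookConfig` is a hypothetical, simplified view of a webhook configuration (name plus the Service it points at), and `readyServices` is assumed to be computed elsewhere, for example from Endpoints data. It does not use any Kubernetes client API.

```go
package main

import "sort"

// WebhookConfig is a hypothetical, simplified view of a webhook configuration:
// its name and the Service its webhooks are routed to.
type WebhookConfig struct {
	Name             string
	ServiceNamespace string
	ServiceName      string
}

// staleWebhookConfigs returns the names of configurations whose backing
// Service is not listed as ready. readyServices is keyed by "namespace/name".
func staleWebhookConfigs(configs []WebhookConfig, readyServices map[string]bool) []string {
	var stale []string
	for _, c := range configs {
		key := c.ServiceNamespace + "/" + c.ServiceName
		if !readyServices[key] {
			stale = append(stale, c.Name)
		}
	}
	sort.Strings(stale) // deterministic output for logging
	return stale
}
```

An installer could log or act on the returned names before applying webhook-dependent resources; what to do with a stale configuration (wait, warn, or remove it) is a design decision this sketch leaves open.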

skonto added the kind/bug label Mar 6, 2025
skonto linked a pull request Mar 6, 2025 that will close this issue
skonto changed the title from "Webhook dependant resources are deployed before deployments are available" to "Webhook dependant resources are deployed before the related deployments are available" Mar 6, 2025