
feat: Apply MetalLB configuration to remote cluster #764

Closed


dlipovetsky
Contributor

@dlipovetsky dlipovetsky commented Jun 28, 2024

What problem does this PR solve?:

Applying MetalLB configuration requires the MetalLB CRD webhooks to be ready. They are not guaranteed to be ready by the time the MetalLB Helm chart is successfully deployed.

Because the MetalLB configuration is applied in a non-blocking lifecycle hook, the topology controller retries the hook immediately after it returns; we cannot ask for a delay. Because the topology controller backs off its requests exponentially, this often results in the hook requests being delayed for 3+ minutes before succeeding.

To mitigate this exponential backoff, the hook retries internally for up to 20 seconds, within the 30 seconds recommended by the Cluster API.

Related log excerpt
    I0628 19:39:31.051929       1 handler.go:132] "Deploying ServiceLoadBalancer provider MetalLB" cluster="default/dlipovetsky"
    I0628 19:39:31.051954       1 handler.go:82] "Applying MetalLB installation" cluster="default/dlipovetsky"
    I0628 19:39:31.100706       1 cm.go:82] "Fetching HelmChart info for \"metallb\" from configmap default/default-helm-addons-config" cluster="default/dlipovetsky"
    E0628 19:39:41.031497       1 handler.go:140] "failed to deploy ServiceLoadBalancer provider MetalLB" err="context canceled: last apply error: failed to apply MetalLB configuration IPAddressPool metallb-system/metallb: server-side apply failed: Patch \"https://172.18.0.2:6443/apis/metallb.io/v1beta1/namespaces/metallb-system/ipaddresspools/metallb?fieldManager=cluster-api-runtime-extensions-nutanix&fieldValidation=Strict&timeout=10s\": context canceled" cluster="default/dlipovetsky"
    ...
    E0628 19:43:49.496027       1 handler.go:140] "failed to deploy ServiceLoadBalancer provider MetalLB" err="context canceled: last apply error: failed to apply MetalLB configuration IPAddressPool metallb-system/metallb: server-side apply failed: Internal error occurred: failed calling webhook \"ipaddresspoolvalidationwebhook.metallb.io\": failed to call webhook: Post \"https://metallb-webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-ipaddresspool?timeout=10s\": dial tcp 10.130.84.48:443: connect: connection refused" cluster="default/dlipovetsky"
    I0628 19:43:51.785907       1 handler.go:132] "Deploying ServiceLoadBalancer provider MetalLB" cluster="default/dlipovetsky"
    I0628 19:43:51.785923       1 handler.go:82] "Applying MetalLB installation" cluster="default/dlipovetsky"
    I0628 19:43:51.790521       1 cm.go:82] "Fetching HelmChart info for \"metallb\" from configmap default/default-helm-addons-config" cluster="default/dlipovetsky"

Which issue(s) this PR fixes:
Fixes #

How Has This Been Tested?:

Special notes for your reviewer:

Implements a wait for a check to pass against a typed object.
@jimmidyson
Member

Closing as merged in #783.

@jimmidyson jimmidyson closed this Jul 4, 2024