Ingress-nginx Pod Prematurely Marked Ready, Causing HTTP 404 Errors from default backend routing #12206
Comments
This issue is currently awaiting triage. If Ingress contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.
/remove-kind bug
The bug label can be re-applied after a developer accepts the triage as a bug. There are multiple pieces of information to look at.
Also, there have been attempts by other users who reported similar expectations to play with various timeouts. But since that is an extremely specific configuration for each environment and each use case of a given environment, I think that is one area to explore.
Actually, this issue impacts us in our specific case where we're not using the service cluster IP; instead, the external load balancer hits the pod IP directly through the GCP LB NEG. It calls the same health check as Kubernetes, and the ingress-nginx-controller again erroneously reports that it is ready when it has not yet loaded the ingress config. The problem is that the "initial sync" of the Kubernetes state for ingresses isn't actually complete when the controller reports that the initial sync is complete and the dynamic load balancer is initialized. The k8s Service implementation for the ingress or backends is not relevant in this case. The 404 is returned by nginx because the appropriate ingress is not populated in the Lua shared dictionary, nor are its servers/locations templated in the nginx configuration.
Actually, we don't face any problems when a reload of nginx occurs. Rather, the multiple reloads on startup of ingress-nginx pods are merely a symptom of the specific implementation in ingress-nginx-controller and its interaction with the Kubernetes API (the Golang side) that results in the bug.
@Izzette thank you for the comments, that helps. This is a vast topic, so discussion is only possible with specifics; I am looking at your reproduction steps, and we are root-causing the 404.
So the above comments from me are directed at the tests, but even with any other tests, if there is starvation of resources like CPU/memory etc. that I listed earlier, there will be events leading to the EndpointSlice becoming empty. That is expected. Even though the results will be the same, I would do these tests by first installing MetalLB in the kind cluster and specifying the Docker container's IP address as the start and end of the address pool. Then I would make an /etc/hosts entry for hello-700 on the host running kind and send curl requests from the host shell to hello-700.example.com. That simulates your use case more closely (not that the results will be any different, though). Lastly, to repeat: if you starve the cluster of CPU/memory/bandwidth/conntrack/inodes and I/O, generate load on the API server, and top it off with a rollout, the /healthz endpoint of the controller may respond OK and thus move the pod to the Ready state. I am not surprised. The only choice we have at this time is a play on the timeouts, specifically increasing the delaySeconds and all the other configurables related to probe behaviour.
And just for sanity's sake, the tests will be the same if you use
Ah, forgot to mention: I am also interested in using 5 replicas and setting min-available to 3, then doing the load and rollout as per your design.
I am able to reproduce with
while I run in a different shell:
Before the rollout restart, this works fine of course, as with the other backend. If I redeploy ingress-nginx with replicas 5 and maxUnavailable 2, I can also reproduce this issue:
While in my test pod:
Hi, it's very helpful when the data is this abundant and precise. I have a ton of things to communicate here after this data, but I was wondering if we can get on a screenshare to go over the precise data; that is more relatable from the perspective of creating action items for developers of this project. Any chance you can meet on meet.jit.si?
I am also on Slack, if that works for you. The advantage is that a real-time conversation is possible, if that adds value.
What happened:
During the ingress-nginx pod boot-up sequence on Kubernetes, our clients receive HTTP 404 responses from nginx itself for HTTP paths that are declared in some of our Ingresses. This only happens while the pod is booting up, not when a hot-reload sequence is initiated.
While the pod is already marked as Ready in Kubernetes, we suspect that the nginx configuration is not yet fully loaded and some of the requests are forwarded to the upstream-default-backend upstream (see the screenshots below and the pod logs in CSV).
For reference, we defined quite a lot of Ingresses in our cluster with many different paths. The resulting nginx configuration is quite heavy to load, as it is approximately 67 MB.
Requests served by the default backend by each pod just after it starts up
Count of pods in “ready” state
You can see in the above two graphs that after 3 out of the 4 pods in the ingress-nginx-external-controller-7c8576cd ReplicaSet become Ready (the ingress-nginx-controller /healthz endpoint returns 200), several thousand requests are served by the default backend over the course of ~30s. This occurs even after the 10s initial delay for the readiness and liveness probes has passed.
ingress-nginx-controller pod logs after startup. Notice the multiple reloads and the change of backend after the last reload.
What you expected to happen:
The Ingress-nginx pod should not be marked as ready while still loading its configuration and we should not get HTTP 404 from nginx itself.
NGINX Ingress controller version
Kubernetes version (use kubectl version): Server Version: v1.30.5-gke.1014001
Environment:
Cloud provider or hardware configuration: Google Cloud Platform / GKE / GCE
OS (e.g. from /etc/os-release): https://cloud.google.com/container-optimized-os/docs/release-notes/m113#cos-113-18244-151-14_
Kernel (e.g. uname -a): https://cos.googlesource.com/third_party/kernel/+/f2b7676b27982b8ce21e62319fceb9a0fd4131c5
Install tools: GKE
Basic cluster related info:
How was the ingress-nginx-controller installed:
Current State of the controller:
kubectl describe ingressclasses
kubectl -n <ingresscontrollernamespace> get all -A -o wide
kubectl -n <ingresscontrollernamespace> describe po <ingresscontrollerpodname>
kubectl -n <ingresscontrollernamespace> describe svc <ingresscontrollerservicename>
How to reproduce this issue:
Install a kind cluster
Install the ingress controller
Install the ingress controller with modified liveness/readiness timings to improve reproducibility.
Admission webhooks are disabled here to avoid swamping ingress-nginx when creating the large number of ingresses required to reproduce this bug.
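For reference, an install along these lines can be used. This is only a sketch assuming the official ingress-nginx Helm chart; the probe timing values shown are illustrative, not the exact settings from the report.

```bash
# Sketch: install the official chart with the admission webhooks disabled and
# the readiness/liveness probe timings adjusted. The timing values below are
# illustrative assumptions.
helm upgrade --install ingress-nginx ingress-nginx \
  --repo https://kubernetes.github.io/ingress-nginx \
  --namespace ingress-nginx --create-namespace \
  --set controller.admissionWebhooks.enabled=false \
  --set controller.readinessProbe.initialDelaySeconds=5 \
  --set controller.readinessProbe.periodSeconds=5 \
  --set controller.livenessProbe.initialDelaySeconds=5 \
  --set controller.livenessProbe.periodSeconds=5
```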
Create a simple service in the default namespace
Here we're creating a simple service using nc that will always return 200. It's not protocol aware, just returning a static body.
Apply the below manifests with:
Manifests
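The reporter's manifests are collapsed above. As a rough illustration only, such a backend could look like the sketch below; the busybox image, port, and exact nc invocation are assumptions, and only the Service name hello in the default namespace is taken from the curl test later in these steps.

```bash
# Sketch of an nc-based backend that always answers 200 with a static body.
kubectl apply --context kind-kind -f - <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
      - name: hello
        image: busybox:1.36
        command: ["sh", "-c"]
        args:
        - |
          # Serve a fixed 200 response to every connection, forever.
          while true; do
            printf 'HTTP/1.1 200 OK\r\nContent-Length: 3\r\n\r\nok\n' | nc -l -p 8080
          done
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: hello
  namespace: default
spec:
  selector:
    app: hello
  ports:
  - port: 80
    targetPort: 8080
EOF
```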
Create 1k ingresses pointing to this service
Run the below Python script to create 1000 ingresses (hello-[0-999].example.com) with our simple service as the backend.
You will have to wait some time for ingress-nginx to update its config with all these changes.
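The Python script itself is collapsed above; an equivalent shell sketch is shown below. The hostname pattern and the hello backend follow the surrounding steps, while the ingress class name nginx is an assumption.

```bash
# Create hello-0.example.com through hello-999.example.com, all backed by the
# "hello" Service created earlier.
for i in $(seq 0 999); do
  kubectl apply --context kind-kind -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hello-${i}
  namespace: default
spec:
  ingressClassName: nginx
  rules:
  - host: hello-${i}.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: hello
            port:
              number: 80
EOF
done
```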
Create a test pod to confirm the service and ingress are alive.
kubectl run --namespace default --context kind-kind --tty --stdin --restart=Never --command --image nicolaka/netshoot:latest test -- bash
In the console on this pod, run the following to confirm we have a stable environment:
curl --verbose hello.default.svc.cluster.local.
curl --verbose --header 'Host: hello-999.example.com' ingress-nginx-controller.ingress-nginx.svc.cluster.local.
You should see a successful response for each of them.
Generate load on the kubernetes API server / etcd
In order to reproduce this bug (reliably?), some load needs to be added to Kubernetes itself.
I use kube-burner here to create the load.
You will need to wait until PUT/PATCH/DELETE commands are being run on existing kubernetes objects in order to reproduce.
Below is my configuration.
During the first job, objects are created, and this doesn't seem to be enough to reproduce the bug.
Wait until the api-intensive-patch job has started before continuing.
Configuration
./api-intensive.yml
./templates/deployment.yaml
./templates/deployment_patch_add_pod_2.yaml
./templates/service.yaml
./templates/deployment_patch_add_label.yaml
./templates/deployment_patch_add_label.json
./templates/configmap.yaml
./templates/secret.yaml
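With the configuration and templates above in place, the load can be started with something like the following (assuming the kube-burner CLI is installed and pointed at the config file listed above):

```bash
# Start the kube-burner jobs defined in the configuration above; the later
# api-intensive-patch job is the one that PATCHes existing objects.
kube-burner init -c ./api-intensive.yml
```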
Constantly GET the endpoint
In the test pod we created earlier, run the following command which will curl the hello-700.example.com ingress constantly until 404 is returned.
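The exact command from the report is not reproduced here, but a loop along these lines works; the Host header and the controller Service DNS name follow the earlier steps.

```bash
# Poll the hello-700 ingress through the controller Service until nginx
# returns a 404, then print the status code and body and stop.
while true; do
  code=$(curl --silent --output /tmp/body --write-out '%{http_code}' \
    --header 'Host: hello-700.example.com' \
    ingress-nginx-controller.ingress-nginx.svc.cluster.local.)
  if [ "$code" = "404" ]; then
    echo "got HTTP $code"
    cat /tmp/body
    break
  fi
done
```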
While this is running, in a different shell, perform a rollout restart of the ingress-nginx controller.
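For example (the Deployment name and namespace assume the Helm install sketched earlier):

```bash
# Trigger a rolling restart of the controller and wait for it to complete;
# repeat the restart if the first attempt does not reproduce the 404s.
kubectl rollout restart deployment/ingress-nginx-controller --namespace ingress-nginx
kubectl rollout status deployment/ingress-nginx-controller --namespace ingress-nginx
```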
It may take a couple of attempts of rolling out the deployment, but eventually you should see the loop in the test pod break and something similar to the following stderr and body printed:
Interestingly, you can't see the 404 in the logs of either the new or the old nginx pod in this reproduction.
This is different from what we see in our production cluster, where the 404s are present in the logs of the newly created ingress-nginx pod.
Anything else we need to know:
Here's a breakdown of what I think are some of the details around the root cause of this bug in the code:
The health check here basically just checks that nginx is running (which it will be very early on) and that the /is-dynamic-lb-initialized path returns a 2xx:
ingress-nginx/internal/ingress/controller/checker.go
Lines 63 to 66 in 0edf16f
The is-dynamic-lb-initialized location is handled by this Lua module, which is just checking if any backends are configured.
ingress-nginx/rootfs/etc/nginx/lua/nginx/ngx_conf_is_dynamic_lb_initialized.lua
Lines 2 to 6 in 05eda3d
This will basically always be true after the first reload, since as soon as there is at least one ingress in the cache, it will detect a difference and trigger a reload with the new backends configuration:
ingress-nginx/internal/ingress/controller/controller.go
Lines 195 to 221 in 05eda3d
Which calls the OnUpdate here:
ingress-nginx/internal/ingress/controller/nginx.go
Line 749 in 0edf16f
My suspicion is that the cached ingresses in k8sStore do not represent the full state initially, but rather only the ingresses returned from the API server in the first paginated response(s):
ingress-nginx/internal/ingress/controller/store/store.go
Line 1102 in 05eda3d
This can be validated by inspecting the number of ingresses returned by this function during startup when a large number of ingresses are present.
A potential solution would be to reject a reload of Nginx until we're sure that the cache is fully populated on the initial sync.