# Results

## Test environment

NGINX Plus: false

NGINX Gateway Fabric:

- Commit: e4eed2dad213387e6493e76100d285483ccbf261
- Date: 2025-10-17T14:41:02Z
- Dirty: false

GKE Cluster:

- Node count: 3
- k8s version: v1.33.5-gke.1080000
- vCPUs per node: 2
- RAM per node: 4015668Ki
- Max pods per node: 110
- Zone: europe-west2-a
- Instance Type: e2-medium

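For reference, the cluster figures above can be re-checked against the cluster with standard tooling; a minimal sketch with no NGF-specific assumptions:

```text
# Kubernetes server version and per-node capacity (CPU, memory, max pods).
kubectl version
kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.capacity.cpu,MEM:.status.capacity.memory,PODS:.status.capacity.pods
```
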
## Summary

- There are still a significant number of non-2xx or 3xx responses, but this is a vast improvement over the last test run.
- This indicates that while most of the Agent-to-control-plane connection issues have been resolved, some issues remain.
- All of the observed 502s happened within a single window of time, which at least indicates the system was able to recover, although it is unclear what triggered the underlying NGINX restart (see the nginx error logs below).
- The increase in memory usage for NGF seen in the previous test run appears to have been resolved.
- We observe a steady increase in NGINX memory usage over time, which could indicate a memory leak.
- CPU usage remained consistent with past results.
- Errors seem to be related to a cluster upgrade or some other external factor (excluding the resolved InferencePool status error).

## Traffic

HTTP:

```text
Running 5760m test @ http://cafe.example.com/coffee
  2 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   202.19ms  150.51ms   2.00s    83.62%
    Req/Sec   272.67    178.26     2.59k    63.98%
  183598293 requests in 5760.00m, 62.80GB read
  Socket errors: connect 0, read 338604, write 82770, timeout 57938
  Non-2xx or 3xx responses: 33893
Requests/sec:    531.24
Transfer/sec:    190.54KB
```

HTTPS:

```text
Running 5760m test @ https://cafe.example.com/tea
  2 threads and 100 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   189.21ms  108.25ms   2.00s    66.82%
    Req/Sec   271.64    178.03     1.96k    63.33%
  182905321 requests in 5760.00m, 61.55GB read
  Socket errors: connect 10168, read 332301, write 0, timeout 96
Requests/sec:    529.24
Transfer/sec:    186.76KB
```
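
The output above is in the format produced by the `wrk` load generator. As a rough, hypothetical reconstruction of the invocations behind it, with the thread count, connection count, duration, and URLs taken from the output itself (any other flags used by the longevity test are not shown in this report):

```text
# Hypothetical wrk invocations inferred from the reported output
# (2 threads, 100 connections, 5760-minute duration per endpoint).
wrk -t2 -c100 -d5760m http://cafe.example.com/coffee
wrk -t2 -c100 -d5760m https://cafe.example.com/tea
```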

## Key Metrics

### Containers memory

### Containers CPU

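The memory and CPU figures referenced in the two subsections above track per-container usage over the full test run. For a point-in-time check of the same values via metrics-server, something along these lines can be used (the namespace is an assumption; substitute the one the release is deployed in):

```text
# Current memory and CPU per container for the NGF control plane and NGINX pods
# (namespace is an assumption).
kubectl top pod -n nginx-gateway --containers
```
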
## Error Logs

### nginx-gateway

- msg: Config apply failed, rolling back config; error: error getting file data for name:"/etc/nginx/conf.d/http.conf"  hash:"Luqynx2dkxqzXH21wmiV0nj5bHyGiIq7/2gOoM6aKew="  permissions:"0644"  size:5430: rpc error: code = NotFound desc = file not found -> happened twice over the four days, related to Agent reconciliation during token rotation
  - {hashFound: jmeyy1p+6W1icH2x2YGYffH1XtooWxvizqUVd+WdzQ4=, hashWanted: Luqynx2dkxqzXH21wmiV0nj5bHyGiIq7/2gOoM6aKew=, level: debug, logger: nginxUpdater.fileService, msg: File found had wrong hash, ts: 2025-10-18T18:11:24Z}
  - The error indicates the Agent requested a file that had since changed.

- msg: Failed to update lock optimistically: the server was unable to return a response in the time allotted, but may still be processing the request (put leases.coordination.k8s.io ngf-longevity-nginx-gateway-fabric-leader-election), falling back to slow path -> the same leader election error as seen in the NGINX Plus run; it appears to be outside the scope of our product

- msg: no matches for kind "InferencePool" in version "inference.networking.k8s.io/v1" -> thousands of these, but fixed in PR 4104 (a diagnostic sketch for this and the lease error above follows this list)
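
As a minimal sketch, both of those objects can be inspected directly in the cluster to confirm their current state; the namespace is an assumption and should be adjusted to the actual deployment:

```text
# Inspect the leader-election lease named in the error above (namespace assumed).
kubectl -n nginx-gateway get lease ngf-longevity-nginx-gateway-fabric-leader-election -o yaml

# Check whether the InferencePool CRD is installed and which versions it serves.
kubectl get crd inferencepools.inference.networking.k8s.io -o jsonpath='{.spec.versions[*].name}'
```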

### nginx

Traffic: nearly 34,000 502s

- These all happened within the same window of less than a minute (approximately 2025-10-18T18:11:11 to 2025-10-18T18:11:50) and resolved once NGINX restarted (see the log-filter sketch after this list).
- It is unclear what triggered NGINX to restart, though a memory spike does appear to have been observed around this time.
- The outage correlates with the config apply error seen in the control plane logs.
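
A minimal sketch for isolating that window in the data-plane access logs, assuming the logs are read via kubectl (pod and namespace names are placeholders to be replaced with the actual deployment's):

```text
# Filter the NGINX pod's access log for 502 responses, with timestamps,
# to confirm the start and end of the outage window (pod/namespace assumed).
kubectl -n nginx-gateway logs <nginx-pod> --timestamps | grep ' 502 '
```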