Align API call timeouts in the cronjob IP reconciler #480
Conversation
Coveralls: Pull Request Test Coverage Report for Build 9518734751
Coveralls: Pull Request Test Coverage Report for Build 9544469259
```go
func (i *Client) ListPods() ([]v1.Pod, error) {
	logging.Debugf("listing Pods")

	ctxWithTimeout, cancel := context.WithTimeout(context.Background(), listRequestTimeout)
```
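For context on this hunk, here is a minimal, self-contained sketch of the pattern under review: each list call creates its own bounded context instead of inheriting a shared parent deadline. The 30s value and the standalone function shape are assumptions for illustration, not the exact whereabouts code.

```go
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// listRequestTimeout mirrors the shared 30s bound discussed in this PR;
// the exact value here is an assumption for illustration.
const listRequestTimeout = 30 * time.Second

// listPods gives each List call its own deadline, so one slow request
// cannot consume the time budget of later calls in the reconcile loop.
func listPods(clientset kubernetes.Interface) error {
	ctxWithTimeout, cancel := context.WithTimeout(context.Background(), listRequestTimeout)
	defer cancel() // release the timer even on the success path

	pods, err := clientset.CoreV1().Pods(metav1.NamespaceAll).List(ctxWithTimeout, metav1.ListOptions{})
	if err != nil {
		return err
	}
	fmt.Printf("listed %d pods\n", len(pods.Items))
	return nil
}
```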
Cool, I think I get it: you've got all the timeouts normalized on listRequestTimeout, and then we can eliminate the other timeouts in the reconciler. Nicely done.
That's correct. I will add a description in the commit. Thanks!
Force-pushed from 7de75c3 to fc56b7a
Coveralls: Pull Request Test Coverage Report for Build 9582412868
I tested with a kind cluster and I still see leftover podRefs in the IPPools when scaling up/down.
With a kind cluster I don't see many leftover podRefs: for 100 scale up/down cycles, for example, I saw one leftover podRef, and with repeated scaling up/down the leftover podRefs keep increasing by 1. But with more pods and nodes these leftover podRefs will increase.
Thanks @adilGhaffarDev. What do you see in the logs?
Which logs are you interested in? Here is one of the whereabouts pod logs:
The IP reconciler logs.
I'm looking for this
I am not seeing this error in the whereabouts DaemonSet pods.
Cool, that means we solved the original issue. You're still getting leftover IPs because there is another issue. Not as many as before (because nothing was deleted before), but still, it shouldn't happen. I think it is due to this:

2024-06-19T12:47:13Z [debug] Started leader election

I've seen it before. This is the pod controller, not the cron job. My suggestion is not to overload this issue/PR and instead get it merged. Then you can create a separate issue and we can investigate again. Please try to reproduce once more to verify that the original issue does not reproduce. I'll try to do it locally as well with the YAML definitions you provided.
Coveralls: Pull Request Test Coverage Report for Build 9585671568
I have tested with the given PR fix:

Normal AddedInterface 2m multus Add eth0 [192.168.250.94/32] from k8s-pod-network
@pallavi-mandole, the CRD of IPPools changed. You need to update it.
@mlguerrero12 I have done a round of local testing with a kind cluster where I had 8 IP ranges and 200 pods running. I can also confirm that the overlapping IP error is not visible with this fix. But it took a long time to get 200 pods into the running state, and scaling down to 1 also took more than one hour to terminate all the pods. I also want to mention that after 199 pods are removed, only 1 extra pod reference can be seen in 3 IP ranges. So in my opinion this PR fixes most of our issues. I will open a new issue for the undeleted pod references. Here are some results from the test:

In my opinion, your fix solves most of our issues; only a few pod references are still visible, and those should be deleted. I will open another ticket to follow up on the issue.
Parent timeout context of 30s was removed. All listing operations used by the cronjob reconciler have 30s as timeout.

Fixes k8snetworkplumbingwg#389

Signed-off-by: Marcelo Guerrero <marguerr@redhat.com>
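To make the commit message concrete, here is a hedged before/after sketch; the podLister interface and its method are hypothetical stand-ins invented only to illustrate the shape of the change, not the whereabouts API.

```go
package reconcile

import (
	"context"
	"time"
)

// podLister is a hypothetical stand-in for the whereabouts client.
type podLister interface {
	ListPods(ctx context.Context) error
}

// Before: one parent context bounded the whole reconciliation, so all
// API calls shared a single 30s budget; a slow first list could leave
// later calls with almost no time and make them fail spuriously.
func reconcileSharedDeadline(c podLister) error {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	if err := c.ListPods(ctx); err != nil {
		return err
	}
	return c.ListPods(ctx) // inherits whatever remains of the 30s
}

// After: the parent deadline is gone and each call receives its own
// fresh 30s context, matching the behavior described in the commit.
func reconcilePerCallDeadline(c podLister) error {
	for i := 0; i < 2; i++ {
		ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
		err := c.ListPods(ctx)
		cancel()
		if err != nil {
			return err
		}
	}
	return nil
}
```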
I've made updates to the CRD and thoroughly tested the fix. I observed the pods scaling swiftly to 200. During testing, I didn't observe any issues with overlapping IPs. Error log:
```diff
@@ -108,28 +107,31 @@ func (i *Client) ListPods(ctx context.Context) ([]v1.Pod, error) {
 }

 func (i *Client) GetPod(namespace, name string) (*v1.Pod, error) {
-	pod, err := i.clientSet.CoreV1().Pods(namespace).Get(context.TODO(), name, metav1.GetOptions{})
+	ctxWithTimeout, cancel := context.WithTimeout(context.Background(), storage.RequestTimeout)
```
Shall we also replace `storage.RequestTimeout` with `listRequestTimeout`?
No, this one is 10s for a single request.
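The two timeout tiers, as a sketch: 10s per the comment above, 30s per the commit message. The constant names and the standalone function are assumptions for illustration, not the exact whereabouts code.

```go
package timeouts

import (
	"context"
	"time"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

const (
	// stands in for storage.RequestTimeout: bound for one small call
	singleRequestTimeout = 10 * time.Second
	// stands in for listRequestTimeout: bound for heavier List calls
	listRequestTimeout = 30 * time.Second
)

// getPod keeps the shorter per-request bound: a single Get is cheap
// and should fail fast rather than hold a 30s budget it never needs.
func getPod(clientset kubernetes.Interface, namespace, name string) (*v1.Pod, error) {
	ctxWithTimeout, cancel := context.WithTimeout(context.Background(), singleRequestTimeout)
	defer cancel()
	return clientset.CoreV1().Pods(namespace).Get(ctxWithTimeout, name, metav1.GetOptions{})
}
```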
Signed-off-by: Marcelo Guerrero <marguerr@redhat.com>
Force-pushed from fc56b7a to d394ff2
Coveralls: Pull Request Test Coverage Report for Build 9745523106
Merging based on test results from @adilGhaffarDev and @smoshiur1237. New issues will be handled in future PRs.
Parent timeout context of 30s was removed. All listing operations used by the cronjob reconciler have 30s as timeout.

Fixes #389