Align API call timeouts in the cronjob IP reconciler #480
Conversation
Coveralls: Pull Request Test Coverage Report for Build 9518734751
Coveralls: Pull Request Test Coverage Report for Build 9544469259
```go
func (i *Client) ListPods() ([]v1.Pod, error) {
	logging.Debugf("listing Pods")

	ctxWithTimeout, cancel := context.WithTimeout(context.Background(), listRequestTimeout)
```
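For context on this hunk, here is a minimal, self-contained sketch of the pattern under review: each list call creates its own bounded context instead of inheriting a shared parent deadline. The 30s value and the standalone function shape are assumptions for illustration, not the exact whereabouts code.

```go
package main

import (
	"context"
	"fmt"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// listRequestTimeout mirrors the shared 30s bound discussed in this PR;
// the exact value here is an assumption for illustration.
const listRequestTimeout = 30 * time.Second

// listPods gives each List call its own deadline, so one slow request
// cannot consume the time budget of later calls in the reconcile loop.
func listPods(clientset kubernetes.Interface) error {
	ctxWithTimeout, cancel := context.WithTimeout(context.Background(), listRequestTimeout)
	defer cancel() // release the timer even on the success path

	pods, err := clientset.CoreV1().Pods(metav1.NamespaceAll).List(ctxWithTimeout, metav1.ListOptions{})
	if err != nil {
		return err
	}
	fmt.Printf("listed %d pods\n", len(pods.Items))
	return nil
}
```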
Cool, I think I get it: you've got all the timeouts normalized on listRequestTimeout, and then we can eliminate the other timeouts in the reconciler. Nicely done.
That's correct. I will add a description in the commit. Thanks!
Force-pushed from 7de75c3 to fc56b7a
Coveralls: Pull Request Test Coverage Report for Build 9582412868
I tested with a kind cluster and I still see leftover podRefs in the IPPools when scaling up/down.
With a kind cluster I don't see many leftover podRefs: for 100 scale up/down cycles, for example, I saw one leftover podRef, and with repeated scaling up/down the leftover podRefs keep increasing by 1. But with more pods and nodes these leftover podRefs will increase.
Thanks @adilGhaffarDev. What do you see in the logs?
Which logs are you interested in? Here is one of the whereabouts pod logs:
The IP reconciler logs.
I'm looking for this
I am not seeing this error in the whereabouts DaemonSet pods.
Cool, that means we solved the original issue. You're still getting leftover IPs because there is another issue. Not as many as before (because nothing was deleted before), but still, it shouldn't happen. I think it is due to this:

2024-06-19T12:47:13Z [debug] Started leader election

I've seen it before. This is the pod controller, not the cron job. My suggestion is not to overload this issue/PR and instead get it merged. Then you can create a separate issue and we can investigate again. Please try to reproduce once more to verify that the original issue does not reproduce. I'll try to do it locally as well with the YAML definitions you provided.
Coveralls: Pull Request Test Coverage Report for Build 9585671568
I have tested with the given PR fix:

Normal AddedInterface 2m multus Add eth0 [192.168.250.94/32] from k8s-pod-network
@pallavi-mandole, the CRD of IPPools changed. You need to update it.
@mlguerrero12 I have done a round of local testing with a kind cluster where I had 8 IP ranges and 200 pods running. I can also confirm that the overlapping IP error is not visible with this fix. But it took a long time to get 200 pods into the running state, and scaling down to 1 also took more than one hour to terminate all the pods. I also want to mention that after 199 pods are removed, only 1 extra pod reference can be seen in 3 IP ranges. So in my opinion this PR fixes most of our issues. I will open a new issue for the undeleted pod references. Here are some results from the test:

In my opinion, your fix solves most of our issues; only a few pod references are still visible, and those should be deleted. I will open another ticket to follow up on the issue.
Parent timeout context of 30s was removed. All listing operations used by the cronjob reconciler have 30s as timeout.

Fixes k8snetworkplumbingwg#389

Signed-off-by: Marcelo Guerrero <marguerr@redhat.com>
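To make the commit message concrete, here is a hedged before/after sketch; the podLister interface and its method are hypothetical stand-ins invented only to illustrate the shape of the change, not the whereabouts API.

```go
package reconcile

import (
	"context"
	"time"
)

// podLister is a hypothetical stand-in for the whereabouts client.
type podLister interface {
	ListPods(ctx context.Context) error
}

// Before: one parent context bounded the whole reconciliation, so all
// API calls shared a single 30s budget; a slow first list could leave
// later calls with almost no time and make them fail spuriously.
func reconcileSharedDeadline(c podLister) error {
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	if err := c.ListPods(ctx); err != nil {
		return err
	}
	return c.ListPods(ctx) // inherits whatever remains of the 30s
}

// After: the parent deadline is gone and each call receives its own
// fresh 30s context, matching the behavior described in the commit.
func reconcilePerCallDeadline(c podLister) error {
	for i := 0; i < 2; i++ {
		ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
		err := c.ListPods(ctx)
		cancel()
		if err != nil {
			return err
		}
	}
	return nil
}
```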
I've made updates to the CRD and thoroughly tested the fix. I observed the pods scaling swiftly to 200. During testing, I didn't observe any issues with overlapping IPs. Error log:
```diff
@@ -108,28 +107,31 @@ func (i *Client) ListPods(ctx context.Context) ([]v1.Pod, error) {
 }

 func (i *Client) GetPod(namespace, name string) (*v1.Pod, error) {
-	pod, err := i.clientSet.CoreV1().Pods(namespace).Get(context.TODO(), name, metav1.GetOptions{})
+	ctxWithTimeout, cancel := context.WithTimeout(context.Background(), storage.RequestTimeout)
```
Shall we also replace `storage.RequestTimeout` with `listRequestTimeout`?
No, this one is 10s for a single request.
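The two timeout tiers, as a sketch: 10s per the comment above, 30s per the commit message. The constant names and the standalone function are assumptions for illustration, not the exact whereabouts code.

```go
package timeouts

import (
	"context"
	"time"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

const (
	// stands in for storage.RequestTimeout: bound for one small call
	singleRequestTimeout = 10 * time.Second
	// stands in for listRequestTimeout: bound for heavier List calls
	listRequestTimeout = 30 * time.Second
)

// getPod keeps the shorter per-request bound: a single Get is cheap
// and should fail fast rather than hold a 30s budget it never needs.
func getPod(clientset kubernetes.Interface, namespace, name string) (*v1.Pod, error) {
	ctxWithTimeout, cancel := context.WithTimeout(context.Background(), singleRequestTimeout)
	defer cancel()
	return clientset.CoreV1().Pods(namespace).Get(ctxWithTimeout, name, metav1.GetOptions{})
}
```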
Signed-off-by: Marcelo Guerrero <marguerr@redhat.com>
Force-pushed from fc56b7a to d394ff2
Coveralls: Pull Request Test Coverage Report for Build 9745523106
Merging based on test results from @adilGhaffarDev and @smoshiur1237. New issues will be handled in future PRs.
Parent timeout context of 30s was removed. All listing operations used by the cronjob reconciler have 30s as timeout.

Fixes #389