Reconcile ClusterCIDRs with Finalizers #37

Open
wants to merge 4 commits into
base: main
from

Conversation

mneverov
Member

@mneverov mneverov commented Sep 26, 2024

Currently, reconciliation of a ClusterCIDR (create/update) is skipped when a finalizer is present. Because users can create ClusterCIDRs with finalizers already set in the ClusterCIDR definition, this leads to the problem described in #17.
This patch makes the following changes:

  • reconcile ClusterCIDRs even if finalizers are present
  • make createClusterCIDR idempotent by checking if a clusterCIDR is already present in the cidrMap.
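
The idempotency check in the second bullet can be sketched roughly as follows. This is a minimal illustration, not the allocator's actual code: the `clusterCIDR` struct, the string-keyed `cidrMap` type, and the `addOnce` helper are hypothetical simplifications introduced only for this example.

```go
package main

// clusterCIDR is a hypothetical, trimmed-down stand-in for the allocator's
// bookkeeping entry; only the fields needed for the sketch are kept.
type clusterCIDR struct {
	name    string // ClusterCIDR object name
	nodeSel string // node selector, used here as the map key (assumption)
}

// cidrMap maps a node selector to the ClusterCIDRs registered for it.
type cidrMap map[string][]*clusterCIDR

// addOnce appends cc under its selector unless an entry with the same name is
// already present. It reports whether the map changed, so calling it again
// for the same object is a no-op.
func (m cidrMap) addOnce(cc *clusterCIDR) bool {
	for _, existing := range m[cc.nodeSel] {
		if existing.name == cc.name {
			return false // already tracked, nothing to do
		}
	}
	m[cc.nodeSel] = append(m[cc.nodeSel], cc)
	return true
}
```

The linear scan over the slice is the traversal that the performance notes refer to.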

Performance concerns:

  • for each ClusterCIDR modification, the clusterCIDRList in the cidrMap is traversed. Iterating through a slice, even one with a thousand elements (nodes), should not be a problem.
  • for each modification, the allocator issues a ClusterCIDR update. Again, this should not be a problem, since ClusterCIDRs are immutable and such updates should happen rarely, if at all. See also Replace Update with Patch #38.

Moving the finalizer logic out of createClusterCIDR requires changing the syncClusterCIDR signature: if it does not return an error, cidrQueue forgets the event. I'll do that in a separate PR to keep this one small.
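
To illustrate why the sync function's return value matters, here is a minimal sketch of the usual worker pattern: a nil error makes the queue forget the key, while a non-nil error schedules a retry. The `retryQueue` type and the `processKey` helper are hypothetical in-memory stand-ins written for this example; they are not the client-go workqueue or the allocator's code.

```go
package main

import "errors"

// retryQueue is a hypothetical in-memory stand-in for a rate-limited work queue.
type retryQueue struct {
	pending []string       // keys waiting to be processed again
	retries map[string]int // failure count per key
}

func newRetryQueue() *retryQueue {
	return &retryQueue{retries: map[string]int{}}
}

// requeue records a failure for key and schedules it for another attempt.
func (q *retryQueue) requeue(key string) {
	q.retries[key]++
	q.pending = append(q.pending, key)
}

// forget drops the retry history for key.
func (q *retryQueue) forget(key string) {
	delete(q.retries, key)
}

// errFinalizerUpdate is an illustrative error a sync function might return.
var errFinalizerUpdate = errors.New("finalizer update failed")

// processKey runs sync for key. Only a non-nil error causes a retry, so a
// sync function that swallows its errors can never be retried.
func processKey(q *retryQueue, key string, sync func(string) error) {
	if err := sync(key); err != nil {
		q.requeue(key)
		return
	}
	q.forget(key)
}
```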

Fixes #17

Run test cases in separate tests to improve visibility in test output.
…append ClusterCIDRs if they are already present in the cidrMap which makes createClusterCIDR idempotent.
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 26, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mneverov

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Sep 26, 2024
@mneverov mneverov changed the title [WIP] Reconcile ClusterCIDRs with Finalizers Sep 26, 2024
@mneverov mneverov marked this pull request as ready for review September 26, 2024 18:29
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Sep 26, 2024
@mneverov
Member Author

mneverov commented Oct 7, 2024

@sarveshr7 @ameukam could you ptal?

@@ -1090,12 +1091,10 @@ func (r *multiCIDRRangeAllocator) reconcileCreate(ctx context.Context, clusterCI
defer r.lock.Unlock()

logger := klog.FromContext(ctx)
if needToAddFinalizer(clusterCIDR, clusterCIDRFinalizer) {
Contributor

@aojea aojea Oct 7, 2024

The problem here is that we were assuming that the finalizer was the indicator of whether to create or update.

Independently, before Create or Update, we need to compare the new object we are building against the old object (this is missing) and then decide whether to create or update. Regarding PATCH vs PUT: since these objects are managed entirely by this controller, we can use PUT to move the object to the state we just built.

Member Author

My understanding is that, since ClusterCIDR has all fields marked as immutable, the only update we make is setting the finalizer.
We already check whether the finalizer exists. Previously, in that case we skipped the reconciliation and did not add the ClusterCIDR to the cidrMap, which caused the bug.
With the current fix, the finalizer is always added (if needed). We create a ClusterCIDR only when it has no ResourceVersion, which seems to me a better indicator of whether the resource has already been processed and stored in etcd.
I understand that it is not the best approach; I would maybe add a tuple to the queue (namespaced name and operation) so that we would know the exact operation and process the CIDR accordingly.
I can provide a PoC of the latter approach.
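
The decision described in this comment could look roughly like the sketch below. It is an illustration under assumptions, not the code in this PR: the `object` struct, the `action` type, and the `decide` and `needsFinalizer` helpers are hypothetical names, and the real allocator may order or combine these checks differently.

```go
package main

// object is a hypothetical, trimmed-down view of ClusterCIDR metadata.
type object struct {
	resourceVersion string   // empty if the object was never stored
	finalizers      []string // finalizers currently set on the object
}

// action names the step the reconciler would take (illustrative only).
type action string

const (
	actionCreate action = "create"
	actionUpdate action = "update"
	actionNone   action = "none"
)

// needsFinalizer reports whether finalizer is absent from obj.
func needsFinalizer(obj object, finalizer string) bool {
	for _, f := range obj.finalizers {
		if f == finalizer {
			return false
		}
	}
	return true
}

// decide treats an empty ResourceVersion as "not stored yet" and therefore
// creates; otherwise it updates only when the finalizer is still missing.
func decide(obj object, finalizer string) action {
	if obj.resourceVersion == "" {
		return actionCreate
	}
	if needsFinalizer(obj, finalizer) {
		return actionUpdate
	}
	return actionNone
}
```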

Contributor

Apologies, I only skimmed through the code, so my comment may be inaccurate.

I think this is OK. I just wanted to say that reconciliation seems to depend on more than the finalizer, so we need to process the entire object.

> I understand that it is not the best approach; I would maybe add a tuple to the queue (namespaced name and operation) so that we would know the exact operation and process the CIDR accordingly.
> I can provide a PoC of the latter approach.

That would make this an event-based controller instead of a level-based one; we don't want to do that.

What I mean is that when you need to Update, you have the old and the new object, so you can compare them and decide whether to update or not.
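
A level-based comparison of this kind can be sketched as follows: build the desired state, compare it with the observed one, and write only when they differ. The `cidrState` struct and its fields are hypothetical simplifications, and the standard library's `reflect.DeepEqual` is used here only to keep the example self-contained.

```go
package main

import "reflect"

// cidrState is a hypothetical subset of the fields a reconciler might compare.
type cidrState struct {
	Finalizers      []string
	IPv4            string
	PerNodeHostBits int32
}

// needsUpdate reports whether the observed state differs from the desired one.
// Note that reflect.DeepEqual treats a nil slice and an empty slice as
// different, which may or may not be what a real controller wants.
func needsUpdate(observed, desired cidrState) bool {
	return !reflect.DeepEqual(observed, desired)
}
```

Because the decision depends only on the current and desired state, replaying the same key any number of times yields the same result, which is what keeps the controller level-based.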

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Issue with Dynamic Allocation of ClusterCIDR for New Nodes Without Component Restart
3 participants