
fix: honor cacheready condition #20

Merged

merged 2 commits into harvester:main from status-control, Feb 16, 2024

Conversation

@starbops (Member) commented Feb 2, 2024

IMPORTANT: Please do not create a Pull Request without creating an issue first.

Problem:

Solution:

Both ippool & vmnetcfg reconcile loops will honor ippool's cacheready condition, not depending on the internal cache testing. The agent will check the condition of the ippool object it syncs with, too

Related Issue:

harvester/harvester#5072

Test plan:

@starbops marked this pull request as ready for review on February 2, 2024 17:05
return h.ippoolClient.UpdateStatus(ipPoolCpy)
}
if !networkv1.CacheReady.IsTrue(ipPool) {
// if networkv1.CacheReady.GetStatus(ipPool) == "" || networkv1.CacheReady.IsFalse(ipPool) {
Contributor:

Does this commented-out line need to be removed?

Member Author:

Updated, thanks!

@w13915984028 (Member) left a comment:

Please continue to check the questions below:

(1) CacheReady is attached to the CRD object networkv1.IPPool, but cache readiness should really be a property of a local Pod, not of a remote CRD object.

The Pod can restart at any time. The Pod itself needs a global lock/state to indicate that, when it starts, its cache is not ready; it must sync until the cache is built and ready, and only then can all the controllers continue to work based on it.

The Pod/controller has an ipAllocator (a shadow/view of the remote IPPool CRD object that simplifies the process of allocating an IP), which is initialized when the Pod starts and lives for the lifecycle of that Pod. The IPPool controller and the agent controller share it.

(2) Look at the three functions below: when a Pod restarts, networkv1.CacheReady is still true because it was ready last time, so agent/ippool/ippool.go can keep working before OnChange in ippool/controller.go sets CacheReady to false on Pod start.

// ippool/controller.go
// set !CacheReady
func (h *Handler) OnChange(key string, ipPool *networkv1.IPPool) (*networkv1.IPPool, error) {
	if !h.ipAllocator.IsNetworkInitialized(ipPool.Spec.NetworkName) {
		networkv1.CacheReady.False(ipPoolCpy)
		networkv1.CacheReady.Reason(ipPoolCpy, "NotInitialized")
		networkv1.CacheReady.Message(ipPoolCpy, "")
		
// set CacheReady
func (h *Handler) BuildCache(ipPool *networkv1.IPPool, status networkv1.IPPoolStatus) (networkv1.IPPoolStatus, error) {
	logrus.Debugf("(ippool.BuildCache) build ipam for ippool %s/%s", ipPool.Namespace, ipPool.Name)

// agent/ippool/ippool.go 
func (c *Controller) Update(ipPool *networkv1.IPPool) error {
	if !networkv1.CacheReady.IsTrue(ipPool) {

"github.com/sirupsen/logrus"

networkv1 "github.com/harvester/vm-dhcp-controller/pkg/apis/network.harvesterhci.io/v1alpha1"
"github.com/harvester/vm-dhcp-controller/pkg/util"
)

func (c *Controller) Update(ipPool *networkv1.IPPool) error {
if !networkv1.CacheReady.IsTrue(ipPool) {
logrus.Warning("ippool is not ready")
Member:

Please add the IPPool namespace and name to the log message; the other log lines are similar.
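
For illustration, a minimal sketch of what such a log line could look like, using the logrus package already imported in the hunk above (illustrative only, not necessarily the exact change that was pushed):

	// Include the object's namespace and name so the warning identifies which IPPool is not ready.
	logrus.Warningf("ippool %s/%s is not ready", ipPool.Namespace, ipPool.Name)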

Member Author:

Updated. PTAL, thanks!

@w13915984028 (Member) commented:

Please check harvester/harvester#4960 (comment).

The vm-dhcp-client could potentially be simplified to work only on the existing VMNetworkConfig data, with no need to build a local IPPool cache, since the VMNetworkConfig is produced on the control plane.

@starbops force-pushed the status-control branch 2 times, most recently from 5db5060 to 2757e42, on February 5, 2024 17:12
@starbops (Member Author) commented Feb 5, 2024

Per your questions,

(1) Agents only sync with their IPPool objects; for example, the default-test-net-agent Pod syncs with the default/test-net IPPool. Agents don't rely on the controller's internal caches; they only honor the IPPool objects. It's one of the benefits of the control and data plane separation: if the controller dies, agents could still provide the service. The controller is responsible for maintaining the IPPool objects with the aid of the two internal caches. If the CacheReady condition is false, the controller cannot further reconcile the IPPool object. That means if new vmnetcfg objects are created, no IP addresses will be allocated for them. However, the agent can still serve their clients. It can rebuild the lease store from the ground up (based on the IPPool object) even if the agent Pod gets killed and respawned.

(2) In fact, I'm considering dropping the CacheReady condition check for the agent. As mentioned above, agents don't rely on the controller's internal caches, so checking that condition before syncing with the IPPool object makes little sense.

If there are any further questions, please let me know. Thank you.
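
To make the gating described above concrete, here is a hedged sketch assembled only from calls that appear elsewhere in this PR (kv.RSplit, h.ippoolCache.Get, networkv1.CacheReady.IsTrue); the surrounding variable names, error handling, and imports are illustrative, not the exact code:

	// Resolve the IPPool referenced by the vmnetcfg's network name.
	ipPoolNamespace, ipPoolName := kv.RSplit(nc.NetworkName, "/")
	ipPool, err := h.ippoolCache.Get(ipPoolNamespace, ipPoolName)
	if err != nil {
		return status, err
	}

	// Gate on the CRD-level condition rather than any controller-internal cache state.
	if !networkv1.CacheReady.IsTrue(ipPool) {
		return status, fmt.Errorf("ippool %s/%s is not ready", ipPoolNamespace, ipPoolName)
	}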

@w13915984028 (Member) left a comment:

The controller model is event-driven; it runs whenever an OnChange or OnRemove event happens.

The retry.RetryOnConflict(retry.DefaultBackoff, ...) call will block a specific event's OnChange callback, since one OnChange intends to update two objects: vmnetworkconfig and ippool.

It is not the best solution; let's improve it later.

@@ -136,13 +146,7 @@ func (h *Handler) Allocate(vmNetCfg *networkv1.VirtualMachineNetworkConfig, stat
)

// Update IPPool status
ipPoolNamespace, ipPoolName := kv.RSplit(nc.NetworkName, "/")
if err := retry.RetryOnConflict(retry.DefaultBackoff, func() error {
Member:

When a conflict occurs in RetryOnConflict, shouldn't we re-get the ipPool from the cache, as above? The current ipPool variable may be outdated.

ipPool, err := h.ippoolCache.Get(ipPoolNamespace, ipPoolName)

The original code re-gets the ipPool; what was the reason for removing it?
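
For reference, a minimal sketch of the usual client-go RetryOnConflict pattern, where the object is re-fetched inside the closure so each attempt starts from the latest version (assembled from calls shown in this PR: h.ippoolCache.Get and h.ippoolClient.UpdateStatus; the status mutation is elided and the snippet is illustrative, not the repo's actual code):

	if err := retry.RetryOnConflict(retry.DefaultBackoff, func() error {
		// Re-get the IPPool on every attempt so a conflict retries against
		// the latest resourceVersion instead of a stale local copy.
		ipPool, err := h.ippoolCache.Get(ipPoolNamespace, ipPoolName)
		if err != nil {
			return err
		}
		ipPoolCpy := ipPool.DeepCopy()
		// ...update ipPoolCpy.Status here...
		_, err = h.ippoolClient.UpdateStatus(ipPoolCpy)
		return err
	}); err != nil {
		return status, err
	}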

Member Author:

Yes, you're right; it should be inside the RetryOnConflict block. But I just realized we no longer need the explicit retry at all; it became stale once the internal caches were introduced.

@w13915984028 (Member) left a comment:

LGTM, thanks.

@Yu-Jack (Contributor) left a comment:

LGTM

starbops and others added 2 commits February 16, 2024 11:41
Both ippool & vmnetcfg reconcile loops will honor ippool's cacheready
condition, not depending on the internal cache testing. The agent will
check the condition of the ippool object it syncs with, too

Signed-off-by: Zespre Chang <zespre.chang@suse.com>
Signed-off-by: Zespre Chang <zespre.chang@suse.com>
Co-authored-by: Jack Yu <jack.yu@suse.com>
@starbops (Member Author) commented:

Merge conflict resolved.

@starbops merged commit a8ea464 into harvester:main on Feb 16, 2024
5 checks passed