fix: honor cacheready condition #20
Conversation
pkg/controller/ippool/controller.go
Outdated
        return h.ippoolClient.UpdateStatus(ipPoolCpy)
    }
    if !networkv1.CacheReady.IsTrue(ipPool) {
        // if networkv1.CacheReady.GetStatus(ipPool) == "" || networkv1.CacheReady.IsFalse(ipPool) {
Does this commented-out line need to be removed?
updated, thanks!
Force-pushed from 12a8308 to e948e02.
Please continue to check the questions below:

(1) CacheReady is attached to the CRD object networkv1.IPPool; but in fact, cache readiness should be related to a local POD, not a remote CRD object. The POD can restart at any time, so the POD itself needs a big lock/state to say: when it starts, its cache is not ready, and it must sync until the cache is built and ready; only then can all the controllers continue to work based on it. The POD/controller has an ipAllocator (a shadow/view of the remote IPPool CRD object, to simplify the process of allocating an IP), which is initialized when the POD starts and lives within the life-cycle of that POD. The IPPool controller and the agent controller share it.

(2) Look at the 3 functions below: when a POD restarts, networkv1.CacheReady is true because it was ready last time, so agent/ippool/ippool.go can still work before ippool/controller.go's OnChange sets CacheReady to false on POD start.
// ippool/controller.go
// set !CacheReady
func (h *Handler) OnChange(key string, ipPool *IPPool) (*networkv1.IPPool, error) {
    if !h.ipAllocator.IsNetworkInitialized(ipPool.Spec.NetworkName) {
        networkv1.CacheReady.False(ipPoolCpy)
        networkv1.CacheReady.Reason(ipPoolCpy, "NotInitialized")
        networkv1.CacheReady.Message(ipPoolCpy, "")

// set CacheReady
func (h *Handler) BuildCache(ipPool *networkv1.IPPool, status networkv1.IPPoolStatus) (networkv1.IPPoolStatus, error) {
    logrus.Debugf("(ippool.BuildCache) build ipam for ippool %s/%s", ipPool.Namespace, ipPool.Name)

// agent/ippool/ippool.go
func (c *Controller) Update(ipPool *networkv1.IPPool) error {
    if !networkv1.CacheReady.IsTrue(ipPool) {
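For illustration, here is a minimal sketch (not code from this repository; apart from IsNetworkInitialized, the type and method names are hypothetical) of keeping cache readiness as POD-local state guarded by a lock, as point (1) suggests:

    package ipam

    import "sync"

    // Allocator is a simplified, hypothetical POD-local allocator: readiness lives
    // in process memory, so a restarted POD always starts out "not ready" no matter
    // what the remote IPPool CRD condition last said.
    type Allocator struct {
        mu    sync.RWMutex
        ready map[string]bool // keyed by network name
    }

    func NewAllocator() *Allocator {
        return &Allocator{ready: make(map[string]bool)}
    }

    // MarkReady records that the cache for a network has been rebuilt from the IPPool object.
    func (a *Allocator) MarkReady(networkName string) {
        a.mu.Lock()
        defer a.mu.Unlock()
        a.ready[networkName] = true
    }

    // IsNetworkInitialized reports whether the local cache for the network is usable.
    func (a *Allocator) IsNetworkInitialized(networkName string) bool {
        a.mu.RLock()
        defer a.mu.RUnlock()
        return a.ready[networkName]
    }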
pkg/agent/ippool/ippool.go
Outdated
"github.com/sirupsen/logrus" | ||
|
||
networkv1 "github.com/harvester/vm-dhcp-controller/pkg/apis/network.harvesterhci.io/v1alpha1" | ||
"github.com/harvester/vm-dhcp-controller/pkg/util" | ||
) | ||
|
||
func (c *Controller) Update(ipPool *networkv1.IPPool) error { | ||
if !networkv1.CacheReady.IsTrue(ipPool) { | ||
logrus.Warning("ippool is not ready") |
Please add the ippool namespace and name to the log; the other log statements are similar.
Updated. PTAL, thanks
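A minimal sketch of what the requested log change could look like (the exact message is an assumption, not copied from the PR):

    if !networkv1.CacheReady.IsTrue(ipPool) {
        // Include namespace/name so the warning can be correlated with a specific IPPool object.
        logrus.Warningf("ippool %s/%s is not ready", ipPool.Namespace, ipPool.Name)
    }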
Please check harvester/harvester#4960 (comment). The …
Force-pushed from 5db5060 to 2757e42.
Per your questions:

(1) Agents only sync with their IPPool objects; for example, the …

(2) In fact, I'm considering dropping the CacheReady condition check for the agent. As mentioned above, agents don't rely on the controller's internal caches, so checking that condition before syncing with the IPPool object makes little sense.

If there are any further questions, please let me know. Thank you.
Force-pushed from 2757e42 to d14ff38.
The controller model is event-driven: it runs whenever an OnChange or OnRemove event happens. The retry.RetryOnConflict(retry.DefaultBackoff, ...) call will hold up that specific event's OnChange callback, since a single OnChange plans to update two objects: vmnetworkconfig and ippool.
It is not the best solution; let's improve it later.
@@ -136,13 +146,7 @@ func (h *Handler) Allocate(vmNetCfg *networkv1.VirtualMachineNetworkConfig, stat
    )

    // Update IPPool status
    ipPoolNamespace, ipPoolName := kv.RSplit(nc.NetworkName, "/")
    if err := retry.RetryOnConflict(retry.DefaultBackoff, func() error {
When there is a conflict in RetryOnConflict, shouldn't the ipPool be re-fetched from the cache like above? The current ipPool variable may already be outdated:
ipPool, err := h.ippoolCache.Get(ipPoolNamespace, ipPoolName)
The original code re-gets the ipPool; what was the consideration for removing that code?
Yes, you're right, it should be inside the RetryOnConflict block. But I just realized we no longer need the explicit retry; it became stale once the internal caches were introduced.
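For reference, the conventional client-go pattern the reviewer describes re-reads the object inside the retry closure so each attempt mutates a fresh copy. A sketch assuming h.ippoolCache and h.ippoolClient behave like the snippets above (not the final PR code, which dropped the retry entirely):

    ipPoolNamespace, ipPoolName := kv.RSplit(nc.NetworkName, "/")
    if err := retry.RetryOnConflict(retry.DefaultBackoff, func() error {
        // Re-get on every attempt so a conflict retry never reuses a stale object.
        ipPool, err := h.ippoolCache.Get(ipPoolNamespace, ipPoolName)
        if err != nil {
            return err
        }
        ipPoolCpy := ipPool.DeepCopy()
        // ...update ipPoolCpy.Status here...
        _, err = h.ippoolClient.UpdateStatus(ipPoolCpy)
        return err
    }); err != nil {
        return status, err
    }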
Force-pushed from d14ff38 to 6380f8f.
LGTM, thanks.
LGTM
Both ippool & vmnetcfg reconcile loops will honor ippool's cacheready condition, not depending on the internal cache testing. The agent will check the condition of the ippool object it syncs with, too.
Signed-off-by: Zespre Chang <zespre.chang@suse.com>

Signed-off-by: Zespre Chang <zespre.chang@suse.com>
Co-authored-by: Jack Yu <jack.yu@suse.com>
Force-pushed from 2057fe0 to 3b33854.
Merge conflict resolved.
IMPORTANT: Please do not create a Pull Request without creating an issue first.
Problem:
Solution:
Both ippool & vmnetcfg reconcile loops will honor the ippool's cacheready condition instead of depending on internal cache testing. The agent will check the condition of the ippool object it syncs with, too.
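As a rough illustration of the solution (helper names are modeled on the review snippets above, not the exact PR code), both reconcile loops gate on the IPPool object's condition instead of probing internal caches:

    ipPoolNamespace, ipPoolName := kv.RSplit(nc.NetworkName, "/")
    ipPool, err := h.ippoolCache.Get(ipPoolNamespace, ipPoolName)
    if err != nil {
        return status, err
    }
    if !networkv1.CacheReady.IsTrue(ipPool) {
        // Wait until the controller has (re)built its cache for this pool.
        return status, fmt.Errorf("ippool %s/%s is not ready", ipPool.Namespace, ipPool.Name)
    }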
Related Issue:
harvester/harvester#5072
Test plan: