If volume is delinquent, switch owner of volume/engine/replica to share manager CR's owner (backport #3004) #3005

Merged Jul 24, 2024 (1 commit)

@mergify mergify bot commented Jul 24, 2024

Without the fix, volume/engine/replica continue to wait for the share manager pod to be scheduled (i.e., pod.Spec.NodeName is non-empty) before setting ownerID to that pod's node. However, because we don't want to use pod informers, volume/engine/controller might not catch the event when the share manager pod is scheduled and might continue to wait. This introduces a delay of up to 30s and behavioral inconsistency.

Also, a delay of more than 30s in share manager pod recreation defeats the original goal of RWX fast failover.

longhorn/longhorn#6205
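The owner selection described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `ownerInputs` type, its fields, and both functions are hypothetical names, and the fallback behavior for a non-delinquent volume is an assumption.

```go
package main

import "fmt"

// ownerInputs is a hypothetical, stripped-down view of the state an owner
// decision might consult. None of these names come from longhorn-manager.
type ownerInputs struct {
	CurrentOwnerID      string // current owner of the volume/engine/replica
	ShareManagerOwnerID string // owner node recorded on the share manager CR
	ShareManagerPodNode string // pod.Spec.NodeName; empty until the pod is scheduled
	Delinquent          bool   // whether the volume is considered delinquent
}

// ownerBeforeFix models the old behavior: ownership only moves once the
// share manager pod has been scheduled; until then the caller keeps waiting.
func ownerBeforeFix(in ownerInputs) string {
	if in.ShareManagerPodNode != "" {
		return in.ShareManagerPodNode
	}
	return in.CurrentOwnerID
}

// ownerAfterFix models the new behavior: a delinquent volume follows the
// share manager CR's owner immediately, without waiting for pod scheduling.
// Falling back to the old logic otherwise is an assumption of this sketch.
func ownerAfterFix(in ownerInputs) string {
	if in.Delinquent && in.ShareManagerOwnerID != "" {
		return in.ShareManagerOwnerID
	}
	return ownerBeforeFix(in)
}

func main() {
	// The old node (node-1) is down, the share manager CR has moved to
	// node-2, and the new pod has not been scheduled yet.
	in := ownerInputs{
		CurrentOwnerID:      "node-1",
		ShareManagerOwnerID: "node-2",
		ShareManagerPodNode: "",
		Delinquent:          true,
	}
	fmt.Println("before fix:", ownerBeforeFix(in))
	fmt.Println("after fix: ", ownerAfterFix(in))
}
```

In this sketch the decision for a delinquent volume depends only on state already recorded on the share manager CR, so it does not hinge on observing the pod scheduling event.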

Some testing results:

  • Before the fix, the new share manager pod took 15s to 70s to become running after the node of the old share manager pod was shut down
  • After the fix, the new share manager pod takes 15s to 17s to become running after the node of the old share manager pod is shut down

This is an automatic backport of pull request #3004 done by [Mergify](https://mergify.com).

If volume is delinquent, switch owner of volume/engine/replica to share manager CR's owner

Without the fix, volume/engine/replica continue to wait for the
share manager pod to be scheduled (i.e., pod.Spec.NodeName is
non-empty) before setting ownerID to that pod's node. However,
because we don't want to use pod informers, volume/engine/controller
might not catch the event when the share manager pod is scheduled
and might continue to wait. This introduces a delay of up to 30s and
behavioral inconsistency.

Also, a delay of more than 30s in share manager pod recreation
defeats the original goal of RWX fast failover.

longhorn-6205

Signed-off-by: Phan Le <phan.le@suse.com>
(cherry picked from commit 18ac5dc)
@PhanLe1010 PhanLe1010 merged commit 590fac1 into v1.7.x Jul 24, 2024
6 checks passed
@PhanLe1010 PhanLe1010 deleted the mergify/bp/v1.7.x/pr-3004 branch July 24, 2024 23:14