Share manager HA #2811
Conversation
Force-pushed from caabf8e to b118921
Force-pushed from 8f059b7 to 0f16201
Force-pushed from 0f16201 to ac962bf
Should we start reviewing, or should we wait for the PR to be marked as ready for review?
Force-pushed from 701d9f5 to 3de89d4
This pull request is now in conflict. Could you fix it @james-munson? 🙏
Update: Currently, there are 2 remaining challenges:
We are making progress on challenge 1. Will update soon.
Update: The POC for challenge 1 at the comment is working now.
Screencast.from.07-12-2024.12.26.30.PM.webm
The remaining challenge is:
FYI: I discussed with @james-munson drawing the state machine of the lease CR. This might help speed up your review process.
Force-pushed from 48f8545 to 17aae5a
Rebased on current master and resolved conflicts.
Force-pushed from 1d6b033 to f16cb60
I would suggest consolidating the commits.
I will continue reviewing the share manager controller soon.
This pull request is now in conflict. Could you fix it @james-munson? 🙏
So the flow is (I think):
So it looks like the pod and share manager updates will both trigger a volume reconcile simultaneously. (Everything Kubernetes actually does to the pod will be slower than this flow, since it normally doesn't try to do anything for a long time.)
Either would work as an informer, so long as we just drive the ownership off of the pod's node. We don't want everybody changing to the interim owner of the SM only to change again once the pod is scheduled.
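As a rough illustration of driving ownership off the pod's node (a hypothetical helper with assumed parameter names, not this PR's code):

```go
import corev1 "k8s.io/api/core/v1"

// preferredOwnerID is a hypothetical sketch: keep the current owner until the
// share manager pod is actually scheduled, then follow the pod's node, so
// ownership does not bounce through an interim owner and change again later.
func preferredOwnerID(currentOwner string, pod *corev1.Pod) string {
	if pod != nil && pod.Spec.NodeName != "" {
		return pod.Spec.NodeName
	}
	return currentOwner
}
```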
Regarding #2811 (comment): the next step in that flow would be that a new SM pod is recreated by the share-manager controller. The volume controller needs to quickly detect the node of the new pod, so it might need to watch the pod? Though at that moment the share-manager CR might have switched to the starting state, which could also trigger the volume controller if we were using a share-manager CR informer instead. Let me run a test.
This makes sense to me. My main concern is that this PR explicitly switches the volume controller to monitoring pods for requeues instead of share managers. But the share manager controller is already monitoring share manager pods and updating share manager state. (So far) I don't see why the volume controller has to do so as well now, instead of just being triggered by share manager state as always.
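A minimal sketch of the alternative described above, with volume requeues driven by share manager state transitions instead of pod events (the function name, callback, and import path are assumptions, not the PR's actual wiring):

```go
import (
	"k8s.io/client-go/tools/cache"

	longhorn "github.com/longhorn/longhorn-manager/k8s/pkg/apis/longhorn/v1beta2"
)

// registerShareManagerHandler is a hypothetical sketch: requeue volumes only
// when the share manager state changes, so a pod update and a share manager
// update do not both trigger a volume reconcile for the same event.
func registerShareManagerHandler(smInformer cache.SharedIndexInformer, enqueueVolumesFor func(*longhorn.ShareManager)) {
	smInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(old, cur interface{}) {
			oldSM, okOld := old.(*longhorn.ShareManager)
			curSM, okCur := cur.(*longhorn.ShareManager)
			if !okOld || !okCur {
				return
			}
			if oldSM.Status.State != curSM.Status.State {
				enqueueVolumesFor(curSM)
			}
		},
	})
}
```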
longhorn-6205 Signed-off-by: Phan Le <phan.le@suse.com>
Testing showed that it works, so I reverted them. Any additional concerns, @ejweber @james-munson?
At this moment, I think the last item to get LGTM from @ejweber is the setting name change (longhorn/longhorn#8804 (comment)). @james-munson is helping with that.
Signed-off-by: James Munson <james.munson@suse.com>
LGTM barring any necessary squashing/rebasing and the resolution of some of @derekbit's comments.
Thanks for the hard work on this!
Does the feature need to handle upgrade? Will a running RWX volume remain functional after upgrading the longhorn-system without any detachment? ref: #2811 (comment)
@derekbit Yes, a running RWX volume will remain functional after upgrading the longhorn-system without any detachment.
Ref: #2811 (comment) Can you give more details on how the disk monitor recovers? From a quick reading, I don't quite get it yet. @derekbit
In general, LGTM.
Some minor issues and one TODO suggestion.
@james-munson Can we check whether we need the swap, or keep the previous order?
```go
if err = c.syncShareManagerPod(sm); err != nil {
	return err
}

if err = c.syncShareManagerVolume(sm); err != nil {
	return err
}
```
I just tried swapping the order and testing it. The result is that volume detach/attach is somehow much slower, maybe due to having to make multiple sync loops. I think we can keep the current order. I will continue to investigate after the PR is merged. Added this to the TODO at longhorn/longhorn#6205 (comment).
Maybe a new GitHub ticket for this? @derekbit @james-munson
longhorn-6205 Signed-off-by: Phan Le <phan.le@suse.com>
Let's track it in longhorn/longhorn#9062
My testing matches @PhanLe1010's. With the sync order swapped, we time out trying to do mount operations in the volume sync before ever dealing with the staleness of the pod, which I saw get to more than 20 seconds overdue. With that, any time gain is lost, so I think the order needs to stay as it is. FWIW, in the e2e tests that I ran, it did not appear to affect normal RWO behavior.
@mergify backport v1.7.x
✅ Backports have been created
Which issue(s) this PR fixes:
This PR is part of longhorn/longhorn#6205 - Share manager HA. It creates the Lease, checks it, and takes action to delete the share-manager pod if it is stale, replacing it with one on a different node.
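A minimal sketch of the staleness check described above, using the standard coordination/v1 Lease fields (the staleTimeout threshold is an assumed parameter, not the PR's actual setting):

```go
import (
	"time"

	coordinationv1 "k8s.io/api/coordination/v1"
)

// isLeaseStale is a simplified sketch, not this PR's exact logic: a lease is
// considered stale when its holder has stopped renewing it for longer than
// the assumed staleTimeout.
func isLeaseStale(lease *coordinationv1.Lease, staleTimeout time.Duration, now time.Time) bool {
	if lease == nil || lease.Spec.HolderIdentity == nil || *lease.Spec.HolderIdentity == "" {
		return false // no holder yet, nothing to judge
	}
	if lease.Spec.RenewTime == nil {
		return false // never renewed, staleness cannot be measured
	}
	return now.Sub(lease.Spec.RenewTime.Time) > staleTimeout
}
```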
What this PR does / why we need it:
Special notes for your reviewer:
Additional documentation or context:
I am marking this PR as a draft because it is not complete. It still needs to make the detachment and re-attachment to the new pod happen before the old pod's node goes NotReady.
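(For context, and not stated in the PR: Kubernetes only marks a node NotReady after kube-controller-manager's node-monitor-grace-period elapses, roughly 40 seconds by default, so the lease staleness threshold has to sit comfortably inside that window for the failover to win this race.)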