Fix bug: if the snapshot is no longer in engine CR, don't block the removal process #2074
Conversation
removal process

Longhorn-6298

Signed-off-by: Phan Le <phan.le@suse.com>
I didn't realize you were already on a PR for this, so I drafted #2075. It is similar in concept (allow us to proceed to remove the finalizer if the snapshot is not in the engine). However, I was hoping to just reorder things without adding additional checks.
I'll defer to you since you know this section of the code best. LGTM, but I still need to test it.
@ejweber I think we still need to check with the engine process before removing the snapshot CR's finalizer. This PR does that. What do you think?
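A minimal sketch of the decision being discussed, assuming a simplified view of the engine CR status (the type and function names here are hypothetical, not longhorn-manager's actual API): the controller should stop blocking finalizer removal once the engine no longer reports the snapshot.

```go
package main

import "fmt"

// engineStatus mimics only the relevant part of the engine CR status:
// the set of snapshots the engine process currently knows about.
// This is a hypothetical simplification, not the real Longhorn type.
type engineStatus struct {
	snapshots map[string]bool
}

// canRemoveFinalizer returns true when the snapshot is absent from the
// engine CR, so the snapshot controller may remove the finalizer instead
// of waiting indefinitely for an engine-side deletion that will never come.
func canRemoveFinalizer(e engineStatus, snapshotName string) bool {
	return !e.snapshots[snapshotName]
}

func main() {
	e := engineStatus{snapshots: map[string]bool{"snap-1": true}}
	fmt.Println(canRemoveFinalizer(e, "snap-1")) // still in engine: keep blocking
	fmt.Println(canRemoveFinalizer(e, "snap-2")) // gone from engine: unblock removal
}
```

The point of the check-with-engine-first ordering is that the controller never removes the finalizer while the engine still tracks the snapshot, which would leave an orphaned snapshot on the replicas.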
We discussed my "competing" PR offline and agree that this is the right approach. We don't have a simple way to recreate the issue, but I can use the same iterative test I opened the issue with to check whether my cluster continues to accrue snapshots with this fix.
After substantial discussion and explanation, this makes sense to me.
LGTM.
nit: is it possible for an orphaned snapshot to remain permanently undeletable because it somehow can't be deleted on the replica side, leaving the volume stuck in an auto-attaching state? (Ideally the snapshot ticket type should be interruptible, so it should not impact other operations.)
@mergify backport v1.5.x
✅ Backports have been created
Tested as suggested in #2074 (comment). After 15 iterations my cluster has only 50 snapshots, and each snapshot is less than five minutes old. This is as expected.
If the snapshot is stuck in the removed state in the replica, then yes, the volume will remain attached due to the snapshot-controller attachment ticket. If a workload starts on a different node, it will interrupt the snapshot AD ticket. Other operations that require attachment will request the same node as the snapshot AD ticket. So I think it is fine.
Sorry, could you elaborate on this? I think this issue should only happen in v1.5.x because of the new AD mechanism.
Sounds good.
It has been clarified. All good.
longhorn/longhorn#6298