Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(expansion): add an error message to guide users in resolving the stuck volume expansion cancellation #3168

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 11 additions & 6 deletions manager/volume.go
Original file line number Diff line number Diff line change
Expand Up @@ -555,12 +555,6 @@ func (m *VolumeManager) CancelExpansion(volumeName string) (v *longhorn.Volume,
if err != nil {
return nil, err
}
if !v.Status.ExpansionRequired {
return nil, fmt.Errorf("volume expansion is not started")
}
if v.Status.IsStandby {
return nil, fmt.Errorf("canceling expansion for standby volume is not supported")
}

var engine *longhorn.Engine
es, err := m.ds.ListVolumeEngines(v.Name)
Expand All @@ -574,6 +568,17 @@ func (m *VolumeManager) CancelExpansion(volumeName string) (v *longhorn.Volume,
engine = e
}

if !v.Status.ExpansionRequired {
if v.Spec.Size == engine.Spec.VolumeSize && v.Spec.Size != engine.Status.CurrentSize {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this possible?

The intermediate state that causes the bug is:

  1. The expansion is already started and finished in the engine process level, which means v.Spec.Size == engine.Spec.VolumeSize.
  2. Longhorn has not updated engine/volume CR status. In other words, v.Spec.Size != engine.Status.CurrentSize and v.Status.ExpansionRequired == true. (I am not sure if engine.Status.IsExpanding can be false here)

To be honest, it is hard to capture this state then reject the requests in API layer.

Copy link
Member Author

@derekbit derekbit Sep 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you are right.

It is caused by it as I mentioned in longhorn/longhorn#9466 (comment).

Do you think the error message (hint how to resolve the issue) is reasonable?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The error message itself is reasonable. But my question is, how can v.Status.ExpansionRequired become false (or can the new condition be satisfied) before VolumeManager.CancelExpansion execution complete?
The only place of adding the hint is probably here. And currently, Longhorn has no way to figure out if there is an cancellation before this error is trigger, the error hint can be only like:

the expected size %v of engine %v should not be smaller than the current size %v. It can be caused by a cancellation request after the expansion is complete, please re-do the expansion to resolve this error.

Copy link
Member

@innobead innobead Sep 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a general problem for canceling any async operations relying on the resource status which is regularly updated but not in real time.

Is it possible to get the runtime info from the engine directly when calling the cancellation to decide whether the volume is in expansion or not?

Copy link
Contributor

@shuo-wu shuo-wu Oct 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get the runtime info from the engine directly when calling the cancellation to decide whether the volume is in expansion

It is not an atomic operation hence there can be a race anyway. For now, I think adding warnings/notifications is enough.

return nil, fmt.Errorf("volume has already been expanded to %v and cannot revert to size %v. Please manually expand it to %v to resolve the stuck cancellation",
engine.Status.CurrentSize, v.Spec.Size, engine.Status.CurrentSize)
}
return nil, fmt.Errorf("volume expansion is not started")
}
if v.Status.IsStandby {
return nil, fmt.Errorf("canceling expansion for standby volume is not supported")
}

if engine.Status.IsExpanding {
return nil, fmt.Errorf("the engine expansion is in progress")
}
Expand Down