-
Notifications
You must be signed in to change notification settings - Fork 152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: do not panic when looking up BackupVolume of non-existing Volume #3519
Conversation
WalkthroughThe changes modify backup-related operations across several components. In the API and manager modules, the methods for retrieving and deleting backups no longer require a backup volume name, streamlining the method signatures. In contrast, the controller module renames and adapts methods to handle multiple backup volumes for snapshot restoration and cleanup. These modifications adjust control flows by iterating over multiple backup volumes and updating error handling without altering the underlying backup logic. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant API_Server
participant VolumeManager
Client->>API_Server: Invoke BackupGet/BackupDelete
API_Server->>VolumeManager: GetBackup(backupName)
VolumeManager-->>API_Server: Return backup details
alt Backup deletion requested
API_Server->>VolumeManager: DeleteBackup(backupName)
VolumeManager-->>API_Server: Confirmation of deletion
end
API_Server-->>Client: Send response
sequenceDiagram
participant CSI
participant ControllerServer
participant LonghornClient
CSI->>ControllerServer: Request snapshot creation (csiSnapshotTypeLonghornBackup)
ControllerServer->>LonghornClient: getBackupVolumes(sourceVolumeName)
alt Backup volumes available
loop For each backup volume
ControllerServer->>LonghornClient: Validate backup state
end
ControllerServer-->>CSI: Return snapshot restoration details
else No backup volumes found
ControllerServer-->>CSI: Return error ("no backup volume available")
end
Possibly related PRs
Suggested reviewers
Tip 🌐 Web search-backed reviews and chat
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
1eb4239
to
c1375e4
Compare
@PhanLe1010 It is still a draft. Ready for review? |
@derekbit I am waiting for @mantissahz to review the multiple backup logic first |
@PhanLe1010, Which part did you mean (CSI part?) or all multiple backup target logic? IMO, This PR is good enough to handle the volumes that do not exist. |
@@ -1287,24 +1296,22 @@ func (cs *ControllerServer) waitForBackupControllerSync(volumeName, snapshotName | |||
// (and in particular, its name) as quickly as possible and in any state. | |||
func (cs *ControllerServer) getBackup(volumeName, snapshotName string) (*longhornclient.Backup, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As we discussed yesterday, this function will return nil, nil
if the backup is not found. I think if this behavior may be unexpected for some developers then leads to some NPE issues. Do we need to change this or at least add a comment for this behavior?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. I added the comment.
Looks like we check for return backup to be different than nil
everywhere we call this fuction so I would prefer to keep the current logic. Ref:
longhorn-manager/csi/controller_server.go
Lines 811 to 832 in 469a446
// We check for backup existence first, since it's possible that the actual volume is no longer available but the // backup still is. backup, err := cs.getBackup(csiVolumeName, csiSnapshotName) if err != nil { // Status code set in waitForBackupControllerSync. return nil, err } if backup != nil { // Per the CSI spec, if we are unable to complete the CreateSnapshot call successfully, we must return a non-ok // gRPC code. In practice, doing so ensures we get requeued (and quickly deleted) when we hit // https://github.com/kubernetes-csi/external-snapshotter/issues/880. if backup.Error != "" { return nil, status.Error(codes.Internal, backup.Error) } snapshotID := encodeSnapshotID(csiSnapshotTypeLonghornBackup, backup.VolumeName, backup.Name) rsp := createSnapshotResponseForSnapshotTypeLonghornBackup(backup.VolumeName, snapshotID, backup.SnapshotCreated, backup.VolumeSize, backup.State == string(longhorn.BackupStateCompleted)) return rsp, nil } longhorn-manager/csi/controller_server.go
Lines 1247 to 1256 in 469a446
// Don't wait if we don't need to. backup, err := cs.getBackup(volumeName, snapshotName) if err != nil { return nil, err } if backup != nil && backup.SnapshotCreated != "" { // The backup controller sets the snapshot creation time at first sync. If we do not wait to return until // this is done, we may see timestamp related errors in csi-snapshotter logs. return backup, nil } longhorn-manager/csi/controller_server.go
Lines 1273 to 1280 in 469a446
backup, err := cs.getBackup(volumeName, snapshotName) if err != nil { return nil, err } if backup != nil && backup.SnapshotCreated != "" { return backup, nil } }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's improve this later, need to tackle it eventually to prevent similar issues.
cc @derekbit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
csi/controller_server.go (2)
1073-1094
: Single-iteration deletion logic might be confusing
The loop stops after the first non-empty backup volume name (line 1083~). This is valid because the BackupDelete API does not actually require multiple tries, but it could be confusing to new contributors. Consider simplifying by picking the first known backup volume and deleting it, rather than iterating.🧰 Tools
🪛 GitHub Check: CodeFactor
[notice] 1089-1089: csi/controller_server.go#L1089
If block ends with a return statement, so drop this else and outdent its block. (indent-error-flow)
1087-1091
: Remove unnecessary “else” block
Dropping theelse
block afterreturn err
will improve readability as recommended by the static analysis hint.if err != nil { return err -} else { - return nil -} +} +return nil🧰 Tools
🪛 GitHub Check: CodeFactor
[notice] 1089-1089: csi/controller_server.go#L1089
If block ends with a return statement, so drop this else and outdent its block. (indent-error-flow)
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
api/backup.go
(2 hunks)csi/controller_server.go
(5 hunks)manager/engine.go
(1 hunks)
🧰 Additional context used
📓 Learnings (1)
csi/controller_server.go (1)
Learnt from: mantissahz
PR: longhorn/longhorn-manager#2182
File: controller/backup_volume_controller.go:240-242
Timestamp: 2024-11-10T16:45:04.898Z
Learning: In `controller/backup_volume_controller.go`, the `BackupVolumeDelete` method of `backupTargetClient` returns `nil` if the backup volume is not found, so it's unnecessary to check for 'not found' errors after calling this method.
🪛 GitHub Check: CodeFactor
csi/controller_server.go
[notice] 1089-1089: csi/controller_server.go#L1089
If block ends with a return statement, so drop this else and outdent its block. (indent-error-flow)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Build binaries
- GitHub Check: Summary
🔇 Additional comments (7)
csi/controller_server.go (3)
129-146
: Ensure early return for missing or invalid backup volumes
Here, the code retrieves backup volumes (bvs
) for the specified volume and continues to iterate if any are found. Make sure that returning an error whenbvs
is empty (line 134~) is the intended behavior, as this will block further progress if no backup volumes are located. This approach is consistent with the new multi-backup-volume design, but confirm that any caller expecting at least one backup volume can gracefully handle the error.
279-295
: Looks good: robust retrieval of multiple backup volumes
This new helper function cleanly encapsulates the logic for listing all backup volumes that match the specified volumeName. Logging a debug message if none are found (line 292~) provides clarity without interrupting flow. The implementation appears correct and aligns well with the multi-volume logic.
1299-1320
: Verify nil return behavior for nonexistent backups
This function returns(nil, nil)
when a backup doesn’t exist (line 1319~). Although documented, ensure all callers correctly handle the case where the backup is not found. If a “NotFound” error is more appropriate, consider returning such an error instead.api/backup.go (2)
320-331
: Check for potential nil backup
The code no longer requires the backup volume name to retrieve a backup (line 320~). Be sure that any logic depending on a specific volume name has been updated accordingly, since this call will return backups from any volume if found. Also confirm that the rest of the flow handles anil
backup gracefully (lines 324-327).
347-356
: Consistent deletion logic without volume parameter
Removing the volume name from theDeleteBackup
call (line 353~) simplifies the method signature. Verify that the manager’s internal logic no longer requires the volume context. If you have no further checks that rely on the volume name, this approach is more streamlined.manager/engine.go (2)
530-532
: Removed volume context in GetBackup
Line 530~ shows that theGetBackup
method now relies only on the backupName. This is consistent with the changes in the CSI and API layers. Confirm that calls to this method no longer need volume-specific filtering in other parts of the codebase.
534-536
: Removed volume parameter in DeleteBackup
Line 534~ drops the volumeName argument. This further unifies backup operations around the backupName alone. Double-check that old references are updated to avoid passing the removed volume argument.
I mean this PR @mantissahz Thank you for the reviewing. |
Updated the PR to addressed all feebacks @mantissahz @shuo-wu @derekbit Could you check again? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
csi/controller_server.go (2)
129-147
: LGTM! Consider enhancing error messages.The changes correctly handle multiple backup volumes and include proper error handling. The code iterates through backup volumes until it finds a valid backup, which aligns with the PR objectives.
Consider enhancing error messages to indicate when a backup exists but in a different backup volume:
- return nil, status.Errorf(codes.NotFound, "failed to restore CSI snapshot %v backup %s unavailable", snapshot.SnapshotId, backupName) + return nil, status.Errorf(codes.NotFound, "failed to restore CSI snapshot %v: backup %s not found in any of the backup volumes", snapshot.SnapshotId, backupName)
1073-1095
: LGTM! Consider simplifying the return logic.The changes correctly handle multiple backup volumes and include helpful comments about backward compatibility.
The return logic can be simplified by removing the nested return:
for _, bv := range backupVolumes { if bv.Name != "" { _, err = cs.apiClient.BackupVolume.ActionBackupDelete(bv, &longhornclient.BackupInput{ Name: backupName, }) - if err != nil { - return err - } - return nil + return err } }
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
api/backup.go
(2 hunks)csi/controller_server.go
(5 hunks)manager/engine.go
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- manager/engine.go
- api/backup.go
🧰 Additional context used
📓 Learnings (1)
csi/controller_server.go (1)
Learnt from: mantissahz
PR: longhorn/longhorn-manager#2182
File: controller/backup_volume_controller.go:240-242
Timestamp: 2024-11-10T16:45:04.898Z
Learning: In `controller/backup_volume_controller.go`, the `BackupVolumeDelete` method of `backupTargetClient` returns `nil` if the backup volume is not found, so it's unnecessary to check for 'not found' errors after calling this method.
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Build binaries
- GitHub Check: Summary
🔇 Additional comments (2)
csi/controller_server.go (2)
279-294
: LGTM! Function name and implementation improvements.The changes improve clarity by:
- Renaming the function to better reflect its purpose
- Returning a slice of backup volumes
- Adding helpful debug logging
1297-1319
: LGTM! Clear documentation and proper error handling.The changes correctly handle multiple backup volumes and include clear documentation about the return behavior. The error handling is appropriate, and the early return optimization when finding a backup is efficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, just one comment.
@@ -317,7 +317,7 @@ func (s *Server) BackupGet(w http.ResponseWriter, req *http.Request) error { | |||
} | |||
backupVolumeName := mux.Vars(req)["backupVolumeName"] | |||
|
|||
backup, err := s.m.GetBackup(input.Name, backupVolumeName) | |||
backup, err := s.m.GetBackup(input.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
backupVolumeName
is not required anymore. Let's remove it. (UI might need to change or just keep as it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @innobead It is still used for some warning log below to know which volume the backup belong to:
backup, err := s.m.GetBackup(input.Name)
if err != nil {
return errors.Wrapf(err, "failed to get backup '%v' of volume '%v'", input.Name, backupVolumeName)
}
if backup == nil {
logrus.Warnf("cannot find backup '%v' of volume '%v'", input.Name, backupVolumeName)
w.WriteHeader(http.StatusNotFound)
return nil
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to remove the bv name from these logs as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please note that backupVolumeName
is part of the API URL
longhorn-manager/api/router.go
Lines 147 to 148 in 4e93392
r.Methods("POST").Path("/v1/backupvolumes/{backupVolumeName}").Queries("action", name).Handler(f(schemas, action)) | |
} |
Now, since backupVolumeName is always available as part of the URL, I think it would be better to print out as more info as possible for easier debugging. I am not sure the benefit of not printing it out. WDYT @shuo-wu ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Considering the debugging purpose. Let's retain it.
1. One Volume can have multiple BackupVolumes, need to check all BackupVolumes when finding backups 2. It is totally valid to look up BackupVolumes of non-existing Volume. A deleted Volume can still have Backups in the cluster. Do not panic Signed-off-by: Phan Le <phan.le@suse.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
csi/controller_server.go (3)
1069-1091
: Simplify the backup cleanup logic.The current implementation can be simplified by removing the redundant return statements and early exit.
Apply this diff to improve the code:
func (cs *ControllerServer) cleanupBackup(sourceVolumeName, id string) error { backupVolumeName, backupName := sourceVolumeName, id backupVolumes, err := cs.getBackupVolumes(backupVolumeName) if err != nil { return err } for _, bv := range backupVolumes { // The BackupDelete API actually doesn't care about bv when deleting a backup. // The bv is there for backward compatibility. // Any bv will work if bv.Name != "" { _, err = cs.apiClient.BackupVolume.ActionBackupDelete(bv, &longhornclient.BackupInput{ Name: backupName, }) - if err != nil { - return err - } - return nil + return err } } return nil }
1295-1315
: Optimize backup retrieval logic.The current implementation returns early after finding a backup in the first matching backup volume, but continues searching through backup volumes if no backup is found. This is inconsistent with the behavior in
cleanupBackup
.Apply this diff to make the behavior consistent:
func (cs *ControllerServer) getBackup(volumeName, snapshotName string) (*longhornclient.Backup, error) { backupVolumes, err := cs.getBackupVolumes(volumeName) if err != nil { return nil, status.Error(codes.Internal, err.Error()) } - var backup *longhornclient.Backup for _, bv := range backupVolumes { backupListOutput, err := cs.apiClient.BackupVolume.ActionBackupList(bv) if err != nil { return nil, status.Error(codes.Internal, err.Error()) } for _, b := range backupListOutput.Data { if b.SnapshotName == snapshotName { - backup = &b - return backup, nil + return &b, nil } } } return nil, nil }
129-146
: Verify backup retrieval in CreateVolume.The backup retrieval logic in
CreateVolume
should be consistent with the behavior ingetBackup
. Currently, it continues searching through backup volumes even after finding a valid backup.Apply this diff to make the behavior consistent:
backupName := id bvs, err := cs.getBackupVolumes(sourceVolumeName) if err != nil { return nil, status.Errorf(codes.Internal, "failed to retrieve backup volumes of volume %v: %v", sourceVolumeName, err) } if len(bvs) == 0 { return nil, status.Errorf(codes.NotFound, "failed to restore CSI snapshot %v of volume %v: there is no backup volume", snapshot.SnapshotId, sourceVolumeName) } - var backup *longhornclient.Backup for _, bv := range bvs { backup, err = cs.apiClient.BackupVolume.ActionBackupGet(bv, &longhornclient.BackupInput{Name: backupName}) if err != nil { return nil, status.Errorf(codes.NotFound, "failed to restore CSI snapshot %v : failed to get backup %v: %v", snapshot.SnapshotId, backupName, err) } if backup != nil { - break + return &csi.CreateVolumeResponse{ + Volume: &csi.Volume{ + VolumeId: resVol.Id, + CapacityBytes: reqVolSizeBytes, + VolumeContext: volumeParameters, + ContentSource: volumeSource, + }, + }, nil } }
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
api/backup.go
(2 hunks)csi/controller_server.go
(5 hunks)manager/engine.go
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- manager/engine.go
- api/backup.go
🧰 Additional context used
📓 Learnings (1)
csi/controller_server.go (1)
Learnt from: mantissahz
PR: longhorn/longhorn-manager#2182
File: controller/backup_volume_controller.go:240-242
Timestamp: 2024-11-10T16:45:04.898Z
Learning: In `controller/backup_volume_controller.go`, the `BackupVolumeDelete` method of `backupTargetClient` returns `nil` if the backup volume is not found, so it's unnecessary to check for 'not found' errors after calling this method.
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Build binaries
- GitHub Check: Summary
🔇 Additional comments (2)
csi/controller_server.go (2)
279-295
: LGTM: Method signature change aligns with multiple backup volumes support.The change from returning a single backup volume to a slice of backup volumes is a good improvement that aligns with the fact that a single volume can have multiple backup volumes.
1293-1294
: LGTM: Good documentation of return behavior.The comment clearly documents that the method returns nil for both backup and error when the backup doesn't exist, which is important for error handling in the calling code.
@@ -317,7 +317,7 @@ func (s *Server) BackupGet(w http.ResponseWriter, req *http.Request) error { | |||
} | |||
backupVolumeName := mux.Vars(req)["backupVolumeName"] | |||
|
|||
backup, err := s.m.GetBackup(input.Name, backupVolumeName) | |||
backup, err := s.m.GetBackup(input.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine to remove the bv name from these logs as well.
@mergify backport v1.8.x |
✅ Backports have been created
|
Which issue(s) this PR fixes:
Issue # longhorn/longhorn#10303
Special notes for your reviewer:
Putting this PR as draft. Need @mantissahz to help to review the multiple backup target logic