Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(v2 upgrade): support engine live upgrade #241

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

derekbit
Copy link
Member

@derekbit derekbit commented Nov 17, 2024

Which issue(s) this PR fixes:

Issue longhorn/longhorn#9104

Signed-off-by: Derek Su derek.su@suse.com

What this PR does / why we need it:

Special notes for your reviewer:

Additional documentation or context

@derekbit derekbit self-assigned this Nov 17, 2024
Copy link

coderabbitai bot commented Nov 17, 2024

Walkthrough

The changes in this pull request involve the addition of a new field, StandbyTargetPort, to the Engine struct in two files: pkg/api/types.go and pkg/spdk/engine.go. The ProtoEngineToEngine function has been updated to handle this new field during conversion from protobuf. Additionally, the pkg/spdk/engine.go file includes a new method, isNewEngine, and significant restructuring of the Create method to improve connection management and error handling.

Changes

File Path Change Summary
pkg/api/types.go Added StandbyTargetPort int32 to Engine struct; updated ProtoEngineToEngine to handle new field.
pkg/spdk/engine.go Added StandbyTargetPort int32 to Engine struct; introduced isNewEngine method; restructured Create method and modified handleFrontend signature; improved error handling and logging in several methods.

Assessment against linked issues

Objective Addressed Explanation
Support live upgrade for control plane (#9104) The changes do not directly implement the requested feature.

Warning

Rate limit exceeded

@derekbit has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 0 minutes and 47 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between b15f126 and 5b4f847.


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (4)
pkg/api/types.go (1)

132-132: LGTM! Consider adding field documentation.

The new StandbyTargetPort field is well-positioned and follows the codebase conventions. Consider adding a comment to document its purpose in the context of live upgrades.

+	// StandbyTargetPort is used during live upgrades to maintain service availability
 	StandbyTargetPort int32                 `json:"standby_target_port"`
pkg/spdk/engine.go (3)

252-252: Possible misuse of variable in log message

At line 252, the log statement uses e.ReplicaModeMap to display the replicas being connected. However, e.ReplicaModeMap might not represent the list of replicas intended for logging.

Consider using replicaBdevList, which contains the list of replica block devices:

- e.log.Infof("Connecting all available replicas %+v, then launching raid during engine creation", e.ReplicaModeMap)
+ e.log.Infof("Connecting all available replicas %+v, then launching raid during engine creation", replicaBdevList)

This change will provide more accurate logging information about the replicas being connected.


685-688: Remove commented-out code to improve code clarity

The commented-out code at lines 685-688 is not used and can be removed to enhance readability and maintainability.

Apply this diff to remove the unused code:

- // podIP, err := commonnet.GetIPForPod()
- // if err != nil {
- //     return err
- // }

If this code is needed for future development, consider adding a comment explaining its purpose or moving it to a separate location.


Line range hint 1881-1881: Typographical error in function name closeRplicaClients

There is a typo in the function name closeRplicaClients. It should be closeReplicaClients for clarity and consistency.

Apply this diff to correct the function name:

- func (e *Engine) closeRplicaClients(replicaClients map[string]*client.SPDKClient) {
+ func (e *Engine) closeReplicaClients(replicaClients map[string]*client.SPDKClient) {

Make sure to update all references to this function accordingly.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between bf16180 and 668e4d2.

⛔ Files ignored due to path filters (2)
  • go.mod is excluded by !go.mod
  • vendor/modules.txt is excluded by !vendor/**
📒 Files selected for processing (2)
  • pkg/api/types.go (2 hunks)
  • pkg/spdk/engine.go (15 hunks)
🧰 Additional context used
🪛 golangci-lint
pkg/spdk/engine.go

315-315: expected declaration, found '<<'

(typecheck)

🔇 Additional comments (5)
pkg/api/types.go (2)

153-153: LGTM! Field mapping is correct.

The StandbyTargetPort field is properly mapped from the protobuf struct to the Engine type.


Line range hint 132-153: Verify protobuf definition exists.

Ensure that the standby_target_port field is properly defined in the SPDK service protobuf files.

pkg/spdk/engine.go (3)

45-45: New field StandbyTargetPort added to Engine struct

The addition of the StandbyTargetPort field to the Engine struct is appropriate. Ensure that this new field is properly initialized and consistently used throughout the codebase to avoid any unexpected behavior.


108-110: Function isNewEngine correctly determines if the engine is new

The isNewEngine function effectively checks if the IP field is empty to determine if the engine instance is new. This is a simple and efficient approach.


618-618: Field StandbyTargetPort included in getWithoutLock

The StandbyTargetPort field is correctly added to the Engine object returned by the getWithoutLock method. This ensures that the field will be included in any serialized representations of the engine's state.

Comment on lines 187 to 189
if e.StandbyTargetPort != 0 {
if e.StandbyTargetPort != 0 {
e.log.Warnf("Standby target instance with port %v is already created, will skip the target creation", e.StandbyTargetPort)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Redundant nested condition on e.StandbyTargetPort

The nested if statements both check e.StandbyTargetPort != 0, which is redundant and may indicate a logical error.

Apply this diff to correct the redundant condition:

if e.StandbyTargetPort != 0 {
-   if e.StandbyTargetPort != 0 {
      e.log.Warnf("Standby target instance with port %v is already created, will skip the target creation", e.StandbyTargetPort)
      return e.getWithoutLock(), nil
-   }
}

Ensure that the inner condition is either removed or modified to check a different variable if intended.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if e.StandbyTargetPort != 0 {
if e.StandbyTargetPort != 0 {
e.log.Warnf("Standby target instance with port %v is already created, will skip the target creation", e.StandbyTargetPort)
if e.StandbyTargetPort != 0 {
e.log.Warnf("Standby target instance with port %v is already created, will skip the target creation", e.StandbyTargetPort)

Longhorn 9104

Signed-off-by: Derek Su <derek.su@suse.com>
@derekbit derekbit changed the title chore(vendor): update dependencies feat(v2 upgrade): support engine live upgrade Nov 17, 2024
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
pkg/spdk/engine.go (1)

Line range hint 2173-2224: Document target port management logic

The target switching logic correctly handles the port assignments, but would benefit from a comment explaining when and why StandbyTargetPort is reset to 0.

Add a comment before the condition:

+ // Reset StandbyTargetPort when switching to the pod's IP as the target,
+ // since this node is now the primary target
  if targetIP == podIP {
    e.TargetPort = targetPort
    e.StandbyTargetPort = 0
  }
🛑 Comments failed to post (1)
pkg/spdk/engine.go (1)

315-315: ⚠️ Potential issue

Critical: Resolve merge conflict

There is an unresolved merge conflict marker at line 315 (<<<<<<< HEAD). This needs to be resolved before the code can be merged.

Please resolve the merge conflict by:

  1. Running git merge or git rebase to update your branch
  2. Resolving the conflicts by choosing the appropriate code
  3. Removing all conflict markers (<<<<<<<, =======, >>>>>>>)
🧰 Tools
🪛 golangci-lint

315-315: expected declaration, found '<<'

(typecheck)

@derekbit derekbit force-pushed the v2-control-upgrade branch 2 times, most recently from b15f126 to 7f0d7e1 Compare November 17, 2024 09:16
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
pkg/spdk/engine.go (1)

Line range hint 4-4: Consider adding tests for live upgrade scenarios

While the implementation looks solid, consider adding comprehensive test coverage for:

  • Standby target creation and cleanup
  • Target switchover scenarios
  • Error cases during live upgrades

Would you like me to help generate test cases for these scenarios?

🛑 Comments failed to post (1)
pkg/spdk/engine.go (1)

216-218: ⚠️ Potential issue

Add error handling for BdevRaidGet

The empty branch after BdevRaidGet error check could lead to silent failures. Consider adding appropriate error handling.

Apply this diff to add error handling:

if targetCreationRequired {
    _, err := spdkClient.BdevRaidGet(e.Name, 0)
    if err != nil {
+       if !jsonrpc.IsJSONRPCRespErrorNoSuchDevice(err) {
+           return nil, errors.Wrapf(err, "failed to get raid bdev %s", e.Name)
+       }
    }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

	if targetCreationRequired {
		_, err := spdkClient.BdevRaidGet(e.Name, 0)
		if err != nil {
			if !jsonrpc.IsJSONRPCRespErrorNoSuchDevice(err) {
				return nil, errors.Wrapf(err, "failed to get raid bdev %s", e.Name)
			}
		}
🧰 Tools
🪛 golangci-lint

218-218: SA9003: empty branch

(staticcheck)

Longhorn 9104

Signed-off-by: Derek Su <derek.su@suse.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
pkg/spdk/engine.go (3)

176-202: Consider simplifying the initialization logic

The current implementation has nested conditions that could be simplified for better readability and maintainability. Consider extracting the logic into separate helper functions.

Example refactor:

-	if podIP == initiatorIP && podIP == targetIP {
-		if e.Port == 0 && e.TargetPort == 0 {
-			e.log.Info("Creating both initiator and target instances")
-			initiatorCreationRequired = true
-			targetCreationRequired = true
-		} else if e.Port != 0 && e.TargetPort == 0 {
-			e.log.Info("Creating a target instance")
-			targetCreationRequired = true
-			if e.StandbyTargetPort != 0 {
-				e.log.Warnf("Standby target instance with port %v is already created, will skip the target creation", e.StandbyTargetPort)
-				return e.getWithoutLock(), nil
-			}
-		} else {
-			return nil, fmt.Errorf("invalid initiator and target address for engine %s creation", e.Name)
-		}
+   creationMode := determineCreationMode(podIP, initiatorIP, targetIP, e.Port, e.TargetPort)
+   switch creationMode {
+   case createBoth:
+       e.log.Info("Creating both initiator and target instances")
+       initiatorCreationRequired = true
+       targetCreationRequired = true
+   case createTargetOnly:
+       e.log.Info("Creating a target instance")
+       targetCreationRequired = true
+       if e.StandbyTargetPort != 0 {
+           e.log.Warnf("Standby target instance with port %v is already created, will skip the target creation", e.StandbyTargetPort)
+           return e.getWithoutLock(), nil
+       }
+   case createInitiatorOnly:
+       e.log.Info("Creating an initiator instance")
+       initiatorCreationRequired = true
+   default:
+       return nil, fmt.Errorf("invalid initiator and target address for engine %s creation", e.Name)
+   }

397-399: Add documentation for standby target creation condition

The condition for standby target creation could benefit from a comment explaining when and why it's needed.

+   // Create a standby target if we have an active initiator (e.Port != 0)
+   // but no active target (e.TargetPort == 0)
    standbyTargetCreationRequired := false
    if e.Port != 0 && e.TargetPort == 0 {
        standbyTargetCreationRequired = true
    }

2169-2173: Enhance error handling for pod IP operations

The pod IP retrieval and port updates are critical for live upgrades. Consider adding more detailed error messages and logging.

    podIP, err := commonnet.GetIPForPod()
    if err != nil {
-       return err
+       return errors.Wrapf(err, "failed to get pod IP for engine %s target switchover", e.Name)
    }

    if targetIP == podIP {
+       e.log.Infof("Target IP matches pod IP, updating ports: target=%d, standby=0", targetPort)
        e.TargetPort = targetPort
        e.StandbyTargetPort = 0
    }

Also applies to: 2216-2220

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between b15f126 and 5b4f847.

📒 Files selected for processing (2)
  • pkg/api/types.go (2 hunks)
  • pkg/spdk/engine.go (14 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/api/types.go
🔇 Additional comments (5)
pkg/spdk/engine.go (5)

45-45: LGTM: Added StandbyTargetPort field for standby target management

The new field is well-integrated with the existing port fields and supports the live upgrade functionality.


108-110: LGTM: Clear and focused helper function

The function provides a clean way to determine if an engine is new based on its IP state.


204-207: LGTM: Clear initialization of engine IPs

The initialization logic correctly sets the IP values for new engines using the helper function.


476-485: LGTM: Clear port assignment logic

The code correctly handles port assignment for both initiator and target instances, including the standby target scenario.


Line range hint 2169-2220: Verify concurrent access during target switchover

The target switchover process involves multiple steps and state changes. Ensure there are no race conditions when:

  1. Multiple goroutines attempt switchover simultaneously
  2. A switchover occurs during engine creation/deletion
  3. Pod IP changes during switchover
✅ Verification successful

Let me generate additional verification scripts to check for proper locking mechanisms and concurrent access patterns.


Based on the verification results, I can now generate the final response:

Proper locking mechanisms are in place during target switchover

The code correctly handles concurrent access during target switchover:

  • SwitchOverTarget method acquires the engine lock before state modifications
  • Related operations like Create, Delete, and DeleteTarget also use proper locking
  • State changes (TargetPort, StandbyTargetPort) are protected by the mutex

The original concerns about race conditions are addressed by the existing implementation.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for potential concurrent access patterns
ast-grep --pattern $'func \(.*\) SwitchOverTarget\(.*\) {
  $$$
  e.Lock()
  $$$
}'

# Check for other methods that might interact with target state
rg -A 5 "e\.(TargetPort|StandbyTargetPort)"

Length of output: 6650


Script:

#!/bin/bash
# Check for all methods that modify engine state and their locking patterns
ast-grep --pattern $'func \(e \*Engine\) $_($$) {
  $$$
  e.Lock()
  $$$
}'

# Check for methods that might be called during switchover
rg -A 5 "func.*Engine.*\b(Create|Delete|Switch|Connect|Disconnect|Reload)" pkg/spdk/

Length of output: 1794

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant