Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(resize): add rpc method to resize filesystem #347

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

james-munson
Copy link
Contributor

@james-munson james-munson commented Nov 7, 2024

Which issue(s) this PR fixes:

Issue longhorn/longhorn#9736 aka longhorn/longhorn#8118

What this PR does / why we need it:

Special notes for your reviewer:

Additional documentation or context

Copy link

coderabbitai bot commented Nov 7, 2024

Walkthrough

The changes in this pull request introduce new methods for resizing filesystems in the ShareManagerClient and ShareManagerServer components, specifically the FilesystemResize method. Additionally, a new function, ResizeEncryptoDevice, is added to handle resizing of encrypted volumes. The modifications also enhance error handling and logging throughout the relevant components, while maintaining the overall structure and existing functionalities.

Changes

File Change Summary
pkg/client/share_manager_client.go Added method FilesystemResize() error to ShareManagerClient. Moved import statement for rpc package.
pkg/crypto/crypto.go Added function ResizeEncryptoDevice(volume, passphrase string) error for resizing encrypted volumes.
pkg/rpc/server.go Added method FilesystemResize(ctx context.Context, req *emptypb.Empty) (resp *emptypb.Empty, err error) to ShareManagerServer. Modified FilesystemTrim for better error handling. Updated import statements to include crypto package.
pkg/server/share_manager.go Updated logic in setupDevice method for encrypted volumes. Added methods: GetVolume, SetShareExported, ShareIsExported, and Shutdown.

Assessment against linked issues

Objective Addressed Explanation
Implement filesystem resizing for RWX PVCs (issue #9736)
Ensure proper error handling during filesystem resizing
Enable resizing of encrypted volumes
Maintain existing functionality in related components

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@james-munson james-munson marked this pull request as draft November 7, 2024 22:45
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (5)
pkg/client/share_manager_client.go (1)

51-57: Add documentation for the FilesystemResize method

Please add godoc comments explaining:

  • The purpose of this method
  • Any pre-conditions that must be met
  • Possible errors that might be returned

Example:

+// FilesystemResize triggers a filesystem resize operation on the share manager.
+// It assumes the underlying volume has already been expanded.
+// Returns an error if the resize operation fails or times out.
 func (c *ShareManagerClient) FilesystemResize() error {
pkg/crypto/crypto.go (1)

66-82: Add input validation and improve error messages.

The function implementation looks solid but could benefit from some improvements:

  1. Add validation for empty volume name and passphrase
  2. Make error messages more descriptive for debugging
  3. Add logging for audit purposes

Here's a suggested improvement:

 func ResizeEncryptoDevice(volume, passphrase string) error {
+    if volume == "" {
+        return fmt.Errorf("volume name cannot be empty")
+    }
+    if passphrase == "" {
+        return fmt.Errorf("passphrase cannot be empty")
+    }
+
+    logrus.Debugf("Attempting to resize encrypted device for volume: %s", volume)
     devPath := types.GetVolumeDevicePath(volume, true)
     if isOpen, err := IsDeviceOpen(devPath); err != nil {
-        return err
+        return errors.Wrapf(err, "failed to check if device %s is open", devPath)
     } else if !isOpen {
         return fmt.Errorf("volume %v encrypto device is closed for resizing", volume)
     }

     namespaces := []lhtypes.Namespace{lhtypes.NamespaceMnt, lhtypes.NamespaceIpc}
     nsexec, err := lhns.NewNamespaceExecutor(lhtypes.ProcessNone, lhtypes.HostProcDirectory, namespaces)
     if err != nil {
-        return err
+        return errors.Wrap(err, "failed to create namespace executor")
     }

-    _, err = nsexec.LuksResize(volume, passphrase, lhtypes.LuksTimeout)
-    return err
+    if _, err = nsexec.LuksResize(volume, passphrase, lhtypes.LuksTimeout); err != nil {
+        return errors.Wrapf(err, "failed to resize LUKS device for volume %s", volume)
+    }
+    logrus.Infof("Successfully resized encrypted device for volume: %s", volume)
+    return nil
 }
pkg/rpc/server.go (3)

119-179: Consider refactoring common code into helper functions

The FilesystemResize method shares similar logic with existing methods like FilesystemTrim, such as device validation, mount checks, and error handling. Refactoring these common operations into reusable helper functions can enhance maintainability and reduce code duplication.


152-152: Address the TODO comment regarding encryption-aware resizing

There is a TODO comment suggesting that the encryption-aware resizing logic might belong in volume.ResizeVolume. Resolving this TODO will ensure that the resize process at server startup correctly handles encrypted volumes and improves code organization.

Would you like assistance in refactoring this logic into volume.ResizeVolume? I can help draft the necessary changes or open a GitHub issue to track this task.


160-162: Ensure precise disk format validation

Currently, the code checks if diskFormat contains "luks" in a case-insensitive manner. For more accurate validation, consider using an exact string comparison to prevent false positives or negatives.

Suggested change:

-if !strings.Contains(strings.ToLower(diskFormat), "luks") {
+if strings.EqualFold(diskFormat, "luks") {
     return &emptypb.Empty{}, grpcstatus.Errorf(grpccodes.InvalidArgument, "unsupported disk encryption format %v", diskFormat)
 }
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 7cb78af and b1f67dd.

📒 Files selected for processing (3)
  • pkg/client/share_manager_client.go (2 hunks)
  • pkg/crypto/crypto.go (1 hunks)
  • pkg/rpc/server.go (2 hunks)
🔇 Additional comments (9)
pkg/client/share_manager_client.go (4)

11-12: LGTM: Import statement is correctly placed

The rpc import is properly organized and follows Go conventions.


52-54: LGTM: Proper context handling

The timeout context creation and cancellation are properly implemented, consistent with other methods in the file.


51-57: Consider adding parameters for more flexible resizing

The current implementation uses an empty request, which might be too limiting. Consider adding parameters that would allow:

  • Specifying the target size
  • Indicating whether the volume is encrypted
  • Passing filesystem-specific options

This would make the method more future-proof and flexible.

Let's check if other resize-related code in the codebase handles these parameters:

#!/bin/bash
# Search for resize-related function signatures and their parameters
ast-grep --pattern 'func $$$Resize($$$) $$$'

51-57: Verify integration with share manager server

The implementation looks correct for the client side, but we should verify the corresponding server implementation handles all necessary filesystem resize operations for RWX volumes.

Let's check the server-side implementation:

✅ Verification successful

Server-side implementation is complete and well-integrated

The server-side implementation in pkg/rpc/server.go is comprehensive and handles:

  • Volume encryption status checks
  • Encrypted volume resizing via crypto.ResizeEncryptoDevice
  • Filesystem resizing via volume.ResizeVolume
  • Proper error handling and logging
  • Appropriate locking mechanisms

The client implementation correctly interfaces with this server-side functionality.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for the corresponding server implementation
ast-grep --pattern 'func $$$FilesystemResize($$$) $$$'

# Look for any existing filesystem resize related error handling
rg -A 3 "failed.*resize.*filesystem"

Length of output: 95


Script:

#!/bin/bash
# Let's try a broader search for FilesystemResize implementation
rg -l "FilesystemResize"

# Search for resize-related implementations in potential server files
fd "server|manager" --type f --exec rg -A 5 "Resize|resize"

# Look for protobuf service definition
fd "\.proto$" --type f --exec rg -A 5 "service.*Manager"

Length of output: 2567

pkg/crypto/crypto.go (4)

66-82: Integration with existing code looks good.

The new ResizeEncryptoDevice function:

  • Follows consistent patterns with other LUKS operations
  • Properly reuses existing helper functions
  • Maintains the same security model with namespace isolation
  • Uses consistent error handling patterns

66-82: Ensure proper error handling in the ShareManager.

Since this function is part of fixing RWX PVC resizing (issue #9736), we need to ensure that the ShareManager properly handles errors from this function, especially for the "device is closed" case.

✅ Verification successful

Let me try a different search pattern to find where this function is being called.


Let me check the server implementation to understand the context better.


Error handling for ResizeEncryptoDevice is properly implemented

The ShareManager (via the RPC server) already has proper error handling for ResizeEncryptoDevice:

  • Checks if the volume is encrypted and validates LUKS format before calling
  • Returns appropriate gRPC error with detailed message when ResizeEncryptoDevice fails
  • Specifically handles the "device is closed" error by propagating it through the gRPC error system
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Look for error handling in the ShareManager when calling this function
rg -A 5 "ResizeEncryptoDevice.*err.*:="

Length of output: 213


Script:

#!/bin/bash
# Look for calls to ResizeEncryptoDevice in the codebase
rg -B 3 -A 3 "ResizeEncryptoDevice\("

Length of output: 891


Script:

#!/bin/bash
# Look at the surrounding context of the ResizeEncryptoDevice call in the server
rg -B 20 -A 10 "ResizeEncryptoDevice\(" pkg/rpc/server.go

Length of output: 1384


66-82: Consider adding size validation for the resize operation.

The function doesn't validate or specify the target size for the resize operation. This suggests that the size information is managed externally, likely by the share manager. Please verify that size validation is properly handled by the caller.

✅ Verification successful

Size validation is properly handled by the system

The size validation concern is addressed through a layered approach:

  1. The ResizeEncryptoDevice is called as part of the node expansion flow in the RPC server
  2. After the LUKS device is resized, volume.ResizeVolume performs filesystem-level size validation using mount.NewResizeFs() which checks if resize is needed
  3. The actual resize only occurs if necessary, with proper error handling at each layer
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for calls to ResizeEncryptoDevice to understand how size is managed
ast-grep --pattern 'ResizeEncryptoDevice($volume, $passphrase)'

Length of output: 213


Script:

#!/bin/bash
# Search for calls to ResizeEncryptoDevice using ripgrep
rg -A 5 'ResizeEncryptoDevice\('

# Also search for any LUKS resize related code to understand the size handling
rg -A 5 'LuksResize'

# Search for any size validation patterns in the crypto package
rg -A 5 'size|resize' pkg/crypto/

Length of output: 2828


Script:

#!/bin/bash
# Let's check how volume.ResizeVolume handles the size validation
rg -A 10 'ResizeVolume.*devicePath.*mountPath'

# Also check the context of the RPC call
rg -B 10 -A 5 'ResizeEncryptoDevice\(vol\.Name'

Length of output: 3244


66-82: Verify LUKS resize operation timeout handling.

The function uses lhtypes.LuksTimeout for the resize operation. We should verify that this timeout is appropriate for larger volumes where resize operations might take longer.

✅ Verification successful

Let me gather more information about the LUKS resize operation implementation and any potential size-related handling.


Let me check how the cryptsetup command is executed and if there are any size-related configurations or timeout handling in the implementation.


Let me check one final thing about any volume size limits or configurations in the codebase.


Based on the analysis of the codebase, I can now provide a final response about the LUKS resize timeout concern:

The 1-minute timeout for LUKS resize operations is appropriate and safe.

The timeout handling is well-designed because:

  • The system uses a standard 1-minute timeout (LuksTimeout = time.Minute) which is sufficient as the LUKS resize operation itself is lightweight
  • The actual heavy lifting of filesystem resizing happens separately after the LUKS layer resize
  • The resize operation has proper error handling and status tracking through multiple layers:
    • Controller resize status tracking
    • Node resize status tracking
    • Transient error handling that allows operation retry
    • Terminal error states for unrecoverable situations

The LUKS resize operation only updates the encryption layer's metadata about the new size. The time-consuming part of actually resizing the filesystem is handled separately by the filesystem-specific resize operations that follow.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for LuksTimeout usage and its value definition
rg -A 2 "LuksTimeout.*=.*" 

Length of output: 394


Script:

#!/bin/bash
# Search for LuksResize implementation and related code
ast-grep --pattern 'func $_ LuksResize($$$) {
  $$$
}'

# Also check for any size-related configurations or checks
rg -B 2 -A 5 "LuksResize"

Length of output: 1398


Script:

#!/bin/bash
# Check CryptsetupWithPassphrase implementation and timeout handling
ast-grep --pattern 'func $_ CryptsetupWithPassphrase($$$) {
  $$$
}'

# Look for any volume size related configurations
rg -B 2 -A 5 "CryptsetupWithPassphrase"

Length of output: 4362


Script:

#!/bin/bash
# Look for any volume size configurations or limits
rg -B 2 -A 5 "volume.*size|size.*volume" -g '!*.md'

# Check for any resize-related configurations
rg -B 2 -A 5 "resize.*config|config.*resize" -g '!*.md'

Length of output: 47586

pkg/rpc/server.go (1)

21-22: Addition of necessary imports

The imported packages lhexec and lhtypes are required and appropriately used in the code changes.

Signed-off-by: James Munson <james.munson@suse.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
pkg/rpc/server.go (2)

169-176: Consider adding metrics for resize operations

While the resize operation is implemented correctly, consider adding metrics to track:

  • Resize operation duration
  • Success/failure rates
  • Volume size before/after resize

This would help monitor the effectiveness of the resize functionality, especially for RWX PVCs.


119-179: Consider implementing retry mechanism for transient failures

For better resilience in distributed environments, consider implementing a retry mechanism with exponential backoff for:

  • Device validation
  • Mount point verification
  • Resize operations

This would help handle transient issues that are common in distributed storage systems, especially with RWX volumes accessed from multiple nodes.

pkg/server/share_manager.go (1)

Line range hint 258-264: Consider RWX-specific scenarios in resize implementation.

While the current changes handle the crypto aspects well, consider adding specific checks or logging for RWX volumes to help diagnose resize issues when multiple pods are accessing the volume.

Consider:

  1. Adding debug logging to track which node is performing the resize
  2. Implementing checks to verify if the volume is being accessed by multiple pods during resize
  3. Adding metrics or events to track resize operations for RWX volumes
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between b1f67dd and c3b101d.

⛔ Files ignored due to path filters (5)
  • go.mod is excluded by !go.mod
  • go.sum is excluded by !**/*.sum, !go.sum
  • vendor/github.com/longhorn/types/pkg/generated/smrpc/smrpc.pb.go is excluded by !**/*.pb.go, !**/generated/**, !vendor/**
  • vendor/github.com/longhorn/types/pkg/generated/smrpc/smrpc_grpc.pb.go is excluded by !**/*.pb.go, !**/generated/**, !vendor/**
  • vendor/modules.txt is excluded by !vendor/**
📒 Files selected for processing (4)
  • pkg/client/share_manager_client.go (2 hunks)
  • pkg/crypto/crypto.go (1 hunks)
  • pkg/rpc/server.go (2 hunks)
  • pkg/server/share_manager.go (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/client/share_manager_client.go
  • pkg/crypto/crypto.go
🔇 Additional comments (6)
pkg/rpc/server.go (4)

21-23: LGTM: Import changes are appropriate

The new imports for execution handling are properly aliased and necessary for the filesystem operations.


119-140: LGTM: Proper initialization and validation

The method correctly:

  • Uses mutex for thread safety
  • Validates volume name
  • Checks device validity
  • Includes appropriate error handling and logging

119-179: Verify error propagation to CSI driver

The implementation looks solid, but we should verify how errors are propagated back to the CSI driver to ensure proper handling of resize failures.

#!/bin/bash
# Description: Check error handling patterns in CSI driver integration

# Search for NodeExpandVolume implementations
ast-grep --pattern 'NodeExpandVolume($_) {
  $$$
}'

# Search for related error handling in CSI driver
rg -A 5 "NodeExpandVolume.*error"

152-167: Verify encrypted volume resize handling

The encrypted volume handling looks correct, but let's verify the interaction between ResizeEncryptoDevice and the filesystem resize operation.

✅ Verification successful

Based on the gathered information, I can now provide a final response about the encrypted volume resize handling:

Encrypted volume resize handling is correctly implemented

The implementation follows the proper sequence:

  1. For encrypted volumes, cryptsetup resize is executed first via ResizeEncryptoDevice
  2. Then filesystem resize is performed via ResizeVolume
  3. Both operations have proper error handling and logging

The code correctly handles the special case mentioned in the comments where cryptsetup resize is only needed for online resizing, as offline resizing happens automatically during 'open'.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for potential race conditions or error handling issues in crypto resize operations

# Search for other crypto resize operations
rg -A 5 "ResizeEncryptoDevice"

# Search for related error handling patterns
rg -A 5 "crypto_LUKS"

Length of output: 89


Script:

#!/bin/bash
# Let's search for the crypto package implementation and usage
fd -e go -x rg -l "package crypto"

# Find crypto-related function implementations
ast-grep --pattern 'func ResizeEncryptoDevice($_, $_) error {
  $$$
}'

# Search for volume encryption checks
rg -A 5 "IsEncrypted"

# Look for cryptsetup related operations
rg -A 5 "cryptsetup"

Length of output: 9168


Script:

#!/bin/bash
# Let's check the crypto package implementation and filesystem resize operations
rg -A 10 "package crypto" ./pkg/crypto/crypto.go

# Find filesystem resize implementation
rg -A 10 "ResizeVolume"

# Check for any error handling patterns in volume resize operations
rg -A 5 "resize.*error|failed to resize"

Length of output: 17327

pkg/server/share_manager.go (2)

Line range hint 258-264: Verify crypto resize behavior and improve documentation.

The comment about cryptsetup resize not being needed is important information that should be properly documented.

Let's verify the crypto resize behavior:

#!/bin/bash
# Description: Verify crypto resize handling across the codebase
# Test 1: Check for resize handling in crypto package
echo "Checking crypto resize handling:"
rg -A 5 "func.*[Rr]esize" --type go

# Test 2: Check for cryptsetup resize usage
echo "Checking cryptsetup resize usage:"
rg -A 5 "cryptsetup.*resize|resize.*cryptsetup" --type go

Consider adding detailed documentation in the crypto package about the resize behavior during the 'open' operation. This will help prevent future confusion about when cryptsetup resize is needed.


191-192: Improved encrypted volume detection with explicit LUKS format check.

The addition of diskFormat == "crypto_LUKS" check improves security by ensuring proper handling of encrypted volumes, even if not marked as encrypted in volume metadata.

Let's verify compatibility with existing encrypted volumes and different LUKS versions:

✅ Verification successful

The additional check for diskFormat == "crypto_LUKS" is valid and consistent with the codebase

The code change is verified to be correct for the following reasons:

  1. The codebase consistently uses "crypto_LUKS" as the format identifier for LUKS-encrypted volumes, as seen in multiple places:

    • In pkg/rpc/server.go, there's an explicit check for diskFormat == "crypto_LUKS"
    • The crypto package contains comprehensive LUKS handling functionality that aligns with this format
  2. The additional check diskFormat == "crypto_LUKS" in the condition is a defensive measure that ensures proper handling of encrypted volumes even if the volume metadata doesn't mark it as encrypted, making the code more robust.

  3. The implementation is consistent with the existing LUKS encryption handling throughout the codebase, including format checking, opening, closing, and resizing operations.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for LUKS format variations in the codebase and crypto handling
# Test 1: Search for LUKS format references
echo "Checking for LUKS format references:"
rg -i "luks|crypto.*luks" --type go

# Test 2: Search for crypto volume handling
echo "Checking crypto volume handling:"
ast-grep --pattern 'func $_(vol $_, devicePath string) $_ {
  $$$
  crypto.$_($$$)
  $$$
}'

Length of output: 27505


Script:

#!/bin/bash
# Let's check for LUKS format handling and crypto-related code
echo "Checking LUKS format handling:"
rg -A 5 "crypto_LUKS|luks|LUKS" --type go

echo -e "\nChecking crypto volume handling:"
ast-grep --pattern 'func $_(vol $_, $$$) $_ {
  $$$
  if vol.IsEncrypted() {
    $$$
  }
  $$$
}'

Length of output: 7209

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant