-
Notifications
You must be signed in to change notification settings - Fork 36
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(resize): add rpc method to resize filesystem #347
base: master
Are you sure you want to change the base?
Conversation
WalkthroughThe changes in this pull request introduce new methods for resizing filesystems in the ShareManagerClient and ShareManagerServer components, specifically the Changes
Assessment against linked issues
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (5)
pkg/client/share_manager_client.go (1)
51-57
: Add documentation for the FilesystemResize methodPlease add godoc comments explaining:
- The purpose of this method
- Any pre-conditions that must be met
- Possible errors that might be returned
Example:
+// FilesystemResize triggers a filesystem resize operation on the share manager. +// It assumes the underlying volume has already been expanded. +// Returns an error if the resize operation fails or times out. func (c *ShareManagerClient) FilesystemResize() error {pkg/crypto/crypto.go (1)
66-82
: Add input validation and improve error messages.The function implementation looks solid but could benefit from some improvements:
- Add validation for empty volume name and passphrase
- Make error messages more descriptive for debugging
- Add logging for audit purposes
Here's a suggested improvement:
func ResizeEncryptoDevice(volume, passphrase string) error { + if volume == "" { + return fmt.Errorf("volume name cannot be empty") + } + if passphrase == "" { + return fmt.Errorf("passphrase cannot be empty") + } + + logrus.Debugf("Attempting to resize encrypted device for volume: %s", volume) devPath := types.GetVolumeDevicePath(volume, true) if isOpen, err := IsDeviceOpen(devPath); err != nil { - return err + return errors.Wrapf(err, "failed to check if device %s is open", devPath) } else if !isOpen { return fmt.Errorf("volume %v encrypto device is closed for resizing", volume) } namespaces := []lhtypes.Namespace{lhtypes.NamespaceMnt, lhtypes.NamespaceIpc} nsexec, err := lhns.NewNamespaceExecutor(lhtypes.ProcessNone, lhtypes.HostProcDirectory, namespaces) if err != nil { - return err + return errors.Wrap(err, "failed to create namespace executor") } - _, err = nsexec.LuksResize(volume, passphrase, lhtypes.LuksTimeout) - return err + if _, err = nsexec.LuksResize(volume, passphrase, lhtypes.LuksTimeout); err != nil { + return errors.Wrapf(err, "failed to resize LUKS device for volume %s", volume) + } + logrus.Infof("Successfully resized encrypted device for volume: %s", volume) + return nil }pkg/rpc/server.go (3)
119-179
: Consider refactoring common code into helper functionsThe
FilesystemResize
method shares similar logic with existing methods likeFilesystemTrim
, such as device validation, mount checks, and error handling. Refactoring these common operations into reusable helper functions can enhance maintainability and reduce code duplication.
152-152
: Address the TODO comment regarding encryption-aware resizingThere is a TODO comment suggesting that the encryption-aware resizing logic might belong in
volume.ResizeVolume
. Resolving this TODO will ensure that the resize process at server startup correctly handles encrypted volumes and improves code organization.Would you like assistance in refactoring this logic into
volume.ResizeVolume
? I can help draft the necessary changes or open a GitHub issue to track this task.
160-162
: Ensure precise disk format validationCurrently, the code checks if
diskFormat
contains "luks" in a case-insensitive manner. For more accurate validation, consider using an exact string comparison to prevent false positives or negatives.Suggested change:
-if !strings.Contains(strings.ToLower(diskFormat), "luks") { +if strings.EqualFold(diskFormat, "luks") { return &emptypb.Empty{}, grpcstatus.Errorf(grpccodes.InvalidArgument, "unsupported disk encryption format %v", diskFormat) }
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
📒 Files selected for processing (3)
pkg/client/share_manager_client.go
(2 hunks)pkg/crypto/crypto.go
(1 hunks)pkg/rpc/server.go
(2 hunks)
🔇 Additional comments (9)
pkg/client/share_manager_client.go (4)
11-12
: LGTM: Import statement is correctly placed
The rpc import is properly organized and follows Go conventions.
52-54
: LGTM: Proper context handling
The timeout context creation and cancellation are properly implemented, consistent with other methods in the file.
51-57
: Consider adding parameters for more flexible resizing
The current implementation uses an empty request, which might be too limiting. Consider adding parameters that would allow:
- Specifying the target size
- Indicating whether the volume is encrypted
- Passing filesystem-specific options
This would make the method more future-proof and flexible.
Let's check if other resize-related code in the codebase handles these parameters:
#!/bin/bash
# Search for resize-related function signatures and their parameters
ast-grep --pattern 'func $$$Resize($$$) $$$'
51-57
: Verify integration with share manager server
The implementation looks correct for the client side, but we should verify the corresponding server implementation handles all necessary filesystem resize operations for RWX volumes.
Let's check the server-side implementation:
✅ Verification successful
Server-side implementation is complete and well-integrated
The server-side implementation in pkg/rpc/server.go
is comprehensive and handles:
- Volume encryption status checks
- Encrypted volume resizing via
crypto.ResizeEncryptoDevice
- Filesystem resizing via
volume.ResizeVolume
- Proper error handling and logging
- Appropriate locking mechanisms
The client implementation correctly interfaces with this server-side functionality.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for the corresponding server implementation
ast-grep --pattern 'func $$$FilesystemResize($$$) $$$'
# Look for any existing filesystem resize related error handling
rg -A 3 "failed.*resize.*filesystem"
Length of output: 95
Script:
#!/bin/bash
# Let's try a broader search for FilesystemResize implementation
rg -l "FilesystemResize"
# Search for resize-related implementations in potential server files
fd "server|manager" --type f --exec rg -A 5 "Resize|resize"
# Look for protobuf service definition
fd "\.proto$" --type f --exec rg -A 5 "service.*Manager"
Length of output: 2567
pkg/crypto/crypto.go (4)
66-82
: Integration with existing code looks good.
The new ResizeEncryptoDevice
function:
- Follows consistent patterns with other LUKS operations
- Properly reuses existing helper functions
- Maintains the same security model with namespace isolation
- Uses consistent error handling patterns
66-82
: Ensure proper error handling in the ShareManager.
Since this function is part of fixing RWX PVC resizing (issue #9736), we need to ensure that the ShareManager properly handles errors from this function, especially for the "device is closed" case.
✅ Verification successful
Let me try a different search pattern to find where this function is being called.
Let me check the server implementation to understand the context better.
Error handling for ResizeEncryptoDevice is properly implemented
The ShareManager (via the RPC server) already has proper error handling for ResizeEncryptoDevice:
- Checks if the volume is encrypted and validates LUKS format before calling
- Returns appropriate gRPC error with detailed message when ResizeEncryptoDevice fails
- Specifically handles the "device is closed" error by propagating it through the gRPC error system
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Look for error handling in the ShareManager when calling this function
rg -A 5 "ResizeEncryptoDevice.*err.*:="
Length of output: 213
Script:
#!/bin/bash
# Look for calls to ResizeEncryptoDevice in the codebase
rg -B 3 -A 3 "ResizeEncryptoDevice\("
Length of output: 891
Script:
#!/bin/bash
# Look at the surrounding context of the ResizeEncryptoDevice call in the server
rg -B 20 -A 10 "ResizeEncryptoDevice\(" pkg/rpc/server.go
Length of output: 1384
66-82
: Consider adding size validation for the resize operation.
The function doesn't validate or specify the target size for the resize operation. This suggests that the size information is managed externally, likely by the share manager. Please verify that size validation is properly handled by the caller.
✅ Verification successful
Size validation is properly handled by the system
The size validation concern is addressed through a layered approach:
- The
ResizeEncryptoDevice
is called as part of the node expansion flow in the RPC server - After the LUKS device is resized,
volume.ResizeVolume
performs filesystem-level size validation usingmount.NewResizeFs()
which checks if resize is needed - The actual resize only occurs if necessary, with proper error handling at each layer
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for calls to ResizeEncryptoDevice to understand how size is managed
ast-grep --pattern 'ResizeEncryptoDevice($volume, $passphrase)'
Length of output: 213
Script:
#!/bin/bash
# Search for calls to ResizeEncryptoDevice using ripgrep
rg -A 5 'ResizeEncryptoDevice\('
# Also search for any LUKS resize related code to understand the size handling
rg -A 5 'LuksResize'
# Search for any size validation patterns in the crypto package
rg -A 5 'size|resize' pkg/crypto/
Length of output: 2828
Script:
#!/bin/bash
# Let's check how volume.ResizeVolume handles the size validation
rg -A 10 'ResizeVolume.*devicePath.*mountPath'
# Also check the context of the RPC call
rg -B 10 -A 5 'ResizeEncryptoDevice\(vol\.Name'
Length of output: 3244
66-82
: Verify LUKS resize operation timeout handling.
The function uses lhtypes.LuksTimeout
for the resize operation. We should verify that this timeout is appropriate for larger volumes where resize operations might take longer.
✅ Verification successful
Let me gather more information about the LUKS resize operation implementation and any potential size-related handling.
Let me check how the cryptsetup command is executed and if there are any size-related configurations or timeout handling in the implementation.
Let me check one final thing about any volume size limits or configurations in the codebase.
Based on the analysis of the codebase, I can now provide a final response about the LUKS resize timeout concern:
The 1-minute timeout for LUKS resize operations is appropriate and safe.
The timeout handling is well-designed because:
- The system uses a standard 1-minute timeout (
LuksTimeout = time.Minute
) which is sufficient as the LUKS resize operation itself is lightweight - The actual heavy lifting of filesystem resizing happens separately after the LUKS layer resize
- The resize operation has proper error handling and status tracking through multiple layers:
- Controller resize status tracking
- Node resize status tracking
- Transient error handling that allows operation retry
- Terminal error states for unrecoverable situations
The LUKS resize operation only updates the encryption layer's metadata about the new size. The time-consuming part of actually resizing the filesystem is handled separately by the filesystem-specific resize operations that follow.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for LuksTimeout usage and its value definition
rg -A 2 "LuksTimeout.*=.*"
Length of output: 394
Script:
#!/bin/bash
# Search for LuksResize implementation and related code
ast-grep --pattern 'func $_ LuksResize($$$) {
$$$
}'
# Also check for any size-related configurations or checks
rg -B 2 -A 5 "LuksResize"
Length of output: 1398
Script:
#!/bin/bash
# Check CryptsetupWithPassphrase implementation and timeout handling
ast-grep --pattern 'func $_ CryptsetupWithPassphrase($$$) {
$$$
}'
# Look for any volume size related configurations
rg -B 2 -A 5 "CryptsetupWithPassphrase"
Length of output: 4362
Script:
#!/bin/bash
# Look for any volume size configurations or limits
rg -B 2 -A 5 "volume.*size|size.*volume" -g '!*.md'
# Check for any resize-related configurations
rg -B 2 -A 5 "resize.*config|config.*resize" -g '!*.md'
Length of output: 47586
pkg/rpc/server.go (1)
21-22
: Addition of necessary imports
The imported packages lhexec
and lhtypes
are required and appropriately used in the code changes.
b1f67dd
to
c3b1b0d
Compare
Signed-off-by: James Munson <james.munson@suse.com>
57e3bda
to
c3b101d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (3)
pkg/rpc/server.go (2)
169-176
: Consider adding metrics for resize operationsWhile the resize operation is implemented correctly, consider adding metrics to track:
- Resize operation duration
- Success/failure rates
- Volume size before/after resize
This would help monitor the effectiveness of the resize functionality, especially for RWX PVCs.
119-179
: Consider implementing retry mechanism for transient failuresFor better resilience in distributed environments, consider implementing a retry mechanism with exponential backoff for:
- Device validation
- Mount point verification
- Resize operations
This would help handle transient issues that are common in distributed storage systems, especially with RWX volumes accessed from multiple nodes.
pkg/server/share_manager.go (1)
Line range hint
258-264
: Consider RWX-specific scenarios in resize implementation.While the current changes handle the crypto aspects well, consider adding specific checks or logging for RWX volumes to help diagnose resize issues when multiple pods are accessing the volume.
Consider:
- Adding debug logging to track which node is performing the resize
- Implementing checks to verify if the volume is being accessed by multiple pods during resize
- Adding metrics or events to track resize operations for RWX volumes
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
⛔ Files ignored due to path filters (5)
go.mod
is excluded by!go.mod
go.sum
is excluded by!**/*.sum
,!go.sum
vendor/github.com/longhorn/types/pkg/generated/smrpc/smrpc.pb.go
is excluded by!**/*.pb.go
,!**/generated/**
,!vendor/**
vendor/github.com/longhorn/types/pkg/generated/smrpc/smrpc_grpc.pb.go
is excluded by!**/*.pb.go
,!**/generated/**
,!vendor/**
vendor/modules.txt
is excluded by!vendor/**
📒 Files selected for processing (4)
pkg/client/share_manager_client.go
(2 hunks)pkg/crypto/crypto.go
(1 hunks)pkg/rpc/server.go
(2 hunks)pkg/server/share_manager.go
(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- pkg/client/share_manager_client.go
- pkg/crypto/crypto.go
🔇 Additional comments (6)
pkg/rpc/server.go (4)
21-23
: LGTM: Import changes are appropriate
The new imports for execution handling are properly aliased and necessary for the filesystem operations.
119-140
: LGTM: Proper initialization and validation
The method correctly:
- Uses mutex for thread safety
- Validates volume name
- Checks device validity
- Includes appropriate error handling and logging
119-179
: Verify error propagation to CSI driver
The implementation looks solid, but we should verify how errors are propagated back to the CSI driver to ensure proper handling of resize failures.
#!/bin/bash
# Description: Check error handling patterns in CSI driver integration
# Search for NodeExpandVolume implementations
ast-grep --pattern 'NodeExpandVolume($_) {
$$$
}'
# Search for related error handling in CSI driver
rg -A 5 "NodeExpandVolume.*error"
152-167
: Verify encrypted volume resize handling
The encrypted volume handling looks correct, but let's verify the interaction between ResizeEncryptoDevice
and the filesystem resize operation.
✅ Verification successful
Based on the gathered information, I can now provide a final response about the encrypted volume resize handling:
Encrypted volume resize handling is correctly implemented
The implementation follows the proper sequence:
- For encrypted volumes,
cryptsetup resize
is executed first viaResizeEncryptoDevice
- Then filesystem resize is performed via
ResizeVolume
- Both operations have proper error handling and logging
The code correctly handles the special case mentioned in the comments where cryptsetup resize is only needed for online resizing, as offline resizing happens automatically during 'open'.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for potential race conditions or error handling issues in crypto resize operations
# Search for other crypto resize operations
rg -A 5 "ResizeEncryptoDevice"
# Search for related error handling patterns
rg -A 5 "crypto_LUKS"
Length of output: 89
Script:
#!/bin/bash
# Let's search for the crypto package implementation and usage
fd -e go -x rg -l "package crypto"
# Find crypto-related function implementations
ast-grep --pattern 'func ResizeEncryptoDevice($_, $_) error {
$$$
}'
# Search for volume encryption checks
rg -A 5 "IsEncrypted"
# Look for cryptsetup related operations
rg -A 5 "cryptsetup"
Length of output: 9168
Script:
#!/bin/bash
# Let's check the crypto package implementation and filesystem resize operations
rg -A 10 "package crypto" ./pkg/crypto/crypto.go
# Find filesystem resize implementation
rg -A 10 "ResizeVolume"
# Check for any error handling patterns in volume resize operations
rg -A 5 "resize.*error|failed to resize"
Length of output: 17327
pkg/server/share_manager.go (2)
Line range hint 258-264
: Verify crypto resize behavior and improve documentation.
The comment about cryptsetup resize
not being needed is important information that should be properly documented.
Let's verify the crypto resize behavior:
#!/bin/bash
# Description: Verify crypto resize handling across the codebase
# Test 1: Check for resize handling in crypto package
echo "Checking crypto resize handling:"
rg -A 5 "func.*[Rr]esize" --type go
# Test 2: Check for cryptsetup resize usage
echo "Checking cryptsetup resize usage:"
rg -A 5 "cryptsetup.*resize|resize.*cryptsetup" --type go
Consider adding detailed documentation in the crypto package about the resize behavior during the 'open' operation. This will help prevent future confusion about when cryptsetup resize
is needed.
191-192
: Improved encrypted volume detection with explicit LUKS format check.
The addition of diskFormat == "crypto_LUKS"
check improves security by ensuring proper handling of encrypted volumes, even if not marked as encrypted in volume metadata.
Let's verify compatibility with existing encrypted volumes and different LUKS versions:
✅ Verification successful
The additional check for diskFormat == "crypto_LUKS"
is valid and consistent with the codebase
The code change is verified to be correct for the following reasons:
-
The codebase consistently uses "crypto_LUKS" as the format identifier for LUKS-encrypted volumes, as seen in multiple places:
- In
pkg/rpc/server.go
, there's an explicit check fordiskFormat == "crypto_LUKS"
- The crypto package contains comprehensive LUKS handling functionality that aligns with this format
- In
-
The additional check
diskFormat == "crypto_LUKS"
in the condition is a defensive measure that ensures proper handling of encrypted volumes even if the volume metadata doesn't mark it as encrypted, making the code more robust. -
The implementation is consistent with the existing LUKS encryption handling throughout the codebase, including format checking, opening, closing, and resizing operations.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Check for LUKS format variations in the codebase and crypto handling
# Test 1: Search for LUKS format references
echo "Checking for LUKS format references:"
rg -i "luks|crypto.*luks" --type go
# Test 2: Search for crypto volume handling
echo "Checking crypto volume handling:"
ast-grep --pattern 'func $_(vol $_, devicePath string) $_ {
$$$
crypto.$_($$$)
$$$
}'
Length of output: 27505
Script:
#!/bin/bash
# Let's check for LUKS format handling and crypto-related code
echo "Checking LUKS format handling:"
rg -A 5 "crypto_LUKS|luks|LUKS" --type go
echo -e "\nChecking crypto volume handling:"
ast-grep --pattern 'func $_(vol $_, $$$) $_ {
$$$
if vol.IsEncrypted() {
$$$
}
$$$
}'
Length of output: 7209
Which issue(s) this PR fixes:
Issue longhorn/longhorn#9736 aka longhorn/longhorn#8118
What this PR does / why we need it:
Special notes for your reviewer:
Additional documentation or context