Skip to content

Conversation

kiryazovi-redis
Copy link
Collaborator

@kiryazovi-redis kiryazovi-redis commented Sep 16, 2025

Make sure that:

  • You have read the contribution guidelines.
  • You have created a feature request first to discuss your contribution intent. Please reference the feature request ticket number in the pull request.
  • You applied code formatting rules using the mvn formatter:format target. Don’t submit any formatting related changes.
  • You submit test cases (unit or integration tests) that back your changes.

Connection Handoff and Leak Detection for Redis Enterprise Maintenance Events

Overview

This PR introduces comprehensive testing infrastructure for Redis Enterprise maintenance events, focusing on connection handoff behavior and memory leak detection during database migrations and failovers. The implementation provides robust validation of connection lifecycle management and ensures proper cleanup of resources during maintenance operations.

Key Components

1. ConnectionHandoffTest.java

A comprehensive test suite that validates connection handoff behavior during Redis Enterprise maintenance events:

Core Features:

  • Multi-Address Type Support: Tests connection handoff with different endpoint address types:

    • EXTERNAL_IP: External IP addresses for public cloud deployments
    • EXTERNAL_FQDN: External fully qualified domain names
    • INTERNAL_IP: Internal IP addresses for private networks
    • INTERNAL_FQDN: Internal fully qualified domain names
    • NONE: No address type (null handling)
  • RESP3 Push Notification Monitoring: Captures and validates maintenance event notifications:

    • MOVING: Endpoint rebind notifications with new connection targets
    • MIGRATING: Migration start notifications
    • MIGRATED: Migration completion notifications
    • FAILING_OVER: Failover start notifications
    • FAILED_OVER: Failover completion notifications

Key Test Cases:

  1. Basic Connection Handoff Tests:

    • connectionHandedOffToNewEndpointExternalIPTest(): Validates handoff to external IP endpoints
    • connectionHandoffWithFQDNExternalNameTest(): Tests FQDN-based endpoint handoff
    • clientHandshakeWithNoneEndpointTypeTest(): Handles null address scenarios
  2. Advanced Scenarios:

    • trafficResumesAfterMovingTest(): Ensures continuous traffic flow during maintenance with 50:50 GET/SET ratio
    • newConnectionDuringRebindAfterMovingTest(): Dual connection testing during endpoint rebind
    • detectConnectionClosureAndMemoryLeaksTest(): Memory leak detection using EventBus monitoring
  3. Notification System Tests:

    • connectionHandshakeIncludesEnablingNotificationsTest(): Validates all 5 notification types
    • disabledDontReceiveNotificationsTest(): Confirms notifications are properly disabled when configured

Helper Classes:

  • HandoffCapture: Captures and validates maintenance notifications with address type validation
  • ContinuousTrafficGenerator: Generates async GET/SET traffic during maintenance operations
  • DualConnectionCapture: Manages multiple connections during rebind operations
  • AllNotificationTypesCapture: Tracks all maintenance notification types

2. ConnectionLeakDetectionUtil.java

A utility class for detecting connection closure and memory leaks using EventBus monitoring and Netty channel state inspection:

Core Capabilities:

  1. EventBus Monitoring:

    • Tracks ConnectedEvent, DisconnectedEvent, ConnectionActivatedEvent, ConnectionDeactivatedEvent
    • Uses reflection to extract channel IDs from package-private methods
    • Provides real-time connection lifecycle tracking
  2. Memory Leak Detection:

    • EventBus Level: Validates proper disconnect and deactivation events
    • Netty Level: Checks channel state (active, open, registered)
    • Connection Handoff: Verifies new connection establishment
  3. Analysis Results:

    • ConnectionAnalysisResult: Comprehensive analysis of connection closure
    • isFullyCleanedUpWithoutLeaks(): Primary indicator for memory leak detection
    • Detailed reporting of cleanup status at multiple levels

Key Methods:

  • setupEventBusMonitoring(): Configures EventBus listeners before connection creation
  • prepareForConnectionTransition(): Sets up monitoring for handoff events
  • waitForConnectionTransition(): Waits for disconnect/deactivate events with timeout
  • analyzeConnectionClosure(): Performs comprehensive cleanup analysis
  • getChannelFromConnection(): Utility for extracting Netty channels using reflection

Technical Implementation Details

Maintenance Event Flow

  1. Setup Phase:

    • Configure RESP3 protocol with maintenance events enabled
    • Set up push notification monitoring with configurable timeouts
    • Establish baseline connection state
  2. Trigger Phase:

    • Use FaultInjectionClient to trigger migrate+bind operations
    • Monitor for MIGRATED notification (migration complete)
    • Wait for MOVING notification (endpoint rebind with new address)
  3. Validation Phase:

    • Parse notification content using regex patterns
    • Validate address format matches expected type (IP vs FQDN)
    • Verify reconnection to correct endpoint
    • Test connection functionality post-handoff
  4. Cleanup Verification:

    • Monitor EventBus for proper disconnect/deactivate events
    • Check Netty channel cleanup (inactive, closed, unregistered)
    • Confirm new connection establishment with different channel ID

Address Type Validation

The implementation includes sophisticated address parsing and validation:

// IP Address Pattern (IPv4)
private static final Pattern IP_PATTERN = Pattern.compile("^((25[0-5]|(2[0-4]|1\\d|[1-9]|)\\d)\\.?\\b){4}$");

// FQDN Pattern
private static final Pattern FQDN_PATTERN = Pattern
    .compile("^[a-zA-Z0-9]([a-zA-Z0-9\\-]{0,61}[a-zA-Z0-9])?(\\.[a-zA-Z0-9]([a-zA-Z0-9\\-]{0,61}[a-zA-Z0-9])?)*$");

// MOVING Notification Pattern
private static final Pattern MOVING_PATTERN = Pattern
    .compile(">\\d+\\r?\\nMOVING\\r?\\n:([^\\r\\n]+)\\r?\\n:(\\d+)\\r?\\n([^\\r\\n]*)\\s*");

3. Fixing multiple tests and lots of refactoring.

…rise maintenance events

- Add ConnectionTesting class with 9 test scenarios for maintenance handoff behavior
- Test old connection graceful shutdown during MOVING operations
- Validate traffic resumption with autoconnect after handoff
- Verify maintenance notifications only work with RESP3 protocol
- Test new connection establishment during migration and bind phases
- Add memory leak validation for multiple concurrent connections
- Include TLS support testing for maintenance events
- Replace .supportMaintenanceEvents(true) with MaintenanceEventsOptions.enabled()
- Add comprehensive monitoring and validation of connection lifecycle

Tests cover CAE-1130 requirements for Redis Enterprise maintenance event handling
including connection draining, autoconnect behavior, and notification delivery.
…IONS

- connectionHandshakeIncludesEnablingNotificationsTest: Verifies all 5 notification types (MOVING, MIGRATING, MIGRATED, FAILING_OVER, FAILED_OVER) are received when maintenance events are enabled
- disabledDontReceiveNotificationsTest: Verifies no notifications received when maintenance events are disabled
- clientHandshakeWithEndpointTypeTest: Tests CLIENT MAINT_NOTIFICATIONS with 'none' endpoint type (nil IP scenario)
- clientMaintenanceNotificationInfoTest: Verifies CLIENT MAINT_NOTIFICATIONS configuration with moving-endpoint-type

Based on CLIENT MAINT_NOTIFICATIONS implementation from commit bd408cf
- Update push notification patterns to include sequence numbers (4-element format)
- Fix MOVING notification parsing to handle new address format with sequence and time
- Update MIGRATING, MIGRATED, FAILING_OVER, and FAILED_OVER patterns with sequence numbers
- Improve FaultInjectionClient status handling: change from 'pending' to 'running' checks
- Enhance JSON response parsing with better output field handling and debugging
- Remove deprecated maintenance sequence functionality and associated unit test
- Add test phase isolation to prevent cleanup notification interference
- Extend monitoring timeout from 2 to 5 minutes for longer maintenance operations
- Add @AfterEach cleanup to restore cluster state between tests
- Remove hardcoded optimal node selection logic in RedisEnterpriseConfig

This aligns with the updated Redis Enterprise maintenance events specification
and improves test reliability by handling the new notification protocol format.
@kiryazovi-redis kiryazovi-redis requested review from tishun, Copilot and ggivo and removed request for tishun September 16, 2025 09:58
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces comprehensive testing infrastructure for Redis Enterprise maintenance events, focusing on connection handoff behavior, timeout relaxation mechanisms, and memory leak detection during database migrations and failovers. The implementation provides robust validation of connection lifecycle management and ensures proper cleanup of resources during maintenance operations.

Key Changes:

  • Added extensive connection handoff tests with multi-address type support (External IP, External FQDN, Internal IP, Internal FQDN, None)
  • Implemented timeout relaxation testing during maintenance events with proper cleanup phase handling
  • Created connection leak detection utilities using EventBus monitoring and Netty channel state inspection
  • Updated maintenance notification patterns to handle new RESP3 format with sequence numbers

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
RelaxedTimeoutConfigurationTest.java Enhanced timeout testing with proper cleanup phase handling and command stack clearing
RedisEnterpriseConfig.java Simplified cluster configuration by removing hardcoded node logic and complex validation
MaintenancePushNotificationMonitor.java Updated to handle new notification format with sequence numbers and immediate ping monitoring
MaintenanceNotificationTest.java Updated notification patterns and test names to match new format requirements
FaultInjectionClientUnitTest.java Removed unit test file for better focus on integration testing
FaultInjectionClient.java Improved status checking and output parsing for better reliability
ConnectionLeakDetectionUtil.java New utility for detecting connection leaks using EventBus and Netty channel monitoring
ConnectionHandoffTest.java New comprehensive test suite for connection handoff scenarios with all address types
Comments suppressed due to low confidence (1)

src/test/java/io/lettuce/scenario/RelaxedTimeoutConfigurationTest.java:1

  • The conditional check for status being "running" followed by a null check in the log message is inconsistent. If status equals "running", it cannot be null, making the null check redundant.
package io.lettuce.scenario;

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link
Contributor

@ggivo ggivo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM in general.
Provided some suggestions regarding ConnectionHandoffTest covering actual handoff..

Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 4 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines +59 to 62
// Patterns to parse rladmin output - updated for real Redis Enterprise format
private static final Pattern SHARD_PATTERN = Pattern
.compile("db:(\\d+)\\s+\\S+\\s+(\\S+)\\s+(node:\\d+)\\s+(master|slave)\\s+.*");
.compile("db:(\\d+)\\s+\\S+\\s+(redis:\\d+)\\s+(node:\\d+)\\s+(master|slave)\\s+.*");

Copy link
Preview

Copilot AI Sep 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These regex patterns are complex and lack documentation. Consider adding comments explaining the expected format and what each capturing group represents, especially since this appears to be parsing Redis Enterprise specific output.

Copilot uses AI. Check for mistakes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants