Open
Labels
backlog, medium, nostr-frost (Nostr FROST coordination protocol), p2 (Priority), security (Security-related issues)
Description
Problem
frost_coordinator.c connects to Nostr relays for DKG and signing coordination, but lacks robust connection health monitoring and automatic recovery. In production environments, relay connections can drop due to network instability, relay restarts, or timeouts. Without proper handling, signing sessions could fail silently or hang indefinitely.
Current Behavior
- Single connection attempt per relay
- No heartbeat/ping monitoring
- No automatic reconnection on disconnect
- No multi-relay failover strategy
Proposed Solution
1. Connection Health Monitoring
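    #include <stdbool.h>
    #include <stdint.h>

    typedef struct {
        uint32_t last_pong_time;    /* time the last pong was received */
        uint32_t ping_interval_ms;  /* interval between pings, in ms */
        uint8_t  missed_pongs;      /* pongs missed since the last one received */
        bool     healthy;           /* false once too many pongs are missed */
    } ws_health_t;

- Send periodic WebSocket pings (every 30s)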
- Track pong responses
- Mark connection unhealthy after 2-3 missed pongs
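
The health check described above could be sketched roughly as follows. This is a minimal illustration, not the coordinator's actual code: the helper names (ws_health_init, ws_health_on_pong, ws_health_check) and the WS_MAX_MISSED_PONGS threshold are hypothetical, and last_pong_time is assumed to be a millisecond timestamp. Instead of counting individual ping frames, the sketch approximates missed pongs as the number of whole ping intervals that have passed since the last pong, which tolerates timer jitter.

```c
#include <stdbool.h>
#include <stdint.h>

#define WS_MAX_MISSED_PONGS 3  /* assumed threshold; the issue suggests 2-3 */

typedef struct {
    uint32_t last_pong_time;    /* assumed: ms timestamp of the last pong */
    uint32_t ping_interval_ms;
    uint8_t  missed_pongs;
    bool     healthy;
} ws_health_t;

/* Hypothetical helper: reset the state when a connection is established. */
static void ws_health_init(ws_health_t *h, uint32_t now_ms, uint32_t interval_ms)
{
    h->last_pong_time = now_ms;
    h->ping_interval_ms = interval_ms;
    h->missed_pongs = 0;
    h->healthy = true;
}

/* Hypothetical helper: call whenever a pong frame arrives. */
static void ws_health_on_pong(ws_health_t *h, uint32_t now_ms)
{
    h->last_pong_time = now_ms;
    h->missed_pongs = 0;
    h->healthy = true;
}

/* Hypothetical helper: call periodically, e.g. before sending each ping.
 * Returns true while the connection is still considered healthy. */
static bool ws_health_check(ws_health_t *h, uint32_t now_ms)
{
    if (h->ping_interval_ms == 0) {
        return h->healthy;  /* monitoring disabled */
    }
    /* Unsigned subtraction stays correct across uint32_t wraparound. */
    uint32_t silent_ms = now_ms - h->last_pong_time;
    uint32_t missed = silent_ms / h->ping_interval_ms;
    h->missed_pongs = (missed > UINT8_MAX) ? UINT8_MAX : (uint8_t)missed;
    h->healthy = h->missed_pongs < WS_MAX_MISSED_PONGS;
    return h->healthy;
}
```

With a 30s interval and a threshold of 3, a connection would be marked unhealthy after 90s without a pong and recover as soon as the next pong arrives.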
2. Automatic Reconnection
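    #define WS_RECONNECT_BASE_MS      1000   /* delay before the first retry */
    #define WS_RECONNECT_MAX_MS       30000  /* upper bound on the retry delay */
    #define WS_RECONNECT_MAX_ATTEMPTS 5      /* give up after this many retries */

    typedef struct {
        uint8_t             attempt_count;            /* retries made so far */
        uint32_t            next_retry_ms;            /* time of the next retry, in ms */
        coordinator_state_t state_before_disconnect;  /* state to resume after reconnecting */
    } ws_reconnect_t;

- Exponential backoff: 1s → 2s → 4s → 8s → 16s → 30s (capped)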
- Preserve session state during reconnection
- Re-subscribe to relevant event filters on reconnect
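
The backoff schedule above can be computed with a small helper. The sketch below is illustrative only: ws_backoff_delay_ms and ws_reconnect_schedule are hypothetical names, attempts are assumed to be counted from zero, and next_retry_ms is assumed to hold an absolute millisecond timestamp.

```c
#include <stdbool.h>
#include <stdint.h>

#define WS_RECONNECT_BASE_MS      1000
#define WS_RECONNECT_MAX_MS       30000
#define WS_RECONNECT_MAX_ATTEMPTS 5

/* Delay before retry number `attempt` (0-based): base * 2^attempt, capped. */
static uint32_t ws_backoff_delay_ms(uint8_t attempt)
{
    uint32_t delay = WS_RECONNECT_BASE_MS;
    /* Stop doubling once the cap is reached so the value cannot overflow. */
    while (attempt-- > 0 && delay < WS_RECONNECT_MAX_MS) {
        delay *= 2;
    }
    return delay > WS_RECONNECT_MAX_MS ? WS_RECONNECT_MAX_MS : delay;
}

/* Hypothetical helper: schedule the next retry. Returns false once the
 * attempt budget is spent; otherwise stores the absolute retry time and
 * increments the attempt counter. */
static bool ws_reconnect_schedule(uint8_t *attempt_count,
                                  uint32_t *next_retry_ms,
                                  uint32_t now_ms)
{
    if (*attempt_count >= WS_RECONNECT_MAX_ATTEMPTS) {
        return false;
    }
    *next_retry_ms = now_ms + ws_backoff_delay_ms(*attempt_count);
    (*attempt_count)++;
    return true;
}
```

Under these assumptions, five attempts use delays of 1s, 2s, 4s, 8s and 16s (31s in total); the 30s cap would only come into play on a sixth attempt, so WS_RECONNECT_MAX_ATTEMPTS may need to be raised if the capped delay is meant to be used.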
3. Multi-Relay Failover
- If primary relay fails, attempt secondary
- Track relay health scores (successful ops / total attempts)
- Prefer healthier relays for new sessions
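
One possible way to track the health score (successful ops / total attempts) and pick a relay is sketched below. The relay_score_t type and its helpers are hypothetical and not part of the existing relay_connection_t; scoring an untried relay as fully healthy is an assumption made so that new relays get a chance, and ties go to the lowest index so the primary relay is preferred.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-relay counters. */
typedef struct {
    uint32_t ok_ops;     /* successful operations */
    uint32_t total_ops;  /* all attempted operations */
} relay_score_t;

/* Record the outcome of one operation on a relay. */
static void relay_score_record(relay_score_t *s, bool success)
{
    if (s->total_ops == UINT32_MAX) {
        /* Halve both counters to avoid overflow while keeping the ratio. */
        s->ok_ops /= 2;
        s->total_ops /= 2;
    }
    s->total_ops++;
    if (success) {
        s->ok_ops++;
    }
}

/* Score in parts per thousand (0..1000), using integer math only. */
static uint32_t relay_score_permille(const relay_score_t *s)
{
    if (s->total_ops == 0) {
        return 1000;  /* assumption: an untried relay counts as healthy */
    }
    return (uint32_t)(((uint64_t)s->ok_ops * 1000u) / s->total_ops);
}

/* Index of the healthiest relay; ties keep the lower index.
 * `count` is assumed to be at least 1. */
static size_t relay_pick_best(const relay_score_t *scores, size_t count)
{
    size_t best = 0;
    for (size_t i = 1; i < count; i++) {
        if (relay_score_permille(&scores[i]) > relay_score_permille(&scores[best])) {
            best = i;
        }
    }
    return best;
}
```

A lifetime ratio reacts slowly to a relay that has recently degraded; a sliding window or decaying average may be a better fit if that matters in practice.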
4. Session Recovery
- Buffer outbound events during brief disconnects
- Replay buffered events on reconnection
- Timeout and fail session if reconnection exceeds threshold (e.g., 60s)
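
The buffering and timeout behaviour could look something like the sketch below: a fixed-size FIFO of serialized events plus a deadline check. Everything here is hypothetical (outbox_t, the slot count, the maximum event size); only the 60s threshold comes from the example above. Events are assumed to be NUL-terminated JSON strings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OUTBOX_SLOTS                 8      /* assumed capacity */
#define OUTBOX_EVENT_MAX             512    /* assumed max event size, bytes */
#define SESSION_RECONNECT_TIMEOUT_MS 60000  /* the 60s example threshold */

typedef struct {
    char     events[OUTBOX_SLOTS][OUTBOX_EVENT_MAX];
    size_t   head;                /* index of the oldest buffered event */
    size_t   count;               /* number of buffered events */
    uint32_t disconnected_at_ms;  /* when the disconnect was detected */
} outbox_t;

/* Start buffering at the moment a disconnect is detected. */
static void outbox_begin(outbox_t *o, uint32_t now_ms)
{
    o->head = 0;
    o->count = 0;
    o->disconnected_at_ms = now_ms;
}

/* Queue one event; fails if the buffer is full or the event is too long. */
static bool outbox_push(outbox_t *o, const char *event_json)
{
    size_t len = strlen(event_json);
    if (o->count == OUTBOX_SLOTS || len >= OUTBOX_EVENT_MAX) {
        return false;
    }
    size_t tail = (o->head + o->count) % OUTBOX_SLOTS;
    memcpy(o->events[tail], event_json, len + 1);  /* include the NUL */
    o->count++;
    return true;
}

/* Oldest buffered event, or NULL if the buffer is empty. */
static const char *outbox_peek(const outbox_t *o)
{
    return o->count > 0 ? o->events[o->head] : NULL;
}

/* Drop the oldest event, typically after it was sent successfully. */
static void outbox_pop(outbox_t *o)
{
    if (o->count > 0) {
        o->head = (o->head + 1) % OUTBOX_SLOTS;
        o->count--;
    }
}

/* True once the disconnect has lasted longer than the threshold. */
static bool outbox_expired(const outbox_t *o, uint32_t now_ms)
{
    return (uint32_t)(now_ms - o->disconnected_at_ms) > SESSION_RECONNECT_TIMEOUT_MS;
}
```

On reconnect, a caller would replay in order by peeking the oldest event, sending it, and popping it only if the send succeeded, so that a second disconnect mid-replay loses nothing. If outbox_expired returns true before the connection is back, the session would be failed and the buffer discarded.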
Implementation Notes
- Use esp_websocket_client ping/pong callbacks
- Integrate with existing relay_connection_t structure
- Add connection state to coordinator status reporting
- Log reconnection attempts for debugging
Acceptance Criteria
- Ping/pong health monitoring with configurable interval
- Automatic reconnection with exponential backoff
- Session state preserved across brief disconnects
- Clean session failure if reconnection exceeds timeout
- Health metrics exposed via coordinator status