feat: increase default ReadinessMaxBlocksBehind from 3 to 30 #2779
The previous default of 3 blocks caused false negatives during normal p2p sync operations, where nodes routinely lag by 6-7 seconds (12-14 blocks at 500ms block time) while executing blocks in batches.

This change raises the default to 30 blocks (15s worth of blocks with 0.5s block time), which maintains the ability to detect stuck nodes while eliminating false "not ready" reports during normal batch-sync operations.

Fixes #2778

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Co-authored-by: Marko <tac0turtle@users.noreply.github.com>
Claude finished @tac0turtle's task

Code Review Complete ✅

Summary

This PR increases the default

Detailed Review

1. Code Quality & Architecture ✅

Strengths:

Observations:

2. Correctness & Implementation ✅

Change Analysis:

    // Before:
    ReadinessMaxBlocksBehind: 3,
    // After:
    ReadinessMaxBlocksBehind: 30,

Rationale Assessment:

Usage in codebase (

    allowedBlocksBehind := cfg.Node.ReadinessMaxBlocksBehind
    if bestKnownHeight-localHeight > allowedBlocksBehind {
        http.Error(w, "UNREADY: behind best-known head", http.StatusServiceUnavailable)
        return
    }

The implementation correctly uses this value to determine node readiness. ✅

3. Testing
The latest Buf updates on your PR. Results from workflow CI and Release / buf-check (pull_request).
@claude instead of hardcoding a value, please use a dynamic one based on config.blocktime
Claude finished @auricom's task

Implementation Complete ✅

Successfully made

Task List

Implementation Details

Dynamic Calculation:

Scaling Behavior:

This ensures the 15-second window for normal batch-sync latency is maintained regardless of the configured block time, while still detecting stuck nodes.

Test Updates:

Tests now reflect the new default of 15 blocks for the 1-second block time.

Changed Files
Instead of hardcoding ReadinessMaxBlocksBehind to 30, calculate it dynamically as 15 seconds worth of blocks based on the configured block time.

This ensures the threshold scales appropriately:
- 0.5s block time → 30 blocks (same as before for testnet)
- 1.0s block time → 15 blocks (new default)
- 2.0s block time → 7 blocks (scales down appropriately)

The 15-second window allows for normal batch-sync latency while still detecting stuck nodes, regardless of block time configuration.

Changes:
- Added calculateReadinessMaxBlocksBehind() helper function
- Updated DefaultConfig() to use calculated value
- Updated tests to reflect new default (15 blocks for 1s block time)
- Includes fallback to 30 if block time is 0

Co-authored-by: auricom <auricom@users.noreply.github.com>
Codecov Report

❌ Patch coverage is

Additional details and impacted files

@@ Coverage Diff @@
## main #2779 +/- ##
==========================================
- Coverage 61.57% 61.55% -0.02%
==========================================
Files 82 82
Lines 8715 8727 +12
==========================================
+ Hits 5366 5372 +6
- Misses 2845 2849 +4
- Partials 504 506 +2
@claude do not hardcode 15 seconds, make this value part of config so that a user can override our default value
Claude finished @auricom's task

Implementation Complete ✅

Successfully made the readiness window configurable instead of hardcoded to 15 seconds.

Task List

Implementation Details

New Configuration Field:

Dynamic Calculation:

Behavior Examples:

With custom 30-second window:

Test Updates:

Changed Files
Added ReadinessWindowSeconds config field (default: 15s) to allow users to override the time window used for calculating ReadinessMaxBlocksBehind based on block time. This provides flexibility while maintaining the dynamic calculation based on block time.

Changes:
- Added ReadinessWindowSeconds field to NodeConfig
- Updated calculateReadinessMaxBlocksBehind to accept window parameter
- Added CLI flag and default value (15 seconds)
- Updated tests to verify new configuration

Co-authored-by: auricom <auricom@users.noreply.github.com>
I tried to deploy it locally to check whether it works, but I cannot sync the local node because main is broken, so readiness is always offline.
This PR increases the default ReadinessMaxBlocksBehind value from 3 to 30 blocks to avoid false-negatives during normal p2p sync operations.

Problem

The current default of 3 blocks marks most healthy nodes as "not ready" because they are routinely >3 blocks behind best-head while the p2p sync executes blocks in batches. Observability tools that scrape /health/ready every 5s therefore report the node as permanently offline.

Solution

Raised the default to 30 blocks (≈ 15s worth of blocks with 0.5s block time). This keeps the safety property ("stuck node is detected") while removing the false "not ready" reports during normal batch-sync.

Changes

- pkg/config/defaults.go: change ReadinessMaxBlocksBehind from 3 to 30

Fixes #2778

Generated with Claude Code