Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trigger primary BN readiness check when failover has issues #8193

Merged
merged 6 commits into from
Apr 12, 2024

Conversation

StefanBratanov
Copy link
Contributor

@StefanBratanov StefanBratanov commented Apr 12, 2024

PR Description

Currently we wait onAttestationAggregationDue to trigger the readiness check for beacon nodes, which in turn trigger event streaming failing over/returning to primary. However, if there is a failover issue and there are no other failovers available, in some edge cases VC has to wait around a slot time before the onPrimaryNodeBackReady callback is called and the failover to primary can happen.

The edge case is onAttestationAggregationDue happens. Both primary and failover are in a bad state. We are still connected to a failover, so we try switching but nothing is available. In the meantime, primary has recovered. However, we still have to wait until the next onAttestationAggregationDue before reconnecting.

Fixed Issue(s)

related to #8180

Documentation

  • I thought about documentation and added the doc-change-required label to this PR if updates are required.

Changelog

  • I thought about adding a changelog entry, and added one if I deemed necessary.

Copy link
Contributor

@mehdi-aouadi mehdi-aouadi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
I'd suggest adding a test for this edge case though.
Overall I'm thinking if we could de-correlate the readiness check from the onAttestationAggregationDue and make it run autonomously, async and time based or something like that...

@StefanBratanov
Copy link
Contributor Author

LGTM I'd suggest adding a test for this edge case though. Overall I'm thinking if we could de-correlate the readiness check from the onAttestationAggregationDue and make it run autonomously, async and time based or something like that...

Good idea. Added a test.

@StefanBratanov StefanBratanov enabled auto-merge (squash) April 12, 2024 16:18
@StefanBratanov StefanBratanov merged commit c862af7 into Consensys:master Apr 12, 2024
15 of 16 checks passed
@StefanBratanov StefanBratanov deleted the improve_failover branch April 12, 2024 18:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants