
feat(core): add Sailfish++ protocol variant #126

Merged
polinikita merged 21 commits into main from feat/sailfish-pp-protocol-variant
Mar 19, 2026

polinikita commented Mar 16, 2026

Summary

Complete implementation of the Sailfish++ consensus protocol variant, based on Optimistic Signature-Free Reliable Broadcast (CCS'25). This adds a full signature-free RBC certification pipeline alongside the existing Sailfish protocol, enabling optimistic fast-path block delivery with graceful slow-path fallback.

Protocol variant and type system

  • New ConsensusProtocol::SailfishPlusPlus variant wired through all exhaustive matches (types, committee, dag_state, network, broadcaster, core, net_sync, linearizer, universal_committer)
  • RBC message types: CertMessage with Echo/Vote/Ready kinds, plus signed SailfishTimeoutMsg/SailfishNoVoteMsg with corresponding certificates
  • Precomputed optimistic thresholds on Committee (optimistic_threshold, quorum_threshold)

Certification pipeline (cert_aggregator.rs, sailfish_service.rs)

  • CertificationAggregator: stake-weighted RBC state machine tracking echo/vote/ready per block slot, emitting FastDelivery (echo quorum), SlowDelivery (ready quorum), SendVote, and SendReady events
  • SailfishService: async tokio task that orchestrates block registration, RBC message dispatch, timeout/no-vote aggregation with Ed25519 signature verification, and event batching back to the syncer
  • Early RBC message buffering: messages arriving before their block are buffered per-slot (capped at 3 * committee_size) and drained through the aggregator when the canonical block is registered
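
As a rough illustration of the per-slot tallying described above, the following sketch counts echo and ready stake and reports a fast or slow delivery once. It is not the code from cert_aggregator.rs: `SlotTally`, `Delivery`, and the two threshold parameters are hypothetical names, and the thresholds are passed in by the caller because this description does not state their exact values. Vote handling and the SendVote/SendReady events are left out.

```rust
use std::collections::HashSet;

/// Delivery outcomes, loosely named after the events in the PR description.
#[derive(Debug, PartialEq, Eq)]
pub enum Delivery {
    Fast,
    Slow,
}

/// Hypothetical per-slot tally of echo and ready stake.
pub struct SlotTally {
    stakes: Vec<u64>,  // stake per authority index
    echo_quorum: u64,  // assumed fast-path threshold, supplied by the caller
    ready_quorum: u64, // assumed slow-path threshold, supplied by the caller
    echoed: HashSet<usize>,
    readied: HashSet<usize>,
    echo_stake: u64,
    ready_stake: u64,
    delivered: bool,
}

impl SlotTally {
    pub fn new(stakes: Vec<u64>, echo_quorum: u64, ready_quorum: u64) -> Self {
        Self {
            stakes,
            echo_quorum,
            ready_quorum,
            echoed: HashSet::new(),
            readied: HashSet::new(),
            echo_stake: 0,
            ready_stake: 0,
            delivered: false,
        }
    }

    /// Records an echo; returns `Delivery::Fast` the first time the echo quorum is met.
    pub fn add_echo(&mut self, authority: usize) -> Option<Delivery> {
        let stake = *self.stakes.get(authority)?; // unknown authority: ignore
        if !self.echoed.insert(authority) {
            return None; // each authority is counted at most once
        }
        self.echo_stake += stake;
        let reached = self.echo_stake >= self.echo_quorum;
        self.deliver_once(reached, Delivery::Fast)
    }

    /// Records a ready; returns `Delivery::Slow` the first time the ready quorum is met.
    pub fn add_ready(&mut self, authority: usize) -> Option<Delivery> {
        let stake = *self.stakes.get(authority)?;
        if !self.readied.insert(authority) {
            return None;
        }
        self.ready_stake += stake;
        let reached = self.ready_stake >= self.ready_quorum;
        self.deliver_once(reached, Delivery::Slow)
    }

    /// A slot is delivered once, whichever path gets there first.
    fn deliver_once(&mut self, reached: bool, kind: Delivery) -> Option<Delivery> {
        if reached && !self.delivered {
            self.delivered = true;
            Some(kind)
        } else {
            None
        }
    }
}
```

Tracking a single `delivered` flag per slot means a late ready quorum after a fast delivery produces no second event, which is one plausible way to keep the two paths from double-counting.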

Control plane and commit rule

  • Timeout certificates (TC) and no-vote certificates (NVC) with SignedQuorumAggregator (2f+1 stake)
  • certified_parent_quorum gate in the universal committer's leader commit rule
  • Transitive certification propagation in DagState::mark_vertex_certified
  • Empty SailfishPlusPlus blocks automatically marked data-available

Core integration

  • Sailfish service wired into core_thread, syncer, and net_sync for full message routing
  • Parent selection respects certification state; own-previous counted in quorum checks
  • Header-only blocks rejected for SailfishPlusPlus

Observability

  • sailfish_rbc_fast_total / sailfish_rbc_slow_total Prometheus counters
  • Grafana dashboard panel for RBC fast/slow path rates

Documentation

  • README protocol comparison table covering DAG type, commit latency, transaction encoding, dissemination, and certification mechanism across all six protocol variants

Test plan

  • cargo check --all-features
  • cargo clippy --all-features --no-deps -- -D warnings
  • cargo clippy --all-features --tests --no-deps -- -D warnings
  • cargo +nightly fmt --check
  • Unit tests for cert_aggregator and sailfish_service (fast/slow path, buffering, equivocation, timeout/no-vote certs)
  • validator_commit("sailfish++") smoke test
  • Full CI pipeline

Add the SailfishPlusPlus consensus protocol variant based on the
SFSailfish paper (signature-free optimistic RBC, CCS'25). This commit
wires the new variant through all exhaustive match arms across the
codebase and updates the README with a protocol comparison table.

Type/enum foundations:
- CertEcho, CertVote, CertReady signature-free RBC message types
- Optimistic RBC thresholds on Committee (fast, vote, ready)
- SailfishPlusPlus variant in ConsensusProtocol enum
- CertEcho/CertVote/CertReady NetworkMessage variants
- Wave length 2, pipeline enabled in UniversalCommitterBuilder

SailfishPlusPlus follows the Mysticeti model: full blocks (no erasure
coding), pull-based dissemination, no acknowledgment references, no BLS.

- Add echo→ready threshold trigger in cert_aggregator (per SFSailfish
  paper: Ready from ceil((N+F-1)/2) echoes, votes, or F+1 readys)
- Fix UniversalCommitterBuilder: wave_length=2, pipeline=true for
  SailfishPlusPlus (was wave_length=3, pipeline=false)
- Remove all debug eprintln blocks guarded by SAILFISH_DEBUG_FLOW and
  SAILFISH_DEBUG_COMMIT environment variables
- Move prometheus::Registry import to #[cfg(test)] in sailfish_service
- Rename collect_subdag_mysticeti → collect_subdag_ancestors and
  collect_subdag_starfish → collect_subdag_acknowledgments
- Update sailfish_service and cert_aggregator tests to match new
  echo→ready event flow
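
The threshold arithmetic in the first bullet can be checked with a few lines of integer math. This is a sketch only: it counts nodes rather than stake, assumes F is the largest value with N >= 3F + 1, and the function names are made up for illustration.

```rust
/// Largest F with N >= 3F + 1 (assumption; the commit message does not define F).
pub fn max_faults(n: u64) -> u64 {
    n.saturating_sub(1) / 3
}

/// ceil((N + F - 1) / 2) in integer arithmetic.
/// For a non-negative integer x, ceil(x / 2) == (x + 1) / 2,
/// so with x = N + F - 1 this simplifies to (N + F) / 2.
pub fn echo_to_ready_threshold(n: u64, f: u64) -> u64 {
    (n + f) / 2
}

/// F + 1 readys, enough to guarantee at least one comes from an honest node.
pub fn ready_amplification_threshold(f: u64) -> u64 {
    f + 1
}
```

For a 4-node committee with F = 1 this gives an echo threshold of 2 and a ready amplification threshold of 2.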

Implement the SFSailfish control-plane messages (timeout certificates
and no-vote certificates) that enable liveness under faults and direct
skip in the commit rule.

Types: SailfishTimeoutMsg/Cert, SailfishNoVoteMsg/Cert, SailfishFields
embedded in BlockHeader. Ed25519 signed with domain-separated digests.

Crypto: Signer::sign_digest, PublicKey::verify_digest_signature,
sailfish_timeout_digest, sailfish_novote_digest helpers.

Network: SailfishTimeout and SailfishNoVote message variants.

Service: SignedQuorumAggregator shared by timeout and no-vote paths.
Aggregates signed messages until 2f+1 quorum, emits TimeoutReady and
NoVoteReady events.
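
A minimal sketch of the aggregation idea, assuming a stake-weighted quorum defined as strictly more than two thirds of total stake (which equals 2f+1 when every validator has stake 1 and N = 3f+1). `QuorumAggregator` and `quorum_threshold` are hypothetical names, and signature checking is deliberately omitted here.

```rust
use std::collections::HashSet;

/// Smallest stake strictly greater than two thirds of the total.
pub fn quorum_threshold(total_stake: u64) -> u64 {
    total_stake * 2 / 3 + 1
}

/// Hypothetical stand-in for the shared timeout / no-vote aggregator.
pub struct QuorumAggregator {
    stakes: Vec<u64>,
    threshold: u64,
    signers: HashSet<usize>,
    stake: u64,
    emitted: bool,
}

impl QuorumAggregator {
    pub fn new(stakes: Vec<u64>) -> Self {
        let threshold = quorum_threshold(stakes.iter().sum());
        Self {
            stakes,
            threshold,
            signers: HashSet::new(),
            stake: 0,
            emitted: false,
        }
    }

    /// Adds one signer's message. Returns true exactly once, at the moment
    /// the accumulated stake first reaches the quorum threshold.
    pub fn add(&mut self, signer: usize) -> bool {
        let Some(&stake) = self.stakes.get(signer) else {
            return false; // not a committee member
        };
        if !self.signers.insert(signer) {
            return false; // duplicate message from the same signer
        }
        self.stake += stake;
        if self.stake >= self.threshold && !self.emitted {
            self.emitted = true;
            return true;
        }
        false
    }
}
```

Returning true only on the transition keeps the caller from emitting a ready event more than once per round.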

DagState: Stores timeout/no-vote certs with BTreeMap, accessible via
add/get/has methods, cleaned up via split_off.
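
The BTreeMap plus split_off pattern mentioned above might look roughly like this. `CertStore` is a hypothetical name, and keying by round number alone is an assumption made for the sketch.

```rust
use std::collections::BTreeMap;

/// Hypothetical round-indexed certificate store.
pub struct CertStore<C> {
    by_round: BTreeMap<u64, C>,
}

impl<C> CertStore<C> {
    pub fn new() -> Self {
        Self { by_round: BTreeMap::new() }
    }

    /// Stores a certificate for `round`; the first one seen wins.
    pub fn add(&mut self, round: u64, cert: C) {
        self.by_round.entry(round).or_insert(cert);
    }

    pub fn get(&self, round: u64) -> Option<&C> {
        self.by_round.get(&round)
    }

    pub fn has(&self, round: u64) -> bool {
        self.by_round.contains_key(&round)
    }

    /// Drops every entry with round < `lowest_kept`.
    pub fn cleanup(&mut self, lowest_kept: u64) {
        // split_off returns the entries with key >= lowest_kept;
        // keeping only that half discards everything older.
        self.by_round = self.by_round.split_off(&lowest_kept);
    }
}
```

Because the map is ordered by round, pruning is a single split rather than a scan over all entries.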

Core: Block creation gated by sailfish_control_ready, which requires a
TC when the block lacks a parent link to the previous leader, and
additionally an NVC when the creator is the round leader.
SailfishFields computed and embedded in block header.

Committer: Direct skip from NVC in try_commit_sailfish. Backward walk
also uses NVC for skip resolution.

Validation: verify_signed_quorum helper checks signer uniqueness,
quorum stake, and Ed25519 signatures for both TC and NVC.
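
The three checks listed for the validation helper can be sketched as below. This is an illustration, not the helper from the PR: `check_signed_quorum` and `QuorumError` are hypothetical, and the Ed25519 check is abstracted behind a caller-supplied closure so the example does not depend on any particular crypto crate.

```rust
use std::collections::HashSet;

#[derive(Debug, PartialEq, Eq)]
pub enum QuorumError {
    UnknownSigner(usize),
    DuplicateSigner(usize),
    BadSignature(usize),
    InsufficientStake { have: u64, need: u64 },
}

/// Checks signer uniqueness, per-signer signatures, and total stake.
/// `verify` stands in for the real signature check (an assumption).
pub fn check_signed_quorum<S>(
    stakes: &[u64],
    quorum: u64,
    entries: &[(usize, S)],
    verify: impl Fn(usize, &S) -> bool,
) -> Result<(), QuorumError> {
    let mut seen = HashSet::new();
    let mut stake = 0u64;
    for (signer, sig) in entries {
        let signer_stake = *stakes
            .get(*signer)
            .ok_or(QuorumError::UnknownSigner(*signer))?;
        if !seen.insert(*signer) {
            return Err(QuorumError::DuplicateSigner(*signer));
        }
        if !verify(*signer, sig) {
            return Err(QuorumError::BadSignature(*signer));
        }
        stake += signer_stake;
    }
    if stake < quorum {
        return Err(QuorumError::InsufficientStake { have: stake, need: quorum });
    }
    Ok(())
}
```

Rejecting duplicates before summing stake matters: otherwise one signer repeated three times could pass for a quorum.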

Fixes incorrect SFSailfish paper link in README (eprint → arxiv).
Adds sailfish-pp to dryrun.sh options comment.

Remove the sailfish_control_ready gate from try_new_block: it deadlocks
block creation because the timeout mechanism that produces TCs is not
yet wired. The certified_parent_quorum gate already ensures safety;
the control-plane gate will be re-enabled once timeout triggers are
complete.

Block creation is already retried on certificate events via
apply_sailfish_certificates → try_new_block, so the certified parent
quorum gate unblocks naturally as RBC completes.

Also add explicit rejection of header-only blocks for protocols that
require full blocks (SailfishPlusPlus, Mysticeti, CordialMiners).

Phase 1-2: Fix invalid block creation
- After certified-parent filtering, check that remaining parents still
  have quorum stake at round-1. If not, requeue and wait.
- For SailfishPlusPlus, preserve all previous-round references during
  compression so certified-parent filtering keeps quorum.
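
The post-filter quorum check might be sketched as follows. All names here (`ParentRef`, `certified_parents_have_quorum`) are hypothetical. The sketch also assumes the creator's own previous block sits at round-1 and is always included by the block builder, so its stake is counted up front.

```rust
/// Hypothetical parent reference carrying its certification state.
pub struct ParentRef {
    pub authority: usize,
    pub round: u64,
    pub certified: bool,
}

/// Returns true if the certified parents at `new_round - 1`, together with
/// the creator's own previous block, carry at least `quorum` stake.
pub fn certified_parents_have_quorum(
    stakes: &[u64],
    quorum: u64,
    own: usize,
    new_round: u64,
    parents: &[ParentRef],
) -> bool {
    let Some(prev_round) = new_round.checked_sub(1) else {
        return false; // round 0 has no previous round
    };
    let mut counted = vec![false; stakes.len()];
    let mut stake = 0u64;
    // Assumption: the creator's own previous block is always a parent.
    if own < stakes.len() {
        counted[own] = true;
        stake += stakes[own];
    }
    for p in parents {
        let eligible = p.round == prev_round && p.certified && p.authority < stakes.len();
        if eligible && !counted[p.authority] {
            counted[p.authority] = true;
            stake += stakes[p.authority];
        }
    }
    stake >= quorum
}
```

If the check fails, the caller would requeue and wait for more certifications rather than emit a block without a parent quorum.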

Phase 3: Local timeout origination
- ServiceState holds a Signer; handle_local_timeout signs, self-counts,
  and emits BroadcastTimeout.
- leader_timeout_task sends LocalTimeout to sailfish service for
  SailfishPlusPlus before force_new_block.

Phase 4: Local no-vote origination
- handle_local_novote signs, self-counts, and emits SendNoVote.
- create_new_block triggers LocalNoVote when the created block lacks a
  parent to the previous-round leader.
- NoVote routed only to the next-round elected leader, not broadcast.

Phase 5: Signature verification before aggregation
- add_timeout_msg and add_novote_msg verify Ed25519 signatures against
  domain-separated digests before counting stake.

Phase 6: Header validation (relaxed)
- Validate TC/NVC signatures and quorum when present in block headers.
- Do not yet enforce mandatory presence (control plane still ramping).
…cludes

Two bugs causing the dryrun stall:

1. The post-filter quorum check in collect_transactions_and_references
   did not account for the creator's own previous block, which
   build_block always prepends. This caused valid proposals to be
   rejected even when own_previous + peer parents had quorum.

2. On failed proposals, get_pending_transactions drains Include refs
   from self.pending, but requeue_transactions only puts back Payload.
   The Include refs were permanently lost, so subsequent retries saw
   an empty frontier. Now collect_transactions_and_references returns
   the raw include refs on failure so the caller can requeue them.
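
The second bug comes down to a drain that is not fully undone on failure. The sketch below shows the general shape of the fix with hypothetical types (`Pending`, `try_propose`). The real functions have different signatures, and `has_quorum` stands in for whatever causes a proposal to be rejected.

```rust
use std::collections::VecDeque;

/// Hypothetical pending-queue item: either a block reference to include
/// or a transaction payload.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Pending {
    Include(u64),
    Payload(Vec<u8>),
}

/// Drains the queue and tries to build a proposal. On failure, every
/// drained item (Include refs as well as Payload) goes back on the queue.
pub fn try_propose(pending: &mut VecDeque<Pending>, has_quorum: bool) -> Option<Vec<Pending>> {
    let drained: Vec<Pending> = pending.drain(..).collect();
    if has_quorum {
        return Some(drained);
    }
    // Requeue in the original order so a later retry sees the same frontier.
    pending.extend(drained);
    None
}
```

Requeueing only the `Payload` items at this point would reproduce the stall described above, since the next attempt would find no references left to build on.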

The is_empty_full_block check in update_data_availability matched only
Mysticeti and CordialMiners. SailfishPlusPlus blocks with empty payloads
(transactions: None, empty merkle root) were never marked data-available,
blocking drain_available_commits and preventing transaction metrics from
being reported.
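
A sketch of the corrected predicate, using simplified stand-in types. `Protocol`, `Block`, and `empty_full_block` are hypothetical; the `Other` variant is a placeholder for the remaining protocols, and the merkle-root check is omitted.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Protocol {
    Mysticeti,
    CordialMiners,
    SailfishPlusPlus,
    Other, // placeholder for protocols that do not use full blocks
}

pub struct Block {
    pub protocol: Protocol,
    pub transactions: Option<Vec<u8>>,
}

/// True for a full-block protocol whose block carries no transactions,
/// in which case there is nothing to fetch and the block can be treated
/// as data-available immediately.
pub fn empty_full_block(block: &Block) -> bool {
    let full_block_protocol = matches!(
        block.protocol,
        Protocol::Mysticeti | Protocol::CordialMiners | Protocol::SailfishPlusPlus
    );
    full_block_protocol && block.transactions.is_none()
}
```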

Add sailfish_rbc_fast_total and sailfish_rbc_slow_total counters that
track how many vertices are certified via the optimistic fast path
(echo quorum) vs the slow path (ready quorum).

Incremented in the sailfish service dispatch_cert_events on
FastDelivery and SlowDelivery events respectively.

Add a "Sailfish++ RBC certification path" panel to the Grafana
dashboard showing the rate of fast vs slow certifications.

When a peer's Echo/Vote/Ready arrives before we've seen the block,
buffer it keyed by (round, authority) slot instead of dropping it.
Drain matching messages through the aggregator when the canonical
block is registered via ProcessBlocks. Conflicting-digest messages
in the buffer are silently discarded. Per-slot buffer is capped at
3 * committee_size to bound memory.
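
The buffering scheme can be sketched like this. `EarlyBuffer`, `EarlyMsg`, and the type aliases are hypothetical, and the real service presumably stores the full RBC message rather than just a sender and digest.

```rust
use std::collections::HashMap;

pub type Slot = (u64, u32); // (round, authority)
pub type Digest = [u8; 32];

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EarlyMsg {
    pub sender: u32,
    pub digest: Digest,
}

/// Hypothetical buffer for RBC messages that arrive before their block.
pub struct EarlyBuffer {
    cap_per_slot: usize,
    slots: HashMap<Slot, Vec<EarlyMsg>>,
}

impl EarlyBuffer {
    pub fn new(committee_size: usize) -> Self {
        Self {
            cap_per_slot: 3 * committee_size, // cap stated in the commit message
            slots: HashMap::new(),
        }
    }

    /// Buffers a message; returns false if the slot is full and it was dropped.
    pub fn push(&mut self, slot: Slot, msg: EarlyMsg) -> bool {
        let queue = self.slots.entry(slot).or_default();
        if queue.len() >= self.cap_per_slot {
            return false;
        }
        queue.push(msg);
        true
    }

    /// Called once the canonical block for `slot` is known. Returns the
    /// buffered messages that match its digest; conflicting ones are discarded.
    pub fn drain(&mut self, slot: Slot, canonical: &Digest) -> Vec<EarlyMsg> {
        self.slots
            .remove(&slot)
            .unwrap_or_default()
            .into_iter()
            .filter(|m| &m.digest == canonical)
            .collect()
    }
}
```

The cap of 3 * committee_size plausibly corresponds to one Echo, one Vote, and one Ready per committee member, though the commit message does not spell that out.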

- Run cargo fmt to fix formatting violations
- Wrap long error strings in types.rs for editorconfig compliance
- Shorten test fn name exceeding 100-char line limit
- Fix batch_vertex_certification_updates_quorum_view: insert actual
  blocks before certifying (parent-closure check requires them)
- Fix sailfish_service timeout/novote tests: use new_for_benchmarks
  committee so public keys match Signer::new_for_test signatures
- Revert dryrun.sh to main (local testing defaults)
@polinikita polinikita merged commit a0ac2e8 into main Mar 19, 2026
10 of 11 checks passed