Skip to content

Checkpoints and replication

HagarMeir edited this page May 23, 2019 · 7 revisions

Each proposal that gets delivered to the application is signed by 2f+1. The signatures are piggybacked to the Commit messages. Therefore, each decided proposal can essentially serve as a checkpoint, as defined by PBFT.

As a result, the Write Ahead Log (WAL) can be truncated at any point in time, but for efficiency reasons it is truncated whenever it reaches a certain threshold.

When a node was disconnected for too long, it may have missed some proposals that have been committed, and as a result it needs to fill the gap before it can deliver new proposals to the application. The node starts a view, the last view that it knows about, and may receive consensus messages with different sequence numbers or heartbeats from the leader that doesn't match the started view. The node does not distinguish between being lied too (by byzantine nodes) and being behind. Therefore, it calls Sync() provided by the application layer and sends a ViewChange message.

Even if the node was disconnected and missed a few decisions it may still participate in the consensus, assuming it has the prevHeader as required by VerifyProposal. We distinguish between two scenarios:

  1. The node doesn't have the latest verification sequence so it cannot verify the in-flight proposals.
  2. The node has the latest verification sequence so it can verify the in-flight proposals.

If (1) occurs, the replicated state machine calls Sync() which is provided by the application layer, and should:

  • Contacts at least 2f+1 nodes to obtain the latest verification sequence, the last view number, and the last sequence in that view.
  • Replicates the decided proposals.

Since several proposals could have been created during the invocation of Sync(), the node might start participating in the latest view while still having a gap of proposals that weren't delivered to the application. Because the decisions must be delivered to the application in order, the node writes the new reached decisions to a temporary WAL, calls Sync() again to fill in the gap, and later delivers the decisions to the application layer by reading from the temporary WAL.

Clone this wiki locally