Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lr/eqc voting #3851

Merged
merged 77 commits into from
Nov 13, 2024
Merged

Lr/eqc voting #3851

merged 77 commits into from
Nov 13, 2024

Conversation

lukaszrzasik
Copy link
Contributor

@lukaszrzasik lukaszrzasik commented Nov 5, 2024

Closes #3730

This PR:

  • Introduces the concept of "epoch safety check". The epoch safety check makes sure that:
    • the proposed block and its justify QC's block are from the same epoch or
    • the justify QC is an Extended Quorum Certificate for the previous epoch.
  • Adds logic to broadcast the vote for the last proposal in eQC 3-chain.
  • Adds logic to not check whether the node is the leader when receiving a vote for the last proposal in eQC 3-chain. This allows each node to form the eQC on their own.

This PR does not:

Key places to review:

rob-maron and others added 9 commits November 8, 2024 12:00
* tests and CI

* remove `async-std`

* `fmt`

* fix doc build

* remove counter tests

* remove counter tests

* build w/o lockfile

* `lock` -> `std`

* Revert "`lock` -> `std`"

This reverts commit 21ebf05.

* lock

* `async_broadcast`

* overflow

* Revert "`async_broadcast`"

This reverts commit f03bb57.

* `try_send`
Comment on lines 972 to 974
if parent_leaf.height() == 0 {
return true;
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure but this is probably not enough. Is there a way to check if the leaf is the genesis leaf?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just check if proposed_leaf.justify_qc is the gensis qc?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe the only way is if the QC view number is the genesis view TYPES::View::genesis

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed the condition to use TYPES::View::genesis as you suggested. Getting rid of the magic number is an additional benefit.

@lukaszrzasik lukaszrzasik marked this pull request as ready for review November 8, 2024 17:27
Copy link
Collaborator

@bfish713 bfish713 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense, I need to digest a bit more on how all the logic fits together

Comment on lines 48 to 64
if is_vote_leaf_extended {
// We do not check if we are the leader when receiving a vote for the last eQC proposal.
// Every node should receive the votes and form an eQC.
tracing::debug!("We have received a vote for the last eQC proposal");
} else {
// Are we the leader for this view?
ensure!(
task_state
.quorum_membership
.leader(vote.view_number() + 1, task_state.cur_epoch)?
== task_state.public_key,
info!(
"We are not the leader for view {:?}",
vote.view_number() + 1
)
);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: ensure!(is_vote_leaf_extended || we_are_leader)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thank you!

Comment on lines 977 to 985
let Some(View {
view_inner: ViewInner::Leaf {
leaf: leaf_commit, ..
},
}) = self.validated_state_map.get(&parent_leaf.view_number())
else {
tracing::error!("Couldn't find the provided old leaf in the consensus");
return false;
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have the parent leaf, and need the commit why not just use parent_leaf.commit() instead of looking up in the map (where the leaf presumably already came from)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I guess I got confused by two commit methods existing for the Leaf. I didn't want to use the upgrade lock and make the method async.

@@ -963,6 +966,25 @@ impl<TYPES: NodeType> Consensus<TYPES> {
let epoch_number = epoch_from_block_number(next_block_number, self.epoch_height);
Ok(TYPES::Epoch::new(epoch_number))
}

/// Returns true if the `parent_leaf` formed an eQC for the previous epoch to the `proposed_leaf`
pub fn check_eqc(&self, proposed_leaf: &Leaf<TYPES>, parent_leaf: &Leaf<TYPES>) -> bool {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is checking does proposed_leaf contain an eQC as it' justify QC, is that right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, correct. I could remove the parent_leaf parameter, use the proposed_leaf's justify qc to get the parent leaf from the shared state instead, if it makes more sense.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just making sure I was correct, if we already have the parent then it's fine to pass it in, but probably this function makes sense as just starting from the proposed_leaf

Comment on lines 972 to 974
if parent_leaf.height() == 0 {
return true;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just check if proposed_leaf.justify_qc is the gensis qc?

Comment on lines 383 to 403
let is_vote_leaf_extended = self
.consensus
.read()
.await
.is_leaf_extended(vote.data.leaf_commit);
let transmit_type = if is_vote_leaf_extended {
tracing::debug!("We're voting for the last eQC proposal. Broadcast the vote.");
TransmitType::Broadcast
} else {
let leader = match self.quorum_membership.leader(view_number, self.epoch) {
Ok(l) => l,
Err(e) => {
tracing::warn!(
"Failed to calculate leader for view number {:?}. Error: {:?}",
view_number,
e
);
return None;
}
};
TransmitType::Direct(leader)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like this logic is more appropriate in the quorum_vote task, maybe just add the ExtendedVoteSend and handle that event here instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good idea. Done, thank you.

// The proposal is safe if
// 1. the proposed block and the justify QC block belong to the same epoch or
// 2. the justify QC is the eQC for the previous block
let proposal_epoch =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we also need all this logic in the validate_proposal_liveness path, I suppose we can't actually do it there because we are missing the previous proposal but don't we need something to check the epochs?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we cannot check the epochs there. We can only calculate the epoch based on the block number. I don't think we have an access to the parent's block number if we don't have the parent leaf. Did I miss something?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, I was just trying to be sure we didn't. I also don't think we can do the check in the liveness path, and we won't vote in that case either

Copy link
Contributor

@ss-es ss-es left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some minor things that stuck out to me

@@ -399,6 +399,16 @@ impl<
TransmitType::Direct(leader),
))
}
HotShotEvent::ExtendedQuorumVoteSend(vote) => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a reason we don't want to just use QuorumVoteSend?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I've followed @bfish713 's comment here. I agree that the logic belongs more to the voting part than the networking part.

@@ -298,6 +298,7 @@ pub(crate) async fn submit_vote<TYPES: NodeType, I: NodeImplementation<TYPES>, V
storage: Arc<RwLock<I::Storage>>,
leaf: Leaf<TYPES>,
vid_share: Proposal<TYPES, VidDisperseShare<TYPES>>,
extended_vote: bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the reason for this either -- are we planning for the submit_vote to actually be different for an eqc vs a regular qc? it seems like the only difference is we create a different internal hotshot event (which gets broadcast in exactly the same way)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only difference is that the last vote in 3-chain is broadcast instead of just sent to the next leader. To differentiate between the last vote in the 3-chain against any other vote, we need to check the two leaves up the chain. That logic seems to belong to the voting part instead of networking part. We now use the new ExtendedQuorumVoteSend event to differentiate on the networking part.

Base automatically changed from lr/increment-epoch to main November 13, 2024 18:13
@lukaszrzasik lukaszrzasik merged commit 32e6970 into main Nov 13, 2024
16 checks passed
@lukaszrzasik lukaszrzasik deleted the lr/eqc-voting branch November 13, 2024 19:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[EPOCH TRANSITION] - Implement beginning of a new epoch
6 participants