Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor deneb networking #4561

Merged

Conversation

realbigsean
Copy link
Member

Issue Addressed

The goals here are:

  1. Reduce code duplication between blocks and blobs - the logic here is very complicated so IMO reducing duplication is worth adding some generics/type complexity
  2. Increase type safety - in the first iteration of single lookups I ran into bugs accessing/mutating the wrong lookup state (block or blob) for the given response. This PR makes it a lot easier to avoid those types of mistakes. It also may lay the groundwork for giving us type safety across RPC requests and responses. 

Implementation

  • Made single block and blob lookups generic by introducing a RequestState trait. This trait has an associated type ResponseType that gives us some degree of type safety when deciding what state in the lookup to mutate for a given response.

  • I tried to consolidate some of the error handling, to make it easier to follow exactly where we drop requests, log, and peer score.

  • Updated RequestId, this is the new one:

pub enum RequestId {
    SingleBlock { id: SingleLookupReqId },
...
}

pub struct SingleLookupReqId {
    pub id: Id,
    pub req_counter: Id,
}

A pain point I ran into previously with blobs was around keying lookups based on an Id, which is generated per-request. Since a single lookup now really contains two request, each with independent retry logic, keying by a single id, that either request could update didn't really make sense. I found it difficult to work with and reason about.

In this PR I decided to make a Id map to an entire lookup, so shared across block and blobs, and unchanged for the life of the lookup. This led me to understand why updating the request Id on retry was previously implemented. When we are verifying a response, we don't wait for the stream terminator to peer score and retry, so it's possible we retry the request while we are still downloading responses for the old request, this could cause us to interpret the old response as the new one if we don't update the Id, or have some way of differentiating old and new. This is why I added req_counter, to SingleLookupReqId. The lookup remains keyed solely on the Id and shared between blocks and blobs, but the req_counter can be tracked separately for blocks and blobs and used to filter out old responses.

Not included, but maybe future

I went down a small rabbit hole making RequestId into a superstruct. This would let us send per-protocol types as Ids instead of the RequestId enum when we make any RPC request. In by-range requests, this then allows us to key all the maps in SyncNetworkContext by different types for example. In by-roots requests, we could add a RequestId associated type to RequestState.

@realbigsean realbigsean added ready-for-review The code is ready for review deneb labels Aug 1, 2023
Copy link
Member

@AgeManning AgeManning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mainly had a look at the new trait you have implemented.

I like it! I think its good to try and group a bunch of the logic into one place and this seems like a very nice way to do it.

There are a bunch of changes in the deneb branch which make it hard to review a lot of the other changes to the actual sync code.

Also, from experience, with these extensive changes to sync, it's very hard to reason about all the billion edge cases in sync during a code review. I found in the past the best way to find bugs is to battle test the #$!@ out of it.

I'd propose merging and doing all kinds of funky things on devnets/testnets and we can bombard it on attacknets, when we think it's ready.

Nice work tho, looks really good to me!

beacon_node/network/src/sync/block_lookups/common.rs Outdated Show resolved Hide resolved
@jimmygchen
Copy link
Member

I haven't finished reviewing yet, but the refactoring looks really good so far! 👏

@realbigsean
Copy link
Member Author

Thanks for the reviews guys, think I've addressed everything so far. Here's the diff:
https://github.com/sigp/lighthouse/compare/c71e011a1cc4482c2d31b0d37481e5b4b6dc2ac2..af5e0d455c3699de979729bb6746254b8e06894b

self.handle_verified_response::<Current, R>(
seen_timestamp,
cx,
BlockProcessType::SingleBlock { id: lookup.id },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do this need to be generic, or is this always a SingleBlock process type here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty confusing, but it should always be SingleBlock. The reason is that it's used in scenarios where we're waiting for the entire block + all blobs to arrive before sending to processing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could potentially make this less confusing by added a new variant to BlockProcessType that is specific to when we have all blocks and blobs

@jimmygchen
Copy link
Member

Added a few more comments / questions, overall looks great to me - I've gone through most of the changes I think, and don't fully understand everything yet, but keen to test this out on devnets!

Copy link
Member

@pawanjay176 pawanjay176 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like this new approach. I originally thought this would make the code more complex, but this is way better than before. Great work!
Like age said, will try to break this on testnets.

Copy link
Member

@jimmygchen jimmygchen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@realbigsean realbigsean merged commit 731b7e7 into sigp:deneb-free-blobs Aug 7, 2023
27 of 28 checks passed
@realbigsean realbigsean deleted the refactor-deneb-networking branch August 7, 2023 19:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
deneb ready-for-review The code is ready for review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants