net/peerset: Optimize substream opening duration for SetReservedPeers
#10362
Merged
+252
−179
16 commits
2573662 peerset: Change the peerset command from enum to struct (lexnv)
2b7bf1b notification: Use the command as struct for concurent open / close (lexnv)
607e5fe peerset: Transition to struct commands (lexnv)
ea021c4 notification: Adjust fuzzer (lexnv)
e2df699 notification/tests: Adjust testing to the new interface (lexnv)
c03a83b notification: Fix fuzzer commands (lexnv)
2c031c5 peerset: Move connection to reserved peers to a dedicated fn (lexnv)
6ce6043 peerset: Connect to reserved peers immediately on SetReservedPeers (lexnv)
1d6cea5 peerset/tests: Adjust testing to double check slot timer is not needed (lexnv)
cdc262a Update from github-actions[bot] running command 'prdoc --audience nod… (github-actions[bot])
074fd2b Update substrate/client/network/src/litep2p/shim/notification/peerset.rs (lexnv)
a1a26d6 Update substrate/client/network/src/litep2p/shim/notification/peerset.rs (lexnv)
57fdeae notification: Downgrade open/close logs to trace (lexnv)
dabd2a9 peerset: Iterate over reserved peers only (lexnv)
26b96a2 peerset: Check first if peers are disconnected (lexnv)
0c5dd37 peerset: Remove tracing logs (lexnv)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,27 @@ | ||
| title: 'net/peerset: Optimize substream opening duration for `SetReservedPeers`' | ||
| doc: | ||
| - audience: Node Dev | ||
| description: |- | ||
| While triaging the Versi-net, I've discovered that the connection between collators and validators sometimes takes less than 20ms, while at other times it takes more than 500ms. | ||
| | ||
| In both cases, the validators are already connected to a different protocol. Therefore, opening and negotiating substreams must be almost instant. | ||
| | ||
| The slot timer of the peerset artificially introduces the delay: | ||
| - The `SetReservedPeers` command is received by the peerset. At this step, the peerset propagates the `closedSubstream` to signal that it wants to disconnect previously reserved peers. | ||
| - At the next slot allocation timer tick (after 1s), the newly added reserved peers are requested to be connected | ||
| | ||
| This can introduce an artificial delay of up to 1s, which is unnecessary. | ||
| | ||
| To mitigate this behavior, this PR: | ||
| - Transforms the `enum PeersetNotificationCommand` into a structure. Effectively, the peerset can directly request closing some substreams and opening others. | ||
| - Upon receiving the `SetReservedPeers` command, peers are moved into the `Opening` state and the request is propagated to the litep2p to open substreams. | ||
| - The behavior of the slot allocation timer remains identical. This is needed to capture the following edge cases: | ||
| - A reserved peer from `SetReservedPeers` is not disconnected, but is in backoff or pending closing. | ||
| - The reserved peer is banned | ||
| | ||
| cc @paritytech/networking | ||
| | ||
| Detected during versi-net triaging of elastic scaling: https://github.com/paritytech/polkadot-sdk/issues/10310#issuecomment-3543395157 | ||
| crates: | ||
| - name: sc-network | ||
| bump: patch |
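The mitigation described in the prdoc can be sketched roughly as follows. This is a minimal illustration, not the code from `peerset.rs`: `PeerId` is a plain integer stand-in, and `Command`, `PeerState`, `Peerset` and `set_reserved_peers` are hypothetical names with simplified state handling. It shows how a struct command can carry both the peers to close and the peers to open, so that newly reserved peers move to `Opening` as soon as the command is handled, while peers in backoff are left for the slot allocation timer.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-in for a peer identifier (assumption, not litep2p's type).
type PeerId = u32;

/// Old shape (sketch): each command carries a single action.
#[allow(dead_code)]
enum OldCommand {
    OpenSubstream { peers: Vec<PeerId> },
    CloseSubstream { peers: Vec<PeerId> },
}

/// New shape (sketch): one command can request opens and closes together.
#[derive(Debug, Default, PartialEq)]
struct Command {
    open_peers: Vec<PeerId>,
    close_peers: Vec<PeerId>,
}

/// Simplified peer states (assumption: the real state machine has more detail).
#[derive(Debug, Clone, Copy, PartialEq)]
enum PeerState {
    Disconnected,
    Opening,
    Connected,
    Closing,
    Backoff,
}

struct Peerset {
    peers: HashMap<PeerId, PeerState>,
    reserved: HashSet<PeerId>,
}

impl Peerset {
    /// Handle a `SetReservedPeers`-style request without waiting for a timer tick.
    fn set_reserved_peers(&mut self, new_reserved: HashSet<PeerId>) -> Command {
        let mut cmd = Command::default();

        // Previously reserved peers that are no longer reserved get closed.
        for peer in self.reserved.difference(&new_reserved) {
            if let Some(state) = self.peers.get_mut(peer) {
                if *state == PeerState::Connected {
                    *state = PeerState::Closing;
                    cmd.close_peers.push(*peer);
                }
            }
        }

        // Disconnected reserved peers are moved to `Opening` immediately.
        // Peers in any other state (e.g. backoff) are left for the slot timer.
        for peer in &new_reserved {
            let state = self.peers.entry(*peer).or_insert(PeerState::Disconnected);
            if *state == PeerState::Disconnected {
                *state = PeerState::Opening;
                cmd.open_peers.push(*peer);
            }
        }

        self.reserved = new_reserved;
        // Sort only to make the sketch deterministic.
        cmd.open_peers.sort_unstable();
        cmd.close_peers.sort_unstable();
        cmd
    }
}
```

With an enum, the caller would need two separate commands (and, before this PR, a timer tick in between) to express the same transition; the struct form lets both lists travel together.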
nit:
`connect_reserved_peers` goes over all peers (not only reserved ones) to find disconnected reserved peers. We can handle only newly added reserved peers here, as is done with `reserved_peers_maybe_remove`, and let old reserved peers that got disconnected be handled later during slot allocation. This should optimize things for the case when the list of reserved peers is updated many times per second.

This is minor though, feel free to ignore.
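A rough sketch of the idea in this comment, under assumptions: `PeerId` is an integer stand-in, and `PeerState` and `connect_newly_reserved` are hypothetical names, not the functions in `peerset.rs`. Instead of scanning the whole peer map, it visits only the peers present in the new reserved set but absent from the old one.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical stand-in for a peer identifier (assumption).
type PeerId = u32;

/// Simplified peer states (assumption).
#[derive(Debug, Clone, Copy, PartialEq)]
enum PeerState {
    Disconnected,
    Opening,
    Connected,
}

/// Visit only the newly added reserved peers and return those to open.
/// Old reserved peers that got disconnected are intentionally skipped here;
/// in this sketch they are left for the slot allocation timer.
fn connect_newly_reserved(
    peers: &mut HashMap<PeerId, PeerState>,
    old_reserved: &HashSet<PeerId>,
    new_reserved: &HashSet<PeerId>,
) -> Vec<PeerId> {
    let mut to_open = Vec::new();
    // `difference` yields the peers in `new_reserved` that are not in `old_reserved`.
    for peer in new_reserved.difference(old_reserved) {
        let state = peers.entry(*peer).or_insert(PeerState::Disconnected);
        if *state == PeerState::Disconnected {
            *state = PeerState::Opening;
            to_open.push(*peer);
        }
    }
    // Sort only to make the sketch deterministic.
    to_open.sort_unstable();
    to_open
}
```

The work per update then scales with the number of newly added reserved peers rather than with the total number of known peers, which is the case the comment is concerned about when the reserved list changes many times per second.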
Totally makes sense, I've copy-pasted the code without thinking too much about it: