net/peerset: Optimize substream opening duration for `SetReservedPeers` #10362
Conversation
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
/cmd prdoc --audience node_dev --bump patch
Diff excerpt from `substrate/client/network/src/litep2p/shim/notification/peerset.rs`, where the shim now drives both halves of the command (the old variant-based close handling, `self.handle.close_substream_batch(peers.into_iter().map(From::from)).await;`, is replaced):

```rust
if !command.open_peers.is_empty() {
    log::debug!(target: LOG_TARGET, "{}: open substreams to {:?}", self.protocol, command.open_peers);
    let _ = self
        .handle
        .open_substream_batch(command.open_peers.into_iter().map(From::from))
        .await;
}

if !command.close_peers.is_empty() {
    log::debug!(target: LOG_TARGET, "{}: close substreams to {:?}", self.protocol, command.close_peers);
    self.handle
        .close_substream_batch(command.close_peers.into_iter().map(From::from))
        .await;
}
```
Suggested change (downgrade the new logs from `debug` to `trace`):

```rust
if !command.open_peers.is_empty() {
    log::trace!(target: LOG_TARGET, "{}: open substreams to {:?}", self.protocol, command.open_peers);
    let _ = self
        .handle
        .open_substream_batch(command.open_peers.into_iter().map(From::from))
        .await;
}

if !command.close_peers.is_empty() {
    log::trace!(target: LOG_TARGET, "{}: close substreams to {:?}", self.protocol, command.close_peers);
    self.handle
        .close_substream_batch(command.close_peers.into_iter().map(From::from))
        .await;
}
```
Makes sense, have downgraded the logs to trace since git doesn't want to apply the patch 🙏
Diff excerpt:

```rust
// Open substreams to the new reserved peers that are disconnected.
// This ensures we are not relying on the slot allocation timer to connect to
// the new reserved peers. Therefore, we start connecting to them immediately.
let connect_to = self.connect_reserved_peers();
```
nit: `connect_reserved_peers` goes over all peers (not only reserved ones) to find disconnected reserved peers. We could handle only the newly added reserved peers here, as is done with `reserved_peers_maybe_remove`, and let old reserved peers that got disconnected be handled later on slot allocation. This should optimize things for the case when the list of reserved peers is updated many times per second.
This is minor though, feel free to ignore.
Totally makes sense, I copy-pasted the code without thinking too much about it:
- we now iterate only over the provided reserved peers
- in the same single pass the state is changed to `Opening`, since there's no real reason to pay for a second iteration
- I slightly adjusted the suggestion and went with iterating through all reserved peers, since the set isn't expected to be large and I wanted to "connect" as fast as possible; see the sketch below
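A minimal sketch of that single pass, with assumed stand-in types (`PeerId`, `PeerState`, and the struct fields here are illustrative, not the real peerset internals):

```rust
use std::collections::{HashMap, HashSet};

// Simplified stand-ins: the real peerset tracks more states (backoff,
// pending close, banned, ...), which the slot allocation timer still covers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PeerState {
    Disconnected,
    Opening,
    Connected,
}

type PeerId = u64;

struct Peerset {
    peers: HashMap<PeerId, PeerState>,
    reserved_peers: HashSet<PeerId>,
}

impl Peerset {
    /// One pass over the reserved set: every disconnected reserved peer is
    /// flipped to `Opening` and returned, so the caller can hand the whole
    /// batch to litep2p immediately instead of waiting for the slot timer.
    fn connect_reserved_peers(&mut self) -> Vec<PeerId> {
        let mut connect_to = Vec::new();
        for peer in &self.reserved_peers {
            match self.peers.get_mut(peer) {
                Some(state) if *state == PeerState::Disconnected => {
                    *state = PeerState::Opening;
                    connect_to.push(*peer);
                },
                _ => {},
            }
        }
        connect_to
    }
}

fn main() {
    let mut peerset = Peerset {
        peers: HashMap::from([(1, PeerState::Disconnected), (2, PeerState::Connected)]),
        reserved_peers: HashSet::from([1, 2]),
    };
    // Peer 1 gets dialed right away; peer 2 already has a substream.
    assert_eq!(peerset.connect_reserved_peers(), vec![1]);
    assert_eq!(peerset.peers[&1], PeerState::Opening);
}
```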
Co-authored-by: Bastian Köcher <git@kchr.de>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
net/peerset: Optimize substream opening duration for `SetReservedPeers` (#10362) (cherry picked from commit 8a034ca)
Successfully created backport PR for `stable2509`: #10408
net/peerset: Optimize substream opening duration for `SetReservedPeers` (#10408)

Backport of #10362 into `stable2509` from lexnv. See the [documentation](https://github.com/paritytech/polkadot-sdk/blob/master/docs/BACKPORT.md) on how to use this bot.

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: Alexandru Vasile <60601340+lexnv@users.noreply.github.com>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Bastian Köcher <git@kchr.de>
Co-authored-by: Egor_P <egor@parity.io>
While triaging Versi-net, I discovered that the connection between collators and validators sometimes takes less than 20ms, while at other times it takes more than 500ms.
In both cases, the validators are already connected on a different protocol, so opening and negotiating substreams should be almost instant.
The slot timer of the peerset artificially introduces the delay:
- The `SetReservedPeers` is received by the peerset. At this step, the peerset propagates the `closedSubstream` to signal that it wants to disconnect the previously reserved peers.
- At the next slot allocation timer tick (after 1s), the newly added reserved peers are requested to be connected.

This can introduce an artificial delay of up to 1s, which is unnecessary.
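To make the ceiling concrete, here is a minimal std-only sketch of the arithmetic (the 1s slot interval comes from the description above; the 5ms offset is purely illustrative):

```rust
use std::time::{Duration, Instant};

fn main() {
    // Slot allocation runs on a fixed 1-second interval.
    let slot_interval = Duration::from_secs(1);
    let last_tick = Instant::now();

    // Suppose `SetReservedPeers` lands 5ms after a tick...
    let set_reserved_at = last_tick + Duration::from_millis(5);
    // ...the newly added reserved peers are only dialed at the next tick.
    let dialed_at = last_tick + slot_interval;

    // ~995ms of artificial delay before the substream is even requested.
    println!("artificial delay: {:?}", dialed_at - set_reserved_at);
}
```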
To mitigate this behavior, this PR:
- Transforms the `enum PeersetNotificationCommand` into a structure. Effectively, the peerset can directly specify both the substreams to close and the substreams to open (see the sketch below).
- Upon receiving the `SetReservedPeers` command, peers are moved into the `Opening` state and the request is propagated to litep2p to open substreams.
- The behavior of the slot allocation timer remains identical. This is needed to capture the following edge cases:
  - The reserved peer of the `SetReservedPeers` is not disconnected, but in backoff / pending close.
  - The reserved peer is banned.

cc @paritytech/networking
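As a rough sketch of the new command shape (the `open_peers` / `close_peers` field names come from the diff excerpts above; `PeerId` here is a stand-in and the rest is assumed):

```rust
type PeerId = u64; // stand-in for the real network peer identifier

/// Sketch of `PeersetNotificationCommand` as a struct: instead of a command
/// that is either "open" or "close", one value carries both lists, so
/// `SetReservedPeers` can close old reserved peers and open new ones in a
/// single step.
#[derive(Debug, Default)]
struct PeersetNotificationCommand {
    /// Peers to open substreams to, batched into one request.
    open_peers: Vec<PeerId>,
    /// Peers whose substreams should be closed.
    close_peers: Vec<PeerId>,
}

fn main() {
    let command = PeersetNotificationCommand {
        open_peers: vec![3, 4],
        close_peers: vec![1, 2],
    };
    // The shim then drives both halves, as in the diff excerpts above:
    // close_substream_batch(close_peers) and open_substream_batch(open_peers).
    println!("{command:?}");
}
```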
Detected during versi-net triaging of elastic scaling: #10310 (comment)