Skip to content

Conversation

@lexnv
Copy link
Contributor

@lexnv lexnv commented Nov 19, 2025

While triaging the Versi-net, I've discovered that the connection between collators and validators sometimes takes less than 20ms, while at other times it takes more than 500ms.

In both cases, the validators are already connected to a different protocol. Therefore, opening and negotiating substreams must be almost instant.

The slot timer of the peerset artificially introduces the delay:

  • The SetReservedPeers is received by the peerset. At this step, the peerset propagated the closedSubstream to signal that it wants to disconnect previously reserved peers.
  • At the next slot allocation timer tick (after 1s), the newly added reserved peers are requested to be connected

This can introduce an artificial delay of up to 1s, which is unnecessary.

To mitigate this behavior, this PR:

  • Transforms the enum PeersetNotificationCommand into a structure. Effectively, the peerset can specify directly to close some substreams and open other substreams
  • Upon receiving the SetReservedPeers command, peers are moved into the Opening state and the request is propagated to the litep2p to open substreams.
  • The behavior of the slot allocation timer remains identical. This is needed to capture the following edge cases:
    • The reserved peer of the SetReservedPeers is not disconnected, but backoff / pending closing.
    • The reserved peer is banned

cc @paritytech/networking

Detected during versi-net triaging of elastic scaling: #10310 (comment)

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
@lexnv lexnv self-assigned this Nov 19, 2025
@lexnv lexnv added T0-node This PR/Issue is related to the topic “node”. I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task. labels Nov 19, 2025
@lexnv
Copy link
Contributor Author

lexnv commented Nov 19, 2025

/cmd prdoc --audience node_dev --bump patch

Comment on lines 177 to 186
log::debug!(target: LOG_TARGET, "{}: open substreams to {:?}", self.protocol, command.open_peers);
let _ = self
.handle
.open_substream_batch(command.open_peers.into_iter().map(From::from))
.await;
}

self.handle.close_substream_batch(peers.into_iter().map(From::from)).await;
},
if !command.close_peers.is_empty() {
log::debug!(target: LOG_TARGET, "{}: close substreams to {:?}", self.protocol, command.close_peers);
self.handle
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
log::debug!(target: LOG_TARGET, "{}: open substreams to {:?}", self.protocol, command.open_peers);
let _ = self
.handle
.open_substream_batch(command.open_peers.into_iter().map(From::from))
.await;
}
self.handle.close_substream_batch(peers.into_iter().map(From::from)).await;
},
if !command.close_peers.is_empty() {
log::debug!(target: LOG_TARGET, "{}: close substreams to {:?}", self.protocol, command.close_peers);
self.handle
log::trace!(target: LOG_TARGET, "{}: open substreams to {:?}", self.protocol, command.open_peers);
let _ = self
.handle
.open_substream_batch(command.open_peers.into_iter().map(From::from))
.await;
}
if !command.close_peers.is_empty() {
log::trace!(target: LOG_TARGET, "{}: close substreams to {:?}", self.protocol, command.close_peers);
self.handle

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, have downgraded the logs to trace since git doesn't want to apply the patch 🙏

// Open substreams to the new reserved peers that are disconnected.
// This ensures we are not relying on the slot allocation timer to connect to
// the new reserved peers. Therefore, we start connecting to them immediately.
let connect_to = self.connect_reserved_peers();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: connect_reserved_peers goes over all peers (not only reserved) to find disconnected reserved peers. We can handle only newly added reserved peers here like it's done with reserved_peers_maybe_remove, and let old reserved peers that got disconnected to be handled later on slot allocation. This should optimize things for the case when the list of reserved peers is updated many times per second.

This is minor though, feel free to ignore.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Totally makes sense, I've copy pasted the code without thinking too much about it:

  • now we are only iterating over the provided reserved peers
  • in a single iteration I'm also changing the state to opening, since there's no real reason why we'd have to pay for a second pass
  • slightly adjusted the suggestion and picked iterating through all reserved peers since they aren't expected to be large, and wanted to "connect" as fast as possible

lexnv and others added 4 commits November 20, 2025 13:10
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
@lexnv lexnv added this pull request to the merge queue Nov 24, 2025
Merged via the queue into master with commit 8a034ca Nov 24, 2025
256 of 262 checks passed
@lexnv lexnv deleted the lexnv/peerset branch November 24, 2025 11:02
@lexnv lexnv added A4-backport-stable2509 Pull request must be backported to the stable2509 release branch A4-backport-stable2512 Pull request must be backported to the stable2512 release branch labels Nov 25, 2025
paritytech-release-backport-bot bot pushed a commit that referenced this pull request Nov 25, 2025
…s` (#10362)

While triaging the Versi-net, I've discovered that the connection
between collators and validators sometimes takes less than 20ms, while
at other times it takes more than 500ms.

In both cases, the validators are already connected to a different
protocol. Therefore, opening and negotiating substreams must be almost
instant.

The slot timer of the peerset artificially introduces the delay:
- The `SetReservedPeers` is received by the peerset. At this step, the
peerset propagated the `closedSubstream` to signal that it wants to
disconnect previously reserved peers.
- At the next slot allocation timer tick (after 1s), the newly added
reserved peers are requested to be connected

This can introduce an artificial delay of up to 1s, which is
unnecessary.

To mitigate this behavior, this PR:
- Transforms the ` enum PeersetNotificationCommand` into a structure.
Effectively, the peerset can specify directly to close some substreams
and open other substreams
- Upon receiving the `SetReservedPeers` command, peers are moved into
the `Opening` state and the request is propagated to the litep2p to open
substreams.
- The behavior of the slot allocation timer remains identical. This is
needed to capture the following edge cases:
- The reserved peer of the `SetReservedPeers` is not disconnected, but
backoff / pending closing.
  - The reserved peer is banned

cc @paritytech/networking

Detected during versi-net triaging of elastic scaling:
#10310 (comment)

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Bastian Köcher <git@kchr.de>
(cherry picked from commit 8a034ca)
@paritytech-release-backport-bot

paritytech-release-backport-bot bot pushed a commit that referenced this pull request Nov 25, 2025
…s` (#10362)

While triaging the Versi-net, I've discovered that the connection
between collators and validators sometimes takes less than 20ms, while
at other times it takes more than 500ms.

In both cases, the validators are already connected to a different
protocol. Therefore, opening and negotiating substreams must be almost
instant.

The slot timer of the peerset artificially introduces the delay:
- The `SetReservedPeers` is received by the peerset. At this step, the
peerset propagated the `closedSubstream` to signal that it wants to
disconnect previously reserved peers.
- At the next slot allocation timer tick (after 1s), the newly added
reserved peers are requested to be connected

This can introduce an artificial delay of up to 1s, which is
unnecessary.

To mitigate this behavior, this PR:
- Transforms the ` enum PeersetNotificationCommand` into a structure.
Effectively, the peerset can specify directly to close some substreams
and open other substreams
- Upon receiving the `SetReservedPeers` command, peers are moved into
the `Opening` state and the request is propagated to the litep2p to open
substreams.
- The behavior of the slot allocation timer remains identical. This is
needed to capture the following edge cases:
- The reserved peer of the `SetReservedPeers` is not disconnected, but
backoff / pending closing.
  - The reserved peer is banned

cc @paritytech/networking

Detected during versi-net triaging of elastic scaling:
#10310 (comment)

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Bastian Köcher <git@kchr.de>
(cherry picked from commit 8a034ca)
@paritytech-release-backport-bot

EgorPopelyaev added a commit that referenced this pull request Nov 25, 2025
… duration for `SetReservedPeers` (#10408)

Backport #10362 into `stable2509` from lexnv.

See the
[documentation](https://github.com/paritytech/polkadot-sdk/blob/master/docs/BACKPORT.md)
on how to use this bot.

<!--
  # To be used by other automation, do not modify:
  original-pr-number: #${pull_number}
-->

---------

Signed-off-by: Alexandru Vasile <alexandru.vasile@parity.io>
Co-authored-by: Alexandru Vasile <60601340+lexnv@users.noreply.github.com>
Co-authored-by: cmd[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Bastian Köcher <git@kchr.de>
Co-authored-by: Egor_P <egor@parity.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A4-backport-stable2509 Pull request must be backported to the stable2509 release branch A4-backport-stable2512 Pull request must be backported to the stable2512 release branch I9-optimisation An enhancement to provide better overall performance in terms of time-to-completion for a task. T0-node This PR/Issue is related to the topic “node”.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants