Introduce approval-voting/distribution benchmark (#2621)
## Summary
Built on top of the tooling and ideas introduced in
#2528, this PR introduces
a synthetic benchmark for measuring and assessing the performance
characteristics of the approval-voting and approval-distribution
subsystems.

Currently, this allows us to simulate the behaviour of these systems
along the following dimensions:
```
TestConfiguration:
# Test 1
- objective: !ApprovalsTest
    last_considered_tranche: 89
    min_coalesce: 1
    max_coalesce: 6
    enable_assignments_v2: true
    send_till_tranche: 60
    stop_when_approved: false
    coalesce_tranche_diff: 12
    workdir_prefix: "/tmp"
    num_no_shows_per_candidate: 0
    approval_distribution_expected_tof: 6.0
    approval_distribution_cpu_ms: 3.0
    approval_voting_cpu_ms: 4.30
  n_validators: 500
  n_cores: 100
  n_included_candidates: 100
  min_pov_size: 1120
  max_pov_size: 5120
  peer_bandwidth: 524288000000
  bandwidth: 524288000000
  latency:
    min_latency:
      secs: 0
      nanos: 1000000
    max_latency:
      secs: 0
      nanos: 100000000
  error: 0
  num_blocks: 10
```

## The approach
1. We build a real overseer with the real implementations of the
approval-voting and approval-distribution subsystems.
2. For a given network size, we pre-compute, for each validator, all the
potential assignments and approvals it would send. Because this is a
computation-heavy operation, the result is cached in a file on disk and
re-used as long as the generation parameters don't change.
3. The messages are sent according to the configured parameters, which
are split into 3 main benchmarking scenarios.
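
The caching idea in step 2 can be sketched as follows. This is a minimal illustration, not the code from this PR: `GenerationParams`, `cache_path` and the file naming are hypothetical, and `DefaultHasher` is used only to keep the sketch free of external crates (its output is not guaranteed stable across Rust releases, so a real cache would want a stable digest).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::path::PathBuf;

/// Hypothetical subset of the parameters that influence message generation.
#[derive(Hash, Debug, Clone, PartialEq, Eq)]
struct GenerationParams {
    n_validators: u32,
    n_cores: u32,
    num_blocks: u32,
    last_considered_tranche: u32,
    enable_assignments_v2: bool,
}

/// Derive a cache file path from the parameters: identical parameters map to
/// the same file, changed parameters map to a different file.
/// Assumption: `DefaultHasher` stands in for a stable digest here.
fn cache_path(workdir_prefix: &str, params: &GenerationParams) -> PathBuf {
    let mut hasher = DefaultHasher::new();
    params.hash(&mut hasher);
    PathBuf::from(workdir_prefix).join(format!("approvals-{:016x}.bin", hasher.finish()))
}

fn main() {
    let params = GenerationParams {
        n_validators: 500,
        n_cores: 100,
        num_blocks: 10,
        last_considered_tranche: 89,
        enable_assignments_v2: true,
    };
    let path = cache_path("/tmp", &params);
    if path.exists() {
        println!("re-using cached messages from {}", path.display());
    } else {
        println!("no cache at {}, messages would be generated and written", path.display());
    }
}
```

Keying the file name on the parameters means that changing, say, `n_validators` never picks up a stale cache; it simply points at a file that does not exist yet.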

## Benchmarking scenarios

### Best case scenario *approvals_throughput_best_case.yaml*
It sends to approval-distribution only the minimum required tranches
to gather the `needed_approvals`, so that a candidate is approved.
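
To make "minimum required tranches" concrete, here is a simplified sketch that ignores no-shows entirely. The function name, the per-tranche counts and the threshold of 30 are made up for illustration; only the notion of `needed_approvals` comes from the description above.

```rust
/// Returns the lowest tranche up to which (inclusive) assignments must be sent
/// so that at least `needed_approvals` validators are assigned, or `None` if
/// the threshold is never reached.
/// `assignments_per_tranche[t]` is the number of validators assigned in tranche `t`.
/// Assumption: no-shows are not modelled in this sketch.
fn min_required_tranche(assignments_per_tranche: &[u32], needed_approvals: u32) -> Option<usize> {
    let mut total = 0u32;
    for (tranche, count) in assignments_per_tranche.iter().enumerate() {
        total += count;
        if total >= needed_approvals {
            return Some(tranche);
        }
    }
    None
}

fn main() {
    // Hypothetical distribution of assignments over the first four tranches.
    let per_tranche = [12, 9, 11, 10];
    match min_required_tranche(&per_tranche, 30) {
        Some(t) => println!("send tranches 0..={}", t),
        None => println!("not enough assignments in the considered tranches"),
    }
}
```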

### Behaviour in the presence of no-shows *approvals_no_shows.yaml*
It sends the tranche needed to approve a candidate when we have a
maximum of *num_no_shows_per_candidate* tranches with no-shows for each
candidate.

### Maximum throughput *approvals_throughput.yaml*
It sends all the tranches for each block and measures the CPU used and
the network bandwidth needed by the approval-voting and
approval-distribution subsystems.

## How to run it
```
cargo run -p polkadot-subsystem-bench --release -- test-sequence --path polkadot/node/subsystem-bench/examples/approvals_throughput.yaml
```

## Evaluating performance
### Use the real subsystems metrics
If you follow the steps in
https://github.com/paritytech/polkadot-sdk/tree/master/polkadot/node/subsystem-bench#install-grafana
to install prometheus and grafana locally, all the real metrics for
`approval-distribution`, `approval-voting` and the overseer are available.
E.g.:
<img width="2149" alt="Screenshot 2023-12-05 at 11 07 46"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/cb8ae2dd-178b-4922-bfa4-dc37e572ed38">

<img width="2551" alt="Screenshot 2023-12-05 at 11 09 42"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/8b4542ba-88b9-46f9-9b70-cc345366081b">

<img width="2154" alt="Screenshot 2023-12-05 at 11 10 15"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/b8874d8d-632e-443a-9840-14ad8e90c54f">

<img width="2535" alt="Screenshot 2023-12-05 at 11 10 52"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/779a439f-fd18-4985-bb80-85d5afad78e2">

### Profile with pyroscope
1. Set up pyroscope following the steps in
https://github.com/paritytech/polkadot-sdk/tree/master/polkadot/node/subsystem-bench#install-pyroscope,
then run any of the benchmark scenarios with the `--profile`
argument.
2. Open the pyroscope dashboard in grafana, e.g.:
<img width="2544" alt="Screenshot 2024-01-09 at 17 09 58"
src="https://github.com/paritytech/polkadot-sdk/assets/49718502/58f50c99-a910-4d20-951a-8b16639303d9">



### Useful logs
1. Network bandwidth requirements:
```
Payload bytes received from peers: 503993 KiB total, 50399 KiB/block
Payload bytes sent to peers: 629971 KiB total, 62997 KiB/block
```

2. CPU usage by the approval-distribution/approval-voting subsystems:
```
approval-distribution CPU usage 84.061s
approval-distribution CPU usage per block 8.406s
approval-voting CPU usage 96.532s
approval-voting CPU usage per block 9.653s
```

3. Time passed until a given block is approved:
```
Chain selection approved  after 3500 ms hash=0x0101010101010101010101010101010101010101010101010101010101010101
Chain selection approved  after 4500 ms hash=0x0202020202020202020202020202020202020202020202020202020202020202
```

### Using the benchmark to quantify improvements from #1178 + #1191

Using a versi-node, we compare the scenario where all new optimisations
are disabled with a scenario where tranche0 assignments are sent in a
single message and a conservative simulation where the coalescing of
approvals gives us just a 50% reduction in the number of messages we send.

Overall, what we see is a speedup of around 30-40% in the time it takes
to process the necessary messages and a 30-40% reduction in the
necessary bandwidth.

#### Best case scenario comparison (minimum required tranches sent)
Unoptimised
```
    Number of blocks: 10
    Payload bytes received from peers: 53289 KiB total, 5328 KiB/block
    Payload bytes sent to peers: 52489 KiB total, 5248 KiB/block
    approval-distribution CPU usage 6.732s
    approval-distribution CPU usage per block 0.673s
    approval-voting CPU usage 9.523s
    approval-voting CPU usage per block 0.952s
```

vs Optimisation enabled
```
   Number of blocks: 10
   Payload bytes received from peers: 32141 KiB total, 3214 KiB/block
   Payload bytes sent to peers: 37314 KiB total, 3731 KiB/block
   approval-distribution CPU usage 4.658s
   approval-distribution CPU usage per block 0.466s
   approval-voting CPU usage 6.236s
   approval-voting CPU usage per block 0.624s
```

#### Worst case: all tranches sent (very unlikely, happens when sharding breaks)

Unoptimised
```
   Number of blocks: 10
   Payload bytes received from peers: 746393 KiB total, 74639 KiB/block
   Payload bytes sent to peers: 729151 KiB total, 72915 KiB/block
   approval-distribution CPU usage 118.681s
   approval-distribution CPU usage per block 11.868s
   approval-voting CPU usage 124.118s
   approval-voting CPU usage per block 12.412s
```

vs optimised
```
    Number of blocks: 10
    Payload bytes received from peers: 503993 KiB total, 50399 KiB/block
    Payload bytes sent to peers: 629971 KiB total, 62997 KiB/block
    approval-distribution CPU usage 84.061s
    approval-distribution CPU usage per block 8.406s
    approval-voting CPU usage 96.532s
    approval-voting CPU usage per block 9.653s
```
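
For readers who want to check the relative changes themselves, the figures in the four log excerpts above can be fed through a small helper. The numbers below are copied from those logs; the helper name `reduction_pct` and the row labels are ours.

```rust
/// Relative reduction of `after` with respect to `before`, in percent.
fn reduction_pct(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}

fn main() {
    // (label, unoptimised, optimised), values copied from the logs above.
    let rows = [
        ("best case, KiB received", 53289.0, 32141.0),
        ("best case, KiB sent", 52489.0, 37314.0),
        ("best case, approval-distribution CPU s", 6.732, 4.658),
        ("best case, approval-voting CPU s", 9.523, 6.236),
        ("all tranches, KiB received", 746393.0, 503993.0),
        ("all tranches, KiB sent", 729151.0, 629971.0),
        ("all tranches, approval-distribution CPU s", 118.681, 84.061),
        ("all tranches, approval-voting CPU s", 124.118, 96.532),
    ];
    for (label, before, after) in rows {
        println!("{}: {:.1}% reduction", label, reduction_pct(before, after));
    }
}
```

On these particular runs the per-metric reductions range from roughly 14% (bytes sent, all tranches) to roughly 40% (bytes received, best case).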


## TODOs
- [x] Polish implementation.
- [x] Use what we have so far to evaluate #1191 before merging.
- [x] List of features and additional dimensions we want to use for benchmarking.
- [x] Run benchmark on hardware similar to versi and kusama nodes.
- [ ] Add benchmark to be run in CI for catching regressions in performance.
- [ ] Rebase on latest changes for network emulation.

---------

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Co-authored-by: Andrei Sandu <andrei-mihail@parity.io>
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
3 people committed Feb 5, 2024
1 parent 90849b6 commit f9f8868
Showing 29 changed files with 2,857 additions and 127 deletions.
15 changes: 13 additions & 2 deletions Cargo.lock

8 changes: 4 additions & 4 deletions polkadot/node/core/approval-voting/src/criteria.rs
@@ -55,11 +55,11 @@ pub struct OurAssignment {
 }

 impl OurAssignment {
-    pub(crate) fn cert(&self) -> &AssignmentCertV2 {
+    pub fn cert(&self) -> &AssignmentCertV2 {
         &self.cert
     }

-    pub(crate) fn tranche(&self) -> DelayTranche {
+    pub fn tranche(&self) -> DelayTranche {
         self.tranche
     }

@@ -225,7 +225,7 @@ fn assigned_core_transcript(core_index: CoreIndex) -> Transcript {

 /// Information about the world assignments are being produced in.
 #[derive(Clone, Debug)]
-pub(crate) struct Config {
+pub struct Config {
     /// The assignment public keys for validators.
     assignment_keys: Vec<AssignmentId>,
     /// The groups of validators assigned to each core.
@@ -321,7 +321,7 @@ impl AssignmentCriteria for RealAssignmentCriteria {
 /// different times. The idea is that most assignments are never triggered and fall by the wayside.
 ///
 /// This will not assign to anything the local validator was part of the backing group for.
-pub(crate) fn compute_assignments(
+pub fn compute_assignments(
     keystore: &LocalKeystore,
     relay_vrf_story: RelayVRFStory,
     config: &Config,
41 changes: 28 additions & 13 deletions polkadot/node/core/approval-voting/src/lib.rs
@@ -92,11 +92,11 @@ use time::{slot_number_to_tick, Clock, ClockExt, DelayedApprovalTimer, SystemClo
 mod approval_checking;
 pub mod approval_db;
 mod backend;
-mod criteria;
+pub mod criteria;
 mod import;
 mod ops;
 mod persisted_entries;
-mod time;
+pub mod time;

 use crate::{
     approval_checking::{Check, TranchesToApproveResult},
@@ -159,6 +159,7 @@ pub struct ApprovalVotingSubsystem {
     db: Arc<dyn Database>,
     mode: Mode,
     metrics: Metrics,
+    clock: Box<dyn Clock + Send + Sync>,
 }

 #[derive(Clone)]
@@ -444,6 +445,25 @@ impl ApprovalVotingSubsystem {
         keystore: Arc<LocalKeystore>,
         sync_oracle: Box<dyn SyncOracle + Send>,
         metrics: Metrics,
+    ) -> Self {
+        ApprovalVotingSubsystem::with_config_and_clock(
+            config,
+            db,
+            keystore,
+            sync_oracle,
+            metrics,
+            Box::new(SystemClock {}),
+        )
+    }
+
+    /// Create a new approval voting subsystem with the given keystore, config, and database.
+    pub fn with_config_and_clock(
+        config: Config,
+        db: Arc<dyn Database>,
+        keystore: Arc<LocalKeystore>,
+        sync_oracle: Box<dyn SyncOracle + Send>,
+        metrics: Metrics,
+        clock: Box<dyn Clock + Send + Sync>,
     ) -> Self {
         ApprovalVotingSubsystem {
             keystore,
@@ -452,6 +472,7 @@ impl ApprovalVotingSubsystem {
             db_config: DatabaseConfig { col_approval_data: config.col_approval_data },
             mode: Mode::Syncing(sync_oracle),
             metrics,
+            clock,
         }
     }

@@ -493,15 +514,10 @@ fn db_sanity_check(db: Arc<dyn Database>, config: DatabaseConfig) -> SubsystemRe
 impl<Context: Send> ApprovalVotingSubsystem {
     fn start(self, ctx: Context) -> SpawnedSubsystem {
         let backend = DbBackend::new(self.db.clone(), self.db_config);
-        let future = run::<DbBackend, Context>(
-            ctx,
-            self,
-            Box::new(SystemClock),
-            Box::new(RealAssignmentCriteria),
-            backend,
-        )
-        .map_err(|e| SubsystemError::with_origin("approval-voting", e))
-        .boxed();
+        let future =
+            run::<DbBackend, Context>(ctx, self, Box::new(RealAssignmentCriteria), backend)
+                .map_err(|e| SubsystemError::with_origin("approval-voting", e))
+                .boxed();

         SpawnedSubsystem { name: "approval-voting-subsystem", future }
     }
@@ -909,7 +925,6 @@ enum Action {
 async fn run<B, Context>(
     mut ctx: Context,
     mut subsystem: ApprovalVotingSubsystem,
-    clock: Box<dyn Clock + Send + Sync>,
     assignment_criteria: Box<dyn AssignmentCriteria + Send + Sync>,
     mut backend: B,
 ) -> SubsystemResult<()>
@@ -923,7 +938,7 @@ where
     let mut state = State {
         keystore: subsystem.keystore,
         slot_duration_millis: subsystem.slot_duration_millis,
-        clock,
+        clock: subsystem.clock,
         assignment_criteria,
         spans: HashMap::new(),
     };
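
The change to `lib.rs` moves the clock from an argument of `run` into the subsystem itself, so that a caller such as the benchmark can supply its own clock. A stripped-down sketch of that pattern is shown here. It is not the real code: the `Clock` trait is reduced to `tick_now` (the real trait has more methods), `Subsystem` stands in for `ApprovalVotingSubsystem` with all other fields and constructor arguments omitted, and `FixedClock` is a hypothetical test clock.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

type Tick = u64;

/// Simplified stand-in for the `Clock` trait (assumption: only `tick_now`).
trait Clock {
    fn tick_now(&self) -> Tick;
}

/// Clock backed by the system time, using half-second ticks.
struct SystemClock;

impl Clock for SystemClock {
    fn tick_now(&self) -> Tick {
        let millis = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_millis() as u64)
            .unwrap_or(0);
        millis / 500
    }
}

/// Hypothetical clock that always reports the same tick, e.g. for simulations.
struct FixedClock(Tick);

impl Clock for FixedClock {
    fn tick_now(&self) -> Tick {
        self.0
    }
}

/// Stand-in for the subsystem: it owns its clock instead of receiving it in `run`.
struct Subsystem {
    clock: Box<dyn Clock + Send + Sync>,
}

impl Subsystem {
    /// Default constructor: delegates to the clock-aware one with the system clock.
    fn with_config() -> Self {
        Self::with_config_and_clock(Box::new(SystemClock))
    }

    /// Constructor that lets the caller inject any clock.
    fn with_config_and_clock(clock: Box<dyn Clock + Send + Sync>) -> Self {
        Subsystem { clock }
    }
}

fn main() {
    let real = Subsystem::with_config();
    let simulated = Subsystem::with_config_and_clock(Box::new(FixedClock(42)));
    println!("system tick: {}", real.clock.tick_now());
    println!("simulated tick: {}", simulated.clock.tick_now());
}
```

Keeping `with_config` as a thin wrapper means existing callers are unaffected, while tests and benchmarks get a single injection point.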
4 changes: 2 additions & 2 deletions polkadot/node/core/approval-voting/src/tests.rs
@@ -549,7 +549,7 @@ fn test_harness<T: Future<Output = VirtualOverseer>>(

     let subsystem = run(
         context,
-        ApprovalVotingSubsystem::with_config(
+        ApprovalVotingSubsystem::with_config_and_clock(
             Config {
                 col_approval_data: test_constants::TEST_CONFIG.col_approval_data,
                 slot_duration_millis: SLOT_DURATION_MILLIS,
@@ -558,8 +558,8 @@ fn test_harness<T: Future<Output = VirtualOverseer>>(
             Arc::new(keystore),
             sync_oracle,
             Metrics::default(),
+            clock.clone(),
         ),
-        clock.clone(),
         assignment_criteria,
         backend,
     );
24 changes: 18 additions & 6 deletions polkadot/node/core/approval-voting/src/time.rs
@@ -33,14 +33,14 @@ use std::{
 };

 use polkadot_primitives::{Hash, ValidatorIndex};
-const TICK_DURATION_MILLIS: u64 = 500;
+pub const TICK_DURATION_MILLIS: u64 = 500;

 /// A base unit of time, starting from the Unix epoch, split into half-second intervals.
-pub(crate) type Tick = u64;
+pub type Tick = u64;

 /// A clock which allows querying of the current tick as well as
 /// waiting for a tick to be reached.
-pub(crate) trait Clock {
+pub trait Clock {
     /// Yields the current tick.
     fn tick_now(&self) -> Tick;

@@ -49,7 +49,7 @@ pub(crate) trait Clock {
 }

 /// Extension methods for clocks.
-pub(crate) trait ClockExt {
+pub trait ClockExt {
     fn tranche_now(&self, slot_duration_millis: u64, base_slot: Slot) -> DelayTranche;
 }

@@ -61,7 +61,8 @@ impl<C: Clock + ?Sized> ClockExt for C {
 }

 /// A clock which uses the actual underlying system clock.
-pub(crate) struct SystemClock;
+#[derive(Clone)]
+pub struct SystemClock;

 impl Clock for SystemClock {
     /// Yields the current tick.
@@ -93,11 +94,22 @@ fn tick_to_time(tick: Tick) -> SystemTime {
 }

 /// assumes `slot_duration_millis` evenly divided by tick duration.
-pub(crate) fn slot_number_to_tick(slot_duration_millis: u64, slot: Slot) -> Tick {
+pub fn slot_number_to_tick(slot_duration_millis: u64, slot: Slot) -> Tick {
     let ticks_per_slot = slot_duration_millis / TICK_DURATION_MILLIS;
     u64::from(slot) * ticks_per_slot
 }

+/// Converts a tick to the slot number.
+pub fn tick_to_slot_number(slot_duration_millis: u64, tick: Tick) -> Slot {
+    let ticks_per_slot = slot_duration_millis / TICK_DURATION_MILLIS;
+    (tick / ticks_per_slot).into()
+}
+
+/// Converts a tranche from a slot to the tick number.
+pub fn tranche_to_tick(slot_duration_millis: u64, slot: Slot, tranche: u32) -> Tick {
+    slot_number_to_tick(slot_duration_millis, slot) + tranche as u64
+}
+
 /// A list of delayed futures that gets triggered when the waiting time has expired and it is
 /// time to sign the candidate.
 /// We have a timer per relay-chain block.
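
A worked example of the tick helpers in `time.rs` may help. The functions below mirror the ones in the diff, with one simplification: `Slot` is a plain `u64` alias here rather than the real slot type, and the 6000 ms slot duration is an assumed value chosen so that one slot spans 12 half-second ticks.

```rust
const TICK_DURATION_MILLIS: u64 = 500;

type Tick = u64;
/// Assumption: a plain integer instead of the real `Slot` newtype.
type Slot = u64;

/// First tick of the given slot.
fn slot_number_to_tick(slot_duration_millis: u64, slot: Slot) -> Tick {
    let ticks_per_slot = slot_duration_millis / TICK_DURATION_MILLIS;
    slot * ticks_per_slot
}

/// Slot that contains the given tick.
fn tick_to_slot_number(slot_duration_millis: u64, tick: Tick) -> Slot {
    let ticks_per_slot = slot_duration_millis / TICK_DURATION_MILLIS;
    tick / ticks_per_slot
}

/// Tick at which the given tranche of the given slot starts (one tranche per tick).
fn tranche_to_tick(slot_duration_millis: u64, slot: Slot, tranche: u32) -> Tick {
    slot_number_to_tick(slot_duration_millis, slot) + tranche as u64
}

fn main() {
    let slot_duration_millis = 6000; // assumed: 6 s slots, i.e. 12 ticks per slot
    println!("slot 10 starts at tick {}", slot_number_to_tick(slot_duration_millis, 10));
    println!("tick 125 falls in slot {}", tick_to_slot_number(slot_duration_millis, 125));
    println!("tranche 3 of slot 10 is tick {}", tranche_to_tick(slot_duration_millis, 10, 3));
}
```

Note that `tick_to_slot_number` rounds down, so it is only the inverse of `slot_number_to_tick` for ticks that sit exactly on a slot boundary.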
13 changes: 13 additions & 0 deletions polkadot/node/subsystem-bench/Cargo.toml
@@ -38,6 +38,9 @@ sp-core = { path = "../../../substrate/primitives/core" }
 clap = { version = "4.4.18", features = ["derive"] }
 futures = "0.3.21"
 futures-timer = "3.0.2"
+bincode = "1.3.3"
+sha1 = "0.10.6"
+hex = "0.4.3"
 gum = { package = "tracing-gum", path = "../gum" }
 polkadot-erasure-coding = { package = "polkadot-erasure-coding", path = "../../erasure-coding" }
 log = "0.4.17"
@@ -64,6 +67,16 @@ prometheus_endpoint = { package = "substrate-prometheus-endpoint", path = "../..
 prometheus = { version = "0.13.0", default-features = false }
 serde = "1.0.195"
 serde_yaml = "0.9"
+
+polkadot-node-core-approval-voting = { path = "../core/approval-voting" }
+polkadot-approval-distribution = { path = "../network/approval-distribution" }
+sp-consensus-babe = { path = "../../../substrate/primitives/consensus/babe" }
+sp-runtime = { path = "../../../substrate/primitives/runtime", default-features = false }
+sp-timestamp = { path = "../../../substrate/primitives/timestamp" }
+
+schnorrkel = { version = "0.9.1", default-features = false }
+rand_core = "0.6.2" # should match schnorrkel
+rand_chacha = { version = "0.3.1" }
 paste = "1.0.14"
 orchestra = { version = "0.3.5", default-features = false, features = ["futures_channel"] }
 pyroscope = "0.5.7"
18 changes: 18 additions & 0 deletions polkadot/node/subsystem-bench/examples/approvals_no_shows.yaml
@@ -0,0 +1,18 @@
+TestConfiguration:
+# Test 1
+- objective: !ApprovalVoting
+    last_considered_tranche: 89
+    coalesce_mean: 3.0
+    coalesce_std_dev: 1.0
+    stop_when_approved: true
+    coalesce_tranche_diff: 12
+    workdir_prefix: "/tmp/"
+    enable_assignments_v2: true
+    num_no_shows_per_candidate: 10
+  n_validators: 500
+  n_cores: 100
+  min_pov_size: 1120
+  max_pov_size: 5120
+  peer_bandwidth: 524288000000
+  bandwidth: 524288000000
+  num_blocks: 10
19 changes: 19 additions & 0 deletions polkadot/node/subsystem-bench/examples/approvals_throughput.yaml
@@ -0,0 +1,19 @@
+TestConfiguration:
+# Test 1
+- objective: !ApprovalVoting
+    coalesce_mean: 3.0
+    coalesce_std_dev: 1.0
+    enable_assignments_v2: true
+    last_considered_tranche: 89
+    stop_when_approved: false
+    coalesce_tranche_diff: 12
+    workdir_prefix: "/tmp"
+    num_no_shows_per_candidate: 0
+  n_validators: 500
+  n_cores: 100
+  n_included_candidates: 100
+  min_pov_size: 1120
+  max_pov_size: 5120
+  peer_bandwidth: 524288000000
+  bandwidth: 524288000000
+  num_blocks: 10
@@ -0,0 +1,18 @@
+TestConfiguration:
+# Test 1
+- objective: !ApprovalVoting
+    coalesce_mean: 3.0
+    coalesce_std_dev: 1.0
+    enable_assignments_v2: true
+    last_considered_tranche: 89
+    stop_when_approved: true
+    coalesce_tranche_diff: 12
+    workdir_prefix: "/tmp/"
+    num_no_shows_per_candidate: 0
+  n_validators: 500
+  n_cores: 100
+  min_pov_size: 1120
+  max_pov_size: 5120
+  peer_bandwidth: 524288000000
+  bandwidth: 524288000000
+  num_blocks: 10
@@ -0,0 +1,18 @@
+TestConfiguration:
+# Test 1
+- objective: !ApprovalVoting
+    coalesce_mean: 1.0
+    coalesce_std_dev: 0.0
+    enable_assignments_v2: false
+    last_considered_tranche: 89
+    stop_when_approved: false
+    coalesce_tranche_diff: 12
+    workdir_prefix: "/tmp/"
+    num_no_shows_per_candidate: 0
+  n_validators: 500
+  n_cores: 100
+  min_pov_size: 1120
+  max_pov_size: 5120
+  peer_bandwidth: 524288000000
+  bandwidth: 524288000000
+  num_blocks: 10