Introduce approval-voting/distribution benchmark (#2621)

## Summary

Built on top of the tooling and ideas introduced in #2528, this PR introduces a synthetic benchmark for measuring and assessing the performance characteristics of the approval-voting and approval-distribution subsystems.

Currently, this allows us to simulate the behaviour of these systems along the following dimensions:

```
TestConfiguration:
# Test 1
- objective: !ApprovalsTest
    last_considered_tranche: 89
    min_coalesce: 1
    max_coalesce: 6
    enable_assignments_v2: true
    send_till_tranche: 60
    stop_when_approved: false
    coalesce_tranche_diff: 12
    workdir_prefix: "/tmp"
    num_no_shows_per_candidate: 0
    approval_distribution_expected_tof: 6.0
    approval_distribution_cpu_ms: 3.0
    approval_voting_cpu_ms: 4.30
  n_validators: 500
  n_cores: 100
  n_included_candidates: 100
  min_pov_size: 1120
  max_pov_size: 5120
  peer_bandwidth: 524288000000
  bandwidth: 524288000000
  latency:
    min_latency:
      secs: 0
      nanos: 1000000
    max_latency:
      secs: 0
      nanos: 100000000
  error: 0
  num_blocks: 10
```

## The approach

1. We build a real overseer with the real implementations of the approval-voting and approval-distribution subsystems.
2. For a given network size, we pre-compute for each validator all potential assignments and approvals it would send. Because this is a computation-heavy operation, the result is cached in a file on disk and re-used if the generation parameters don't change.
3. The messages are sent according to the configured parameters, which are split into 3 main benchmarking scenarios.

## Benchmarking scenarios

### Best case scenario *approvals_throughput_best_case.yaml*

It sends to approval-distribution only the minimum tranches required to gather the `needed_approvals`, so that a candidate is approved.

### Behaviour in the presence of no-shows *approvals_no_shows.yaml*

It sends the tranches needed to approve a candidate when we have a maximum of *num_no_shows_per_candidate* tranches with no-shows for each candidate.

### Maximum throughput *approvals_throughput.yaml*

It sends all the tranches for each block and measures the CPU used and the network bandwidth needed by the approval-voting and approval-distribution subsystems.

## How to run it

```
cargo run -p polkadot-subsystem-bench --release -- test-sequence --path polkadot/node/subsystem-bench/examples/approvals_throughput.yaml
```

## Evaluating performance

### Use the real subsystems metrics

If you follow the steps in https://github.com/paritytech/polkadot-sdk/tree/master/polkadot/node/subsystem-bench#install-grafana to install prometheus and grafana locally, all real metrics for `approval-distribution`, `approval-voting` and the overseer are available. E.g.:

<img width="2149" alt="Screenshot 2023-12-05 at 11 07 46" src="https://github.com/paritytech/polkadot-sdk/assets/49718502/cb8ae2dd-178b-4922-bfa4-dc37e572ed38">
<img width="2551" alt="Screenshot 2023-12-05 at 11 09 42" src="https://github.com/paritytech/polkadot-sdk/assets/49718502/8b4542ba-88b9-46f9-9b70-cc345366081b">
<img width="2154" alt="Screenshot 2023-12-05 at 11 10 15" src="https://github.com/paritytech/polkadot-sdk/assets/49718502/b8874d8d-632e-443a-9840-14ad8e90c54f">
<img width="2535" alt="Screenshot 2023-12-05 at 11 10 52" src="https://github.com/paritytech/polkadot-sdk/assets/49718502/779a439f-fd18-4985-bb80-85d5afad78e2">

### Profile with pyroscope

1. Set up pyroscope following the steps in https://github.com/paritytech/polkadot-sdk/tree/master/polkadot/node/subsystem-bench#install-pyroscope, then run any of the benchmark scenarios with the `--profile` argument.
2. Open the pyroscope dashboard in grafana, e.g.:

<img width="2544" alt="Screenshot 2024-01-09 at 17 09 58" src="https://github.com/paritytech/polkadot-sdk/assets/49718502/58f50c99-a910-4d20-951a-8b16639303d9">

### Useful logs

1. Network bandwidth requirements:
```
Payload bytes received from peers: 503993 KiB total, 50399 KiB/block
Payload bytes sent to peers: 629971 KiB total, 62997 KiB/block
```
2. CPU usage by the approval-distribution/approval-voting subsystems:
```
approval-distribution CPU usage 84.061s
approval-distribution CPU usage per block 8.406s
approval-voting CPU usage 96.532s
approval-voting CPU usage per block 9.653s
```
3. Time passed until a given block is approved:
```
Chain selection approved after 3500 ms hash=0x0101010101010101010101010101010101010101010101010101010101010101
Chain selection approved after 4500 ms hash=0x0202020202020202020202020202020202020202020202020202020202020202
```

### Using benchmark to quantify improvements from #1178 + #1191

Using a versi node, we compare the scenario where all new optimisations are disabled with a scenario where tranche0 assignments are sent in a single message and a conservative simulation where the coalescing of approvals gives us just a 50% reduction in the number of messages we send.

Overall, what we see is a speedup of around 30-40% in the time it takes to process the necessary messages and a 30-40% reduction in the necessary bandwidth.

#### Best case scenario comparison (minimum required tranches sent)

Unoptimised
```
Number of blocks: 10
Payload bytes received from peers: 53289 KiB total, 5328 KiB/block
Payload bytes sent to peers: 52489 KiB total, 5248 KiB/block
approval-distribution CPU usage 6.732s
approval-distribution CPU usage per block 0.673s
approval-voting CPU usage 9.523s
approval-voting CPU usage per block 0.952s
```
vs optimisations enabled
```
Number of blocks: 10
Payload bytes received from peers: 32141 KiB total, 3214 KiB/block
Payload bytes sent to peers: 37314 KiB total, 3731 KiB/block
approval-distribution CPU usage 4.658s
approval-distribution CPU usage per block 0.466s
approval-voting CPU usage 6.236s
approval-voting CPU usage per block 0.624s
```

#### Worst case: all tranches sent (very unlikely, happens when sharding breaks)

Unoptimised
```
Number of blocks: 10
Payload bytes received from peers: 746393 KiB total, 74639 KiB/block
Payload bytes sent to peers: 729151 KiB total, 72915 KiB/block
approval-distribution CPU usage 118.681s
approval-distribution CPU usage per block 11.868s
approval-voting CPU usage 124.118s
approval-voting CPU usage per block 12.412s
```
vs optimised
```
Number of blocks: 10
Payload bytes received from peers: 503993 KiB total, 50399 KiB/block
Payload bytes sent to peers: 629971 KiB total, 62997 KiB/block
approval-distribution CPU usage 84.061s
approval-distribution CPU usage per block 8.406s
approval-voting CPU usage 96.532s
approval-voting CPU usage per block 9.653s
```

## TODOs

- [x] Polish implementation.
- [x] Use what we have so far to evaluate #1191 before merging.
- [x] List of features and additional dimensions we want to use for benchmarking.
- [x] Run benchmark on hardware similar to versi and kusama nodes.
- [ ] Add benchmark to be run in CI for catching regressions in performance.
- [ ] Rebase on latest changes for network emulation.
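
The per-metric reductions behind the comparison above can be recomputed from the logged totals. The following is a minimal sketch and not part of this PR: `Run`, `reduction_pct` and `report` are hypothetical names, and the constants are copied from the best-case log output above.

```rust
/// Totals copied from one benchmark log (hypothetical container, not part of the PR).
#[derive(Clone, Copy, Debug)]
pub struct Run {
    pub received_kib: f64,
    pub sent_kib: f64,
    pub approval_distribution_cpu_s: f64,
    pub approval_voting_cpu_s: f64,
}

/// Best-case scenario with optimisations disabled (values from the log above).
pub const BEST_CASE_UNOPTIMISED: Run = Run {
    received_kib: 53289.0,
    sent_kib: 52489.0,
    approval_distribution_cpu_s: 6.732,
    approval_voting_cpu_s: 9.523,
};

/// Best-case scenario with optimisations enabled (values from the log above).
pub const BEST_CASE_OPTIMISED: Run = Run {
    received_kib: 32141.0,
    sent_kib: 37314.0,
    approval_distribution_cpu_s: 4.658,
    approval_voting_cpu_s: 6.236,
};

/// Relative reduction of `after` versus `before`, in percent.
pub fn reduction_pct(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}

/// Formats the reduction of every metric between two runs, one metric per line.
pub fn report(before: Run, after: Run) -> String {
    format!(
        "bytes received: {:.1}%\nbytes sent: {:.1}%\napproval-distribution CPU: {:.1}%\napproval-voting CPU: {:.1}%",
        reduction_pct(before.received_kib, after.received_kib),
        reduction_pct(before.sent_kib, after.sent_kib),
        reduction_pct(before.approval_distribution_cpu_s, after.approval_distribution_cpu_s),
        reduction_pct(before.approval_voting_cpu_s, after.approval_voting_cpu_s),
    )
}
```

Calling `report(BEST_CASE_UNOPTIMISED, BEST_CASE_OPTIMISED)` returns the four percentages for the best-case scenario; the worst-case totals can be plugged into two more `Run` values in the same way.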
---------

Signed-off-by: Andrei Sandu <andrei-mihail@parity.io>
Signed-off-by: Alexandru Gheorghe <alexandru.gheorghe@parity.io>
Co-authored-by: Andrei Sandu <andrei-mihail@parity.io>
Co-authored-by: Andrei Sandu <54316454+sandreim@users.noreply.github.com>
paritytech · Feb 5, 2024 · f9f8868 · f9f8868
1 parent 90849b6
Showing 29 changed files with 2,857 additions and 127 deletions.
diff --git a/Cargo.lock b/Cargo.lock
diff --git a/polkadot/node/core/approval-voting/src/criteria.rs b/polkadot/node/core/approval-voting/src/criteria.rs
@@ -55,11 +55,11 @@ pub struct OurAssignment {
 }
 
 impl OurAssignment {
-	pub(crate) fn cert(&self) -> &AssignmentCertV2 {
+	pub fn cert(&self) -> &AssignmentCertV2 {
 		&self.cert
 	}
 
-	pub(crate) fn tranche(&self) -> DelayTranche {
+	pub fn tranche(&self) -> DelayTranche {
 		self.tranche
 	}
 
@@ -225,7 +225,7 @@ fn assigned_core_transcript(core_index: CoreIndex) -> Transcript {
 
 /// Information about the world assignments are being produced in.
 #[derive(Clone, Debug)]
-pub(crate) struct Config {
+pub struct Config {
 	/// The assignment public keys for validators.
 	assignment_keys: Vec<AssignmentId>,
 	/// The groups of validators assigned to each core.
@@ -321,7 +321,7 @@ impl AssignmentCriteria for RealAssignmentCriteria {
 /// different times. The idea is that most assignments are never triggered and fall by the wayside.
 ///
 /// This will not assign to anything the local validator was part of the backing group for.
-pub(crate) fn compute_assignments(
+pub fn compute_assignments(
 	keystore: &LocalKeystore,
 	relay_vrf_story: RelayVRFStory,
 	config: &Config,

diff --git a/polkadot/node/core/approval-voting/src/lib.rs b/polkadot/node/core/approval-voting/src/lib.rs
@@ -92,11 +92,11 @@ use time::{slot_number_to_tick, Clock, ClockExt, DelayedApprovalTimer, SystemClo
 mod approval_checking;
 pub mod approval_db;
 mod backend;
-mod criteria;
+pub mod criteria;
 mod import;
 mod ops;
 mod persisted_entries;
-mod time;
+pub mod time;
 
 use crate::{
 	approval_checking::{Check, TranchesToApproveResult},
@@ -159,6 +159,7 @@ pub struct ApprovalVotingSubsystem {
 	db: Arc<dyn Database>,
 	mode: Mode,
 	metrics: Metrics,
+	clock: Box<dyn Clock + Send + Sync>,
 }
 
 #[derive(Clone)]
@@ -444,6 +445,25 @@ impl ApprovalVotingSubsystem {
 		keystore: Arc<LocalKeystore>,
 		sync_oracle: Box<dyn SyncOracle + Send>,
 		metrics: Metrics,
+	) -> Self {
+		ApprovalVotingSubsystem::with_config_and_clock(
+			config,
+			db,
+			keystore,
+			sync_oracle,
+			metrics,
+			Box::new(SystemClock {}),
+		)
+	}
+
+	/// Create a new approval voting subsystem with the given keystore, config, and database.
+	pub fn with_config_and_clock(
+		config: Config,
+		db: Arc<dyn Database>,
+		keystore: Arc<LocalKeystore>,
+		sync_oracle: Box<dyn SyncOracle + Send>,
+		metrics: Metrics,
+		clock: Box<dyn Clock + Send + Sync>,
 	) -> Self {
 		ApprovalVotingSubsystem {
 			keystore,
@@ -452,6 +472,7 @@ impl ApprovalVotingSubsystem {
 			db_config: DatabaseConfig { col_approval_data: config.col_approval_data },
 			mode: Mode::Syncing(sync_oracle),
 			metrics,
+			clock,
 		}
 	}
 
@@ -493,15 +514,10 @@ fn db_sanity_check(db: Arc<dyn Database>, config: DatabaseConfig) -> SubsystemRe
 impl<Context: Send> ApprovalVotingSubsystem {
 	fn start(self, ctx: Context) -> SpawnedSubsystem {
 		let backend = DbBackend::new(self.db.clone(), self.db_config);
-		let future = run::<DbBackend, Context>(
-			ctx,
-			self,
-			Box::new(SystemClock),
-			Box::new(RealAssignmentCriteria),
-			backend,
-		)
-		.map_err(|e| SubsystemError::with_origin("approval-voting", e))
-		.boxed();
+		let future =
+			run::<DbBackend, Context>(ctx, self, Box::new(RealAssignmentCriteria), backend)
+				.map_err(|e| SubsystemError::with_origin("approval-voting", e))
+				.boxed();
 
 		SpawnedSubsystem { name: "approval-voting-subsystem", future }
 	}
@@ -909,7 +925,6 @@ enum Action {
 async fn run<B, Context>(
 	mut ctx: Context,
 	mut subsystem: ApprovalVotingSubsystem,
-	clock: Box<dyn Clock + Send + Sync>,
 	assignment_criteria: Box<dyn AssignmentCriteria + Send + Sync>,
 	mut backend: B,
 ) -> SubsystemResult<()>
@@ -923,7 +938,7 @@ where
 	let mut state = State {
 		keystore: subsystem.keystore,
 		slot_duration_millis: subsystem.slot_duration_millis,
-		clock,
+		clock: subsystem.clock,
 		assignment_criteria,
 		spans: HashMap::new(),
 	};
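
The change above moves the clock from an argument of `run` into the subsystem itself, so callers outside the crate can supply their own clock through `with_config_and_clock`. The sketch below illustrates that injection pattern in isolation. It is a simplified stand-in, not the real code: the `Clock` trait here only has `tick_now` (the real trait can also wait for a tick), and `ManualClock` and `Subsystem` are hypothetical types.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::time::{SystemTime, UNIX_EPOCH};

pub const TICK_DURATION_MILLIS: u64 = 500;
pub type Tick = u64;

/// Simplified stand-in for the `Clock` trait; waiting for a tick is omitted.
pub trait Clock {
    fn tick_now(&self) -> Tick;
}

/// Clock backed by the system time, counted in half-second ticks since the Unix epoch.
pub struct SystemClock;

impl Clock for SystemClock {
    fn tick_now(&self) -> Tick {
        let millis = SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|d| d.as_millis() as u64)
            .unwrap_or(0);
        millis / TICK_DURATION_MILLIS
    }
}

/// Hypothetical clock that a test or benchmark advances by hand.
#[derive(Clone, Default)]
pub struct ManualClock {
    now: Arc<AtomicU64>,
}

impl ManualClock {
    pub fn set(&self, tick: Tick) {
        self.now.store(tick, Ordering::SeqCst);
    }
}

impl Clock for ManualClock {
    fn tick_now(&self) -> Tick {
        self.now.load(Ordering::SeqCst)
    }
}

/// Toy subsystem that owns its clock, mirroring the new `clock` field.
pub struct Subsystem {
    clock: Box<dyn Clock + Send + Sync>,
}

impl Subsystem {
    /// Default constructor: delegates with the system clock, as `with_config` does in the diff.
    pub fn with_config() -> Self {
        Self::with_config_and_clock(Box::new(SystemClock))
    }

    /// Constructor that accepts any clock implementation.
    pub fn with_config_and_clock(clock: Box<dyn Clock + Send + Sync>) -> Self {
        Subsystem { clock }
    }

    pub fn current_tick(&self) -> Tick {
        self.clock.tick_now()
    }
}
```

Because `ManualClock` shares its state through an `Arc`, a caller can keep one clone, hand another to `with_config_and_clock`, and then drive time from outside the subsystem.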

diff --git a/polkadot/node/core/approval-voting/src/tests.rs b/polkadot/node/core/approval-voting/src/tests.rs
@@ -549,7 +549,7 @@ fn test_harness<T: Future<Output = VirtualOverseer>>(
 
 	let subsystem = run(
 		context,
-		ApprovalVotingSubsystem::with_config(
+		ApprovalVotingSubsystem::with_config_and_clock(
 			Config {
 				col_approval_data: test_constants::TEST_CONFIG.col_approval_data,
 				slot_duration_millis: SLOT_DURATION_MILLIS,
@@ -558,8 +558,8 @@ fn test_harness<T: Future<Output = VirtualOverseer>>(
 			Arc::new(keystore),
 			sync_oracle,
 			Metrics::default(),
+			clock.clone(),
 		),
-		clock.clone(),
 		assignment_criteria,
 		backend,
 	);

diff --git a/polkadot/node/core/approval-voting/src/time.rs b/polkadot/node/core/approval-voting/src/time.rs
@@ -33,14 +33,14 @@ use std::{
 };
 
 use polkadot_primitives::{Hash, ValidatorIndex};
-const TICK_DURATION_MILLIS: u64 = 500;
+pub const TICK_DURATION_MILLIS: u64 = 500;
 
 /// A base unit of time, starting from the Unix epoch, split into half-second intervals.
-pub(crate) type Tick = u64;
+pub type Tick = u64;
 
 /// A clock which allows querying of the current tick as well as
 /// waiting for a tick to be reached.
-pub(crate) trait Clock {
+pub trait Clock {
 	/// Yields the current tick.
 	fn tick_now(&self) -> Tick;
 
@@ -49,7 +49,7 @@ pub(crate) trait Clock {
 }
 
 /// Extension methods for clocks.
-pub(crate) trait ClockExt {
+pub trait ClockExt {
 	fn tranche_now(&self, slot_duration_millis: u64, base_slot: Slot) -> DelayTranche;
 }
 
@@ -61,7 +61,8 @@ impl<C: Clock + ?Sized> ClockExt for C {
 }
 
 /// A clock which uses the actual underlying system clock.
-pub(crate) struct SystemClock;
+#[derive(Clone)]
+pub struct SystemClock;
 
 impl Clock for SystemClock {
 	/// Yields the current tick.
@@ -93,11 +94,22 @@ fn tick_to_time(tick: Tick) -> SystemTime {
 }
 
 /// assumes `slot_duration_millis` evenly divided by tick duration.
-pub(crate) fn slot_number_to_tick(slot_duration_millis: u64, slot: Slot) -> Tick {
+pub fn slot_number_to_tick(slot_duration_millis: u64, slot: Slot) -> Tick {
 	let ticks_per_slot = slot_duration_millis / TICK_DURATION_MILLIS;
 	u64::from(slot) * ticks_per_slot
 }
 
+/// Converts a tick to the slot number.
+pub fn tick_to_slot_number(slot_duration_millis: u64, tick: Tick) -> Slot {
+	let ticks_per_slot = slot_duration_millis / TICK_DURATION_MILLIS;
+	(tick / ticks_per_slot).into()
+}
+
+/// Converts a tranche from a slot to the tick number.
+pub fn tranche_to_tick(slot_duration_millis: u64, slot: Slot, tranche: u32) -> Tick {
+	slot_number_to_tick(slot_duration_millis, slot) + tranche as u64
+}
+
 /// A list of delayed futures that gets triggered when the waiting time has expired and it is
 /// time to sign the candidate.
 /// We have a timer per relay-chain block.
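
To make the relationship between slots, ticks and tranches in the helpers above concrete, here is a self-contained sketch of the same arithmetic. It uses a plain `u64` alias (`SlotNumber`, a hypothetical name) instead of the `Slot` type from the diff, and the 6000 ms slot duration in the usage note is an assumed example value.

```rust
pub const TICK_DURATION_MILLIS: u64 = 500;

pub type Tick = u64;
/// Plain integer stand-in for the `Slot` type used in the diff.
pub type SlotNumber = u64;

/// First tick of the given slot; assumes the slot duration is a multiple of the tick duration.
pub fn slot_number_to_tick(slot_duration_millis: u64, slot: SlotNumber) -> Tick {
    let ticks_per_slot = slot_duration_millis / TICK_DURATION_MILLIS;
    slot * ticks_per_slot
}

/// Slot that contains the given tick.
pub fn tick_to_slot_number(slot_duration_millis: u64, tick: Tick) -> SlotNumber {
    let ticks_per_slot = slot_duration_millis / TICK_DURATION_MILLIS;
    tick / ticks_per_slot
}

/// Tick at which a tranche of the given slot starts: one tranche is one tick.
pub fn tranche_to_tick(slot_duration_millis: u64, slot: SlotNumber, tranche: u32) -> Tick {
    slot_number_to_tick(slot_duration_millis, slot) + u64::from(tranche)
}
```

With a 6000 ms slot there are 12 ticks per slot, so slot 10 starts at tick 120, tranche 5 of slot 10 falls on tick 125, and tick 125 maps back to slot 10.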

diff --git a/polkadot/node/subsystem-bench/Cargo.toml b/polkadot/node/subsystem-bench/Cargo.toml
@@ -38,6 +38,9 @@ sp-core = { path = "../../../substrate/primitives/core" }
 clap = { version = "4.4.18", features = ["derive"] }
 futures = "0.3.21"
 futures-timer = "3.0.2"
+bincode = "1.3.3"
+sha1 = "0.10.6"
+hex = "0.4.3"
 gum = { package = "tracing-gum", path = "../gum" }
 polkadot-erasure-coding = { package = "polkadot-erasure-coding", path = "../../erasure-coding" }
 log = "0.4.17"
@@ -64,6 +67,16 @@ prometheus_endpoint = { package = "substrate-prometheus-endpoint", path = "../..
 prometheus = { version = "0.13.0", default-features = false }
 serde = "1.0.195"
 serde_yaml = "0.9"
+
+polkadot-node-core-approval-voting = { path = "../core/approval-voting" }
+polkadot-approval-distribution = { path = "../network/approval-distribution" }
+sp-consensus-babe = { path = "../../../substrate/primitives/consensus/babe" }
+sp-runtime = { path = "../../../substrate/primitives/runtime", default-features = false }
+sp-timestamp = { path = "../../../substrate/primitives/timestamp" }
+
+schnorrkel = { version = "0.9.1", default-features = false }
+rand_core = "0.6.2"                                                                         # should match schnorrkel
+rand_chacha = { version = "0.3.1" }
 paste = "1.0.14"
 orchestra = { version = "0.3.5", default-features = false, features = ["futures_channel"] }
 pyroscope = "0.5.7"
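
The commit message says that the pre-computed messages are cached in a file on disk and re-used if the generation parameters don't change. One common way to get that behaviour is to derive the cache file name from a fingerprint of the parameters. The sketch below shows that general technique only and is not the PR's implementation: `GenerationParams`, `fingerprint` and `cache_path` are hypothetical, the file name pattern is made up, and a hand-written FNV-1a hash is used so that the example needs only the standard library (the diff adds `sha1` and `hex`, possibly for a similar purpose).

```rust
use std::path::{Path, PathBuf};

/// Hypothetical subset of the parameters that influence message generation.
#[derive(Clone, Debug, PartialEq)]
pub struct GenerationParams {
    pub n_validators: u32,
    pub n_cores: u32,
    pub last_considered_tranche: u32,
    pub enable_assignments_v2: bool,
}

/// 64-bit FNV-1a hash; a simple, deterministic stand-in for a cryptographic digest.
pub fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for byte in bytes {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    hash
}

/// Hex fingerprint of the parameters; changing any field changes the encoded input.
pub fn fingerprint(params: &GenerationParams) -> String {
    let encoded = format!(
        "{}:{}:{}:{}",
        params.n_validators,
        params.n_cores,
        params.last_considered_tranche,
        params.enable_assignments_v2,
    );
    format!("{:016x}", fnv1a(encoded.as_bytes()))
}

/// Cache file location under the working directory, keyed by the fingerprint.
pub fn cache_path(workdir_prefix: &Path, params: &GenerationParams) -> PathBuf {
    workdir_prefix.join(format!("approvals-{}.bin", fingerprint(params)))
}
```

A caller would check whether `cache_path(...)` exists before regenerating: identical parameters map to the same file, while a changed parameter yields a different file name and therefore a fresh generation run.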

diff --git a/polkadot/node/subsystem-bench/examples/approvals_no_shows.yaml b/polkadot/node/subsystem-bench/examples/approvals_no_shows.yaml
@@ -0,0 +1,18 @@
+TestConfiguration:
+# Test 1
+- objective: !ApprovalVoting
+    last_considered_tranche: 89
+    coalesce_mean: 3.0
+    coalesce_std_dev: 1.0
+    stop_when_approved: true
+    coalesce_tranche_diff: 12
+    workdir_prefix: "/tmp/"
+    enable_assignments_v2: true
+    num_no_shows_per_candidate: 10
+  n_validators: 500
+  n_cores: 100
+  min_pov_size: 1120
+  max_pov_size: 5120
+  peer_bandwidth: 524288000000
+  bandwidth: 524288000000
+  num_blocks: 10
diff --git a/polkadot/node/subsystem-bench/examples/approvals_throughput.yaml b/polkadot/node/subsystem-bench/examples/approvals_throughput.yaml
@@ -0,0 +1,19 @@
+TestConfiguration:
+# Test 1
+- objective: !ApprovalVoting
+    coalesce_mean: 3.0
+    coalesce_std_dev: 1.0
+    enable_assignments_v2: true
+    last_considered_tranche: 89
+    stop_when_approved: false
+    coalesce_tranche_diff: 12
+    workdir_prefix: "/tmp"
+    num_no_shows_per_candidate: 0
+  n_validators: 500
+  n_cores: 100
+  n_included_candidates: 100
+  min_pov_size: 1120
+  max_pov_size: 5120
+  peer_bandwidth: 524288000000
+  bandwidth: 524288000000
+  num_blocks: 10
diff --git a/polkadot/node/subsystem-bench/examples/approvals_throughput_best_case.yaml b/polkadot/node/subsystem-bench/examples/approvals_throughput_best_case.yaml
@@ -0,0 +1,18 @@
+TestConfiguration:
+# Test 1
+- objective: !ApprovalVoting
+    coalesce_mean: 3.0
+    coalesce_std_dev: 1.0
+    enable_assignments_v2: true
+    last_considered_tranche: 89
+    stop_when_approved: true
+    coalesce_tranche_diff: 12
+    workdir_prefix: "/tmp/"
+    num_no_shows_per_candidate: 0
+  n_validators: 500
+  n_cores: 100
+  min_pov_size: 1120
+  max_pov_size: 5120
+  peer_bandwidth: 524288000000
+  bandwidth: 524288000000
+  num_blocks: 10
diff --git a/polkadot/node/subsystem-bench/examples/approvals_throughput_no_optimisations_enabled.yaml b/polkadot/node/subsystem-bench/examples/approvals_throughput_no_optimisations_enabled.yaml
@@ -0,0 +1,18 @@
+TestConfiguration:
+# Test 1
+- objective: !ApprovalVoting
+    coalesce_mean: 1.0
+    coalesce_std_dev: 0.0
+    enable_assignments_v2: false
+    last_considered_tranche: 89
+    stop_when_approved: false
+    coalesce_tranche_diff: 12
+    workdir_prefix: "/tmp/"
+    num_no_shows_per_candidate: 0
+  n_validators: 500
+  n_cores: 100
+  min_pov_size: 1120
+  max_pov_size: 5120
+  peer_bandwidth: 524288000000
+  bandwidth: 524288000000
+  num_blocks: 10