Subsystem benchmarks: determine node CPU usage for 1000 validators and 200 full occupied cores #5035
Benchmark specification is at #5043 and the run is at https://gitlab.parity.io/parity/mirrors/polkadot-sdk/-/jobs/6716316. The specification uses 1000 validators with 200 fully occupied cores, and we assumed the network is in a state with 3 no-shows per candidate. The per-subsystem usage in that situation is:
Overall, the CPU usage per block for these subsystems is around 7s. We also need to take into account the unaccounted subsystems: we know the networking thread is close to 100% usage, so that adds another 6s, and I would add another 5s for everything else, which adds up to 18s of CPU usage per block just for the consensus-related work. Assuming that after https://forum.polkadot.network/t/rfc-increasing-recommended-minimum-core-count-for-reference-hardware/8156 every validator has already migrated to HW with 8 CPU cores, the total available execution time per 6s block is 8 * 6 = 48s. That leaves us with 30s (48 - 18) available for PVF execution; assuming each validator won't have to validate more than 10 candidates per block, that gives us room for the average PVF execution to be around 3s. In conclusion, with the data we have, I think the increase of the validator HW spec from 4 cores to 8 cores should allow us to support 1kV and 200 cores.
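As a sanity check, the arithmetic above can be written out explicitly (a minimal sketch; the 7s/6s/5s figures are the estimates quoted above, and 10 candidates per block per validator is an assumed upper bound, not a measured constant):

```rust
fn main() {
    // Estimated CPU seconds spent per 6s relay-chain block on consensus work.
    let benchmarked_subsystems = 7.0; // s, from the subsystem benchmark run
    let networking_thread = 6.0;      // s, ~100% of one core for the whole block
    let unaccounted = 5.0;            // s, rough allowance for everything else
    let consensus_total = benchmarked_subsystems + networking_thread + unaccounted; // 18 s

    // Available CPU time per block on the proposed reference hardware.
    let cores = 8.0;
    let block_time = 6.0;             // s
    let available = cores * block_time; // 48 s

    // Remaining budget for PVF execution.
    let pvf_budget = available - consensus_total; // 30 s
    let candidates_per_block = 10.0;  // assumed upper bound per validator
    println!(
        "average PVF execution budget per candidate: {:.1} s",
        pvf_budget / candidates_per_block // ~3 s
    );
}
```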
Thanks @alexggh! More than 60% of a node's CPU used for PVF execution sounds amazing. You assumed 1 full CPU being used by networking, however with …
Is this just because it's lots of messages?
Is this verifying assignment VRFs, verifying approval votes, and managing the approvals database?
Is this just sending out the chunks?
What is here?
All erasure code work lies here, no? We're already using some flavor of systemic chunks, or at least direct fetch here, so this mostly counts only recomputing the availability merkle root. If many nodes go offline then we'd be doing reconstructions here, which doubles or triples this, right?
This is all in backing, right?
Yes, bookkeeping and the logic to gossip scales up with the number of messages and the number of validators in the network.
Yes.
Yes.
Storing the chunks
Yes
I have to double check, but I don't think the systemic chunks are enabled in these benchmarks.
Yes.
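To make the reconstruction-cost point concrete, here is an illustrative Reed-Solomon sketch using the generic `reed-solomon-erasure` crate (not Polkadot's own erasure-coding crate, so the shard counts and types are assumptions for illustration only). When enough original chunks or the full PoV are fetched directly, nodes only need to re-encode and check the availability merkle root; losing chunks forces a reconstruction pass like the one below, which is the extra cost mentioned above:

```rust
use reed_solomon_erasure::galois_8::ReedSolomon;

fn main() {
    // 4 data shards + 2 parity shards; any 4 of the 6 recover the data.
    let r = ReedSolomon::new(4, 2).unwrap();

    // Data shards carry the payload; parity shards start zeroed and are filled by encode().
    let mut shards: Vec<Vec<u8>> = vec![
        vec![1, 2, 3, 4],
        vec![5, 6, 7, 8],
        vec![9, 10, 11, 12],
        vec![13, 14, 15, 16],
        vec![0; 4],
        vec![0; 4],
    ];
    r.encode(&mut shards).unwrap();

    // Simulate two nodes being offline: drop their shards, then reconstruct the rest.
    let mut maybe_shards: Vec<Option<Vec<u8>>> = shards.into_iter().map(Some).collect();
    maybe_shards[0] = None;
    maybe_shards[4] = None;
    r.reconstruct(&mut maybe_shards).unwrap();

    assert_eq!(maybe_shards[0].as_deref(), Some(&[1u8, 2, 3, 4][..]));
    println!("reconstruction succeeded");
}
```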
In test used …
Related to this, I've been running a small Versi experiment.

**Running polkadot in a CPU-restrictive environment**

In order to determine the stability, readiness and requirements of polkadot for 1k validators and 200 cores, I ran a few experiments where the node hardware requirements were scaled down by a factor of 16 and 8, and pushed the network to the limit.

**Translating dimensions by the scale factor**
The goal here was that by scaling every dimension we would gather useful data about the behaviour of the real network, because validators would be CPU starved/throttled, so everything would take longer to execute and be delayed.

**Limitations**

This would probably not reflect very well the load created by things that scale with the number of validators and parachains, like assignment and approval or bitfield processing.

**Results**

I ended up running three experiments:
**Findings / Confirming what we already assumed or knew**
Nice work Alex! Based on these numbers we should be fine with 1kV and 200 cores, but that also depends on how we deal with the high amount of gossip messages and chunk requests. Subsystem benchmarking should provide a good estimate for this. What are you planning to do next?
Maybe libp2p is using some unbounded or higher-limit channels; litep2p has them bounded at 4096. CC @paritytech/networking
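For illustration only (this is not the actual litep2p or libp2p internals; the channel choice and the 4096 figure are just taken from the comment above), a bounded channel caps how many messages can queue up and applies backpressure to senders, whereas an unbounded one lets the queue grow with the gossip burst:

```rust
use tokio::sync::mpsc;

#[tokio::main]
async fn main() {
    // Bounded: send().await waits once 4096 messages are already queued,
    // pushing backpressure onto whoever produces the messages.
    let (tx, mut rx) = mpsc::channel::<Vec<u8>>(4096);

    // An unbounded channel would instead let the queue (and memory) grow
    // without limit under bursts of gossip or chunk requests:
    // let (tx, mut rx) = mpsc::unbounded_channel::<Vec<u8>>();

    tx.send(vec![0u8; 32]).await.expect("receiver alive");
    println!("received message of {:?} bytes", rx.recv().await.map(|m| m.len()));
}
```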
That's already estimated here: #5035 (comment), and we concluded that with 8 cores we should have enough CPU for it. So once all the optimisations we have in progress land on Kusama, we should be able to ramp up the number of validators doing parachain consensus work there gradually, so we can catch any deterioration and reverse course if our estimations prove wrong.
An order of magnitude? wow. I suppose erasure-node acts like a fixed cost here, due to how backers work. And libp2p-node might've many fixed costs too. That'd be roughly 16x or 8x. In production, what hogs the resources? Someone claimed signature verification recently.
These are the usages of a real Kusama validator. And these are the usages that I got in the restrictive environment: The PoVs on Kusama are really small compared with my tests, hence why I think erasure-task does not show up as using a lot of CPU on Kusama.
You mean assignments and approvals? Yeah, that was a problem as well, especially if you have a lot of no-shows, but we did a lot of optimisations on that path as part of this thread: #4849, so it shouldn't be a problem anymore.
This issue has been mentioned on Polkadot Forum. There might be relevant details there: https://forum.polkadot.network/t/litep2p-network-backend-updates/9973/1
We should repeat the testing done in #4126 (comment) to get some numbers with the 1kV and 200 cores. This would help to better inform the decision and outlook of raising HW specs: https://forum.polkadot.network/t/rfc-increasing-recommended-minimum-core-count-for-reference-hardware/8156