
[ElasticScaling] Enable ElasticScaling on AHKusama and AHPolkadot #10425

@lexnv

Goal

As part of our low-latency roadmap, we are rolling out elastic scaling on Asset Hub Kusama (AHK) and Asset Hub Polkadot (AHP).
The expected effect of this change is ~2s block times using 3 cores.
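As a rough sanity check on the ~2s figure, the sketch below divides the relay chain block time by the number of cores. It assumes a 6s relay chain block time and one parachain block per assigned core per relay chain block; the helper name is hypothetical and not part of any SDK.

```python
# Assumption: the relay chain produces a block every 6 seconds.
RELAY_BLOCK_TIME_MS = 6000


def expected_parachain_block_time_ms(
    cores: int, relay_block_time_ms: int = RELAY_BLOCK_TIME_MS
) -> float:
    """Approximate parachain block time when `cores` cores are assigned.

    Assumes one parachain block is included per core per relay chain block.
    """
    if cores < 1:
        raise ValueError("at least one core is required")
    return relay_block_time_ms / cores


if __name__ == "__main__":
    for cores in (1, 2, 3):
        print(cores, "core(s):", expected_parachain_block_time_ms(cores), "ms")
```

Under these assumptions, 3 cores give 2000 ms per block, which matches the target stated above.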

The rollout follows a gradual strategy: test-nets (Versi, Westend, Paseo), then Kusama, then Polkadot.

This is part of:

Test Nets Triaging

Asset Hub Westend

Elastic Scaling is enabled on AssetHubWestend, using the following PR:

We have discovered that Asset Hub Westend is not in ideal shape:

Multiple collations were not advertised to validators, some validators were offline, and some validator records could not be found in the DHT. We believe these issues are unrelated to the elastic scaling feature; work is in progress with the DevOps team to bring the chain to a stable state.

The chain is running stable2509-2, a release that lacks our latest optimizations for collator-to-validator connection stability:

Versi

We have enabled elastic scaling on a YAP parachain in Versi (our dedicated test net).
On the YAP parachain, we have been stress testing by periodically sending 2k transactions.
In Versi, we have observed stable 2s block times despite the transaction spam.

However, we have observed the following behavior which was not expected in test-nets:

  • Collator to validator connectivity shows periodic instability
  • Collation fetch latency spikes from sub-10ms to 2000ms or 4000+ms in various cases

Despite the connection instability, the elastic scaling feature can produce:

For more details check:

We identified the following optimization while debugging the chain:

Paseo

While the Asset Hub Westend issues are being resolved, we have decided to deploy a new chain on Paseo:

The chain is running a patched version of origin/master, including the following:

The chain is deployed manually with 3 collators on our cloud instances.

The chain sustains ~2s blocks on average (mostly between 2s and 3s), with occasional spikes of 18 blocks.

Image

Over 24h, we see around 571 of the following warnings per collator:

WARN parachain::collator-protocol: [Parachain] Collation wasn't advertised to any validator.
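A minimal sketch of how such a per-collator count could be reproduced from raw log lines. It matches on the warning text quoted above; the function names and the input shape (a mapping from collator name to its log lines) are assumptions for illustration, not an existing tool.

```python
from typing import Dict, Iterable

# Message text taken from the warning shown above.
WARNING_TEXT = "Collation wasn't advertised to any validator."


def count_unadvertised(lines_by_collator: Dict[str, Iterable[str]]) -> Dict[str, int]:
    """Count the 'not advertised' warnings in each collator's log lines."""
    return {
        name: sum(1 for line in lines if WARNING_TEXT in line)
        for name, lines in lines_by_collator.items()
    }


def seconds_between_warnings(warnings_per_day: int) -> float:
    """Average spacing between warnings over a 24h window."""
    return 24 * 60 * 60 / warnings_per_day


if __name__ == "__main__":
    sample = {
        "collator-1": [
            "WARN parachain::collator-protocol: [Parachain] "
            "Collation wasn't advertised to any validator.",
            "DEBUG aura::cumulus: [Parachain] Adjusted proposal duration.",
        ]
    }
    print(count_unadvertised(sample))
    print(round(seconds_between_warnings(571)), "s between warnings on average")
```

At 571 warnings per 24h, this works out to roughly one warning every two and a half minutes per collator.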

The authoring adjustment is working as expected:

17:24:01.474 DEBUG tokio-runtime-worker aura::cumulus: [Parachain] Adjusted proposal duration. duration=Some(1.526s)
17:24:03.405 DEBUG tokio-runtime-worker aura::cumulus: [Parachain] Adjusted proposal duration. duration=Some(1.595s)
17:24:05.407 DEBUG tokio-runtime-worker aura::cumulus: [Parachain] Adjusted proposal duration. duration=Some(593.137639ms)
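To inspect the adjustment over a longer window, the durations can be pulled out of the log lines. The sketch below is an assumption-laden helper (hypothetical names, regex written against the three lines above) that converts Rust-style `Duration` debug output such as `1.526s` or `593.137639ms` to milliseconds.

```python
import re
from typing import Optional

# Multipliers to milliseconds for the unit suffixes of Rust's Duration debug output.
UNIT_TO_MS = {"s": 1000.0, "ms": 1.0, "µs": 1e-3, "ns": 1e-6}

# Longer suffixes come first so that "ms" is not matched as "s".
DURATION_RE = re.compile(r"duration=Some\((?P<value>[0-9.]+)(?P<unit>ns|µs|ms|s)\)")


def parse_adjusted_duration_ms(line: str) -> Optional[float]:
    """Return the adjusted proposal duration in ms, or None if the line has none."""
    match = DURATION_RE.search(line)
    if match is None:
        return None
    return float(match.group("value")) * UNIT_TO_MS[match.group("unit")]


if __name__ == "__main__":
    lines = [
        "17:24:01.474 DEBUG tokio-runtime-worker aura::cumulus: [Parachain] "
        "Adjusted proposal duration. duration=Some(1.526s)",
        "17:24:05.407 DEBUG tokio-runtime-worker aura::cumulus: [Parachain] "
        "Adjusted proposal duration. duration=Some(593.137639ms)",
    ]
    for line in lines:
        print(parse_adjusted_duration_ms(line))
```

Lines without a `duration=Some(...)` field return `None`, so the helper can be mapped over a full log and filtered afterwards.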

The connection between collators and validators is still not ideal, accompanied by occasional collation fetch latency spikes:

Image

Conclusion from Test Nets

After reviewing the above, we have decided to move forward with enabling the elastic-scaling feature on Kusama.

We believe the recent connectivity issues are unrelated to the elastic scaling implementation. They appear to be a combination of networking conditions, validator setups, and possible race conditions that delay node connectivity.

Kusama

The following PR prepares elastic scaling with 3 cores on AssetHubKusama:

Polkadot

Asset Hub Polkadot will follow shortly after Kusama has been triaged.

Key Improvements and Other Findings
