
Decide if we want to merklize state #647

Open
kayabaNerve opened this issue Jan 17, 2025 · 1 comment
Labels: discussion, node

Comments

@kayabaNerve (Member)

Merklizing state:

  1. Enshrines a storage layout
  2. Adds notable overhead to IO

In #330, I opened the discussion on client diversity. An enshrined storage layout requires the storage be matched exactly, even if more efficient algorithms exist. I'll give the example of how the protocol expects to calculate a median over a window, and how the runtime uses the database to do so. Instead of reading every value, removing the expired value (shifting the rest), inserting the new one, finding the median by selecting the middle entry, and saving the whole set back, we just save the storage key for the median. We then use next key/prev key (the DB's iteration functions) to move the median as necessary (a sketch follows below). It'd be better to cache that across blocks and simply do the expensive median-finding once at boot. We only don't because this solution is optimal for the box Substrate gives us. If enshrined, any and all implementations would have to use this methodology.
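For concreteness, here's a minimal sketch of that technique, with a `BTreeSet` standing in for the database (its `range` iteration plays the role of the DB's next key/prev key functions). All names are illustrative, not Serai's actual code:

```rust
use std::collections::BTreeSet;

struct MovingMedian {
  // Stand-in for the on-disk sorted key-value store. Values are assumed
  // distinct; a real layout would suffix keys with a nonce to break ties.
  window: BTreeSet<u64>,
  // The "storage key" of the current median, persisted across updates so the
  // median is never recomputed from scratch.
  median: u64,
}

impl MovingMedian {
  fn new(values: impl IntoIterator<Item = u64>) -> Self {
    let window: BTreeSet<u64> = values.into_iter().collect();
    assert_eq!(window.len() % 2, 1, "window size must be odd");
    // The expensive median selection happens once, then is cached.
    let median = *window.iter().nth(window.len() / 2).unwrap();
    MovingMedian { window, median }
  }

  // Replace the expired value (assumed present in the window) with the new
  // one, moving the cached median by at most one key instead of re-deriving
  // it from the full window.
  fn update(&mut self, removed: u64, inserted: u64) {
    if removed == inserted {
      return;
    }
    let old_median = self.median;
    self.window.remove(&removed);
    self.window.insert(inserted);
    // Equivalents of the DB's next key/prev key calls around the old median.
    let next = self.window.range((old_median + 1)..).next().copied();
    let prev = self.window.range(..old_median).next_back().copied();
    self.median = match (removed <= old_median, inserted < old_median) {
      // Removal and insertion on the same side of the median: it stays put.
      (true, true) | (false, false) if removed != old_median => old_median,
      // Removed at/below the median, inserted above: step to the next key.
      (true, false) => next.unwrap(),
      // Removed above the median, inserted below: step to the previous key.
      (false, true) => prev.unwrap(),
      // The median itself was evicted and the new value falls below it.
      _ => prev.unwrap(),
    };
  }
}
```

The point being: only the median's key is ever written back. If this exact layout were enshrined, an implementation using any other representation (such as a cached in-memory window) couldn't reproduce the storage root.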

In #379, I express the intent to only merklize events due to this.

It should be noted, however, that Substrate offers "warp sync", where the latest finalized state is downloaded but the transitions aren't validated. This introduces a supermajority trust assumption, but users:

  1. Can check they're on the same chain as everyone else, which presumably was validated by everyone else
  2. Validate all future state transitions

So long as at least one node was honest, and so long as a node which rejects a finalized block raises the alarm in time for future initial syncers to be aware, this is fine. This requires state be merklized, however.

We can define events as having sufficient data to rebuild the state, but this would require careful planning and that we build a custom state sync for warp sync. Each event would need to contain an account's new balance, not the amount transferred, and we'd still need to merklize the last time an account was updated. This also handwaves the entire state as solely balances.
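As a sketch of what such an event might look like, under the handwaved assumption that state is solely balances (all types here are hypothetical, not Serai's):

```rust
// Hypothetical identifiers; Serai's actual types differ.
type Account = [u8; 32];
type Balance = u64;

// An event carrying sufficient data to rebuild state: the resulting
// balances, not the amount transferred. With absolute balances, the latest
// event per account suffices to reconstruct state; with deltas, a syncer
// would need the entire history and a known starting state.
enum Event {
  Transfer {
    from: Account,
    to: Account,
    from_balance: Balance,
    to_balance: Balance,
  },
}
```

Even then, a fresh syncer needs assurance it holds the latest event per account, which is exactly why the last time an account was updated would still need merklization.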

Warp sync does greatly reduce the time to sync, but part of why a full sync takes so long is state merklization itself, and once synced, a node still has to pay the overhead of merklization on every block. Without merklized state, we'd have a larger initial sync than warp sync, yet a shorter initial sync than a full sync with state merklization, and we'd achieve greater performance while running the node.

There are 'next-generation' Merkle DBs though. A great blog post on the techniques is https://sovereign.mirror.xyz/jfx_cJ_15saejG9ZuQWjnGnG-NfahbazQH98i1J3NN8. #385 considered using Avalanche's Firewood, resolving it as not eligible due to it not being FOSS. We don't have an issue for Nomad, which isn't published yet, nor for NOMT, though there's already commentary on NOMT re: Substrate.

One concern with adopting such a DB is that they'll have distinct wire formats. Modifying them to be compatible may be infeasible. Reimplementing their wire format, but without all of their optimizations, would achieve diversity only by forfeiting the performance gains.

Please note we have already discussed diversity of the current schema: #385 (Firewood, mentioned for diversity, not to solve performance constraints), #403 (reth's DB, not because it's so hyper-optimized, but because it's presumably well reviewed and rock solid), and #386 (parity-db). parity-db is notable as Substrate ships with both RocksDB and parity-db backends, so it's already diverse without us needing to do more work.

There's also relevance in this discussion to paritytech/polkadot-sdk#278. If we remove merklization, except for events, we can do the merklization for events in memory and not worry about the cleanup cost, as events would never so entangle the DB in the first place (a sketch follows).
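A sketch of that in-memory merklization, building a binary tree over the block's serialized events and committing solely to the root (SHA-256 via the `sha2` crate stands in for whatever hash the protocol would actually use):

```rust
use sha2::{Digest, Sha256};

// Commit to a block's events without touching the database: hash each
// serialized event into a leaf, then fold pairs upward until one root
// remains. Odd layers duplicate their last node; a real design would
// domain-separate leaves from branches, but this suffices as a sketch.
fn events_commitment(events: &[Vec<u8>]) -> [u8; 32] {
  if events.is_empty() {
    return [0; 32];
  }
  let mut layer: Vec<[u8; 32]> =
    events.iter().map(|event| Sha256::digest(event).into()).collect();
  while layer.len() > 1 {
    if layer.len() % 2 == 1 {
      let last = *layer.last().unwrap();
      layer.push(last);
    }
    layer = layer
      .chunks(2)
      .map(|pair| {
        let mut hasher = Sha256::new();
        hasher.update(pair[0]);
        hasher.update(pair[1]);
        hasher.finalize().into()
      })
      .collect();
  }
  layer[0]
}
```

Since the tree lives and dies with the block, there's no persistent trie entangled with the DB and accordingly no cleanup cost.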

Finally, I'll note merklized state exists for light clients. Warp sync can be viewed as the light-client case where you download every entry without actually incurring the transitions. It's also necessary for Polkadot due to their parachain design. Serai isn't a parachain/L2, and should have minimal light-client use cases? We don't offer a VM to do SPV bridging with? So long as the next set of validators is in an event, following consensus should be possible? Verifying transfers were made, if that's ever desired, is still possible with just an event stream? It's current account balances which would be inaccessible (unless we also include those in the event).

I lean towards only publishing a commitment to the events, making Serai ineligible for warp sync. I don't want to commit to an entire storage (and tree) schema at this time. The loss of warp sync is unfortunate but acceptable. We can also look at restoring it by having validators sign state roots ad hoc, for as long as a supermajority of validators run a merklized DB even though the protocol doesn't require it, and publishing those signatures out-of-band of consensus itself (a sketch follows). This scheme works even without constantly merklized state, so long as regular exports are made and signed accordingly.
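As a sketch of that out-of-band scheme (all types hypothetical):

```rust
// A validator's voluntary attestation that its (non-enshrined) merklized DB,
// or a regular signed export, reached `state_root` as of `block_hash`.
// Published out-of-band of consensus; a syncing node collecting attestations
// from a supermajority of validators could warp to that state without the
// protocol ever committing to a storage schema.
struct StateRootAttestation {
  block_hash: [u8; 32],
  state_root: [u8; 32],
  // Signature over (block_hash, state_root) under the validator's key,
  // 64 bytes assuming a Schnorr-style scheme.
  signature: [u8; 64],
}
```

The trust assumption matches warp sync's: a supermajority of the signing validators must be honest.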

@kayabaNerve (Member, Author)

Please note a decision not to merklize will only remove it from the protocol, not from the implementation, initially. I'm not accepting the scope of the latter at this time.
