
Conversation

@bhartnett (Contributor) commented Oct 14, 2025

This PR updates the KVT txFrame API to support multiple column families, and introduces and uses separate column families for contract code and witnesses.

For performance reasons, the KVTs are stored in an array indexed by the column family enum type. The KvtCfs enum type is renamed to KvtType and moved so that it can be exposed as part of the CoreDb API in the base module. When no KvtType is specified, the default KvtType.Generic is used.
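As a minimal sketch of the shape of this API (the KvtRef/TxFrameRef types, the enum members other than Generic, and the in-memory table standing in for the RocksDB backend are all illustrative assumptions, not the actual CoreDb code):

  import std/tables

  type
    KvtType = enum
      Generic   # the default when no KvtType is specified
      CtxCode   # contract code (member name assumed for illustration)
      Witness   # witnesses (member name assumed for illustration)

    KvtRef = ref object
      data: Table[seq[byte], seq[byte]]   # stand-in for a RocksDB column family

    TxFrameRef = ref object
      kvts: array[KvtType, KvtRef]        # one KVT per column family, O(1) access

  proc newTxFrame(): TxFrameRef =
    result = TxFrameRef()
    for t in KvtType.low .. KvtType.high:
      result.kvts[t] = KvtRef()

  proc put(frame: TxFrameRef, key, val: seq[byte], kvtType = KvtType.Generic) =
    frame.kvts[kvtType].data[key] = val

  proc get(frame: TxFrameRef, key: seq[byte], kvtType = KvtType.Generic): seq[byte] =
    frame.kvts[kvtType].data.getOrDefault(key)

  let frame = newTxFrame()
  frame.put(@[0xAB'u8], @[1'u8, 2, 3], KvtType.CtxCode)
  echo frame.get(@[0xAB'u8], KvtType.CtxCode)   # -> @[1, 2, 3]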

When using column families, the data is partitioned into separate key spaces, so the DBKeyKind key mapping functions in the storage_types module are no longer required for contract code and witness reads and writes.

This changes the structure of the database on disk and is therefore not backwards compatible: once this change is merged, all nodes will require a full re-sync.

@bhartnett (Contributor, Author) commented:

I ran a block import benchmark on the first 10 million mainnet blocks. Here are the results:

master.csv vs cfs.csv
                        bps_x     bps_y      tps_x      tps_y    time_x    time_y    bpsd    tpsd   timed
block_number                                                                                             
(499713, 1555300]    6,182.15  6,284.72  21,221.80  21,462.51     3m12s     3m10s   1.88%   1.88%  -0.95%
(1555300, 2610888]   3,044.54  3,039.45  22,303.01  22,282.89    21m44s    21m47s   0.27%   0.27%   0.21%
(2610888, 3666475]   2,829.84  2,833.75  25,983.09  25,963.67    13m49s    13m55s   0.13%   0.13%   0.24%
(3666475, 4722063]     423.39    419.01  22,901.83  22,699.26     58m0s    58m25s  -0.85%  -0.85%   0.87%
(4722063, 5777650]     129.25    127.46  17,984.94  17,746.14  2h18m12s   2h20m7s  -1.37%  -1.37%   1.40%
(5777650, 6833238]     125.00    124.14  12,418.48  12,334.55   2h22m5s   2h23m1s  -0.67%  -0.67%   0.67%
(6833238, 7888825]     120.04    118.71  12,230.86  12,093.94   2h27m1s  2h28m40s  -1.11%  -1.11%   1.12%
(7888825, 8944413]     110.33    109.30  12,485.44  12,366.73  2h40m55s  2h42m25s  -0.93%  -0.93%   0.94%
(8944413, 10000001]     92.63     92.08   9,825.64   9,772.01  3h14m47s  3h15m48s  -0.54%  -0.54%   0.58%

blocks: 9492096, baseline: 14h39m48s, contender: 14h47m22s
Time (total): 7m33s, 0.86%

bpsd = blocks per sec diff (+), tpsd = txs per sec diff, timed = time to process diff (-)
+ = more is better, - = less is better; _x = baseline (master), _y = contender (cfs)

Unfortunately, there doesn't appear to be any performance improvement from this change. Perhaps the overhead of the additional KVTs outweighs any potential speed-up from using the additional column families.

@arnetheduck (Member) commented Oct 18, 2025

Additional column families are expensive to manage and cause the WAL to expand - in general, "common prefixes" in keys are almost free (the prefix is stored separately), so the prefix strategy is to be preferred unless different column family options are needed, or in a few other special cases (like txframe lifetime).
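A sketch of that prefix strategy under a single column family, loosely modelled on the DBKeyKind approach in storage_types (the enum members and the subkey signature here are illustrative, not the real module's API):

  type
    DBKeyKind = enum
      genericData    # members are illustrative, not the real enum
      contractCode
      witnessData

  proc subkey(kind: DBKeyKind, data: openArray[byte]): seq[byte] =
    # the leading byte partitions the key space within a single CF; RocksDB
    # stores shared key prefixes compactly, so this costs almost nothing
    @[byte(kind)] & @data

  echo subkey(contractCode, [0xAB'u8, 0xCD])   # -> @[1, 171, 205]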

@arnetheduck (Member) commented:

Contract code is interesting in that it has slightly different lifetime properties than other transaction data: it is shared between accounts, which makes its lifetime management different from that of, for example, block data.

@bhartnett (Contributor, Author) commented Oct 20, 2025

> Additional column families are expensive to manage and cause the WAL to expand - in general, "common prefixes" in keys are almost free (the prefix is stored separately), so the prefix strategy is to be preferred unless different column family options are needed, or in a few other special cases (like txframe lifetime).

Yes, true - it wouldn't be a good idea to create a column family for every data type, due to the additional cost of maintaining them (WAL and memory usage), but I thought it might make sense to look into creating additional CFs for data whose CF options should be configured differently to maximize performance.

The initial driver for this PR/investigation was that I noticed we use a very small number of column families: one for all the account, storage, and hash data, another for all the KVT data, and one for the syncing headers. Hyperledger Besu, on the other hand, uses quite a large number. These are the relevant CFs from the KeyValueSegmentIdentifier enum (see here):

  BLOCKCHAIN(new byte[] {1}, EnumSet.allOf(DataStorageFormat.class), true, true, false),
  ACCOUNT_INFO_STATE(new byte[] {6}, EnumSet.of(BONSAI, X_BONSAI_ARCHIVE), false, true, false),
  CODE_STORAGE(new byte[] {7}, EnumSet.of(BONSAI, X_BONSAI_ARCHIVE)),
  ACCOUNT_STORAGE_STORAGE(new byte[] {8}, EnumSet.of(BONSAI, X_BONSAI_ARCHIVE), false, true, false),
  TRIE_BRANCH_STORAGE(new byte[] {9}, EnumSet.of(BONSAI, X_BONSAI_ARCHIVE), false, true, false),
  TRIE_LOG_STORAGE(new byte[] {10}, EnumSet.of(BONSAI, X_BONSAI_ARCHIVE), true, false, true),
  BACKWARD_SYNC_HEADERS(new byte[] {13}),
  BACKWARD_SYNC_BLOCKS(new byte[] {14}),
  BACKWARD_SYNC_CHAIN(new byte[] {15}),

3 for syncing, 1 for blockchain data (headers, bodies, receipts, etc.), and 5 for the Bonsai format, which contains the account state, storage, code, and state diffs. The boolean parameters configure the CF settings that may differ between the column families. Note that it is common here (in Besu) for multiple column families to have the same settings, so I wonder what the reasoning is for splitting the data across CFs.

@arnetheduck (Member) commented Oct 20, 2025

In our case, the main reason to use a separate CF for code would actually be different: to differentiate lifetime management and get rid of the hierarchy / dag of txframes on the block side.

Since ForkedChain keeps track of all head blocks, the block contents could be written to the block database outside of the aristo TxFrame, maintaining only a single block snapshot - basically, whenever a block is validated, write it to disk, and if the branch gets pruned, prune it from disk. This would save a lot of memory, especially when combined with a proper version of #3628.
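A rough sketch of that non-layered approach, with hypothetical names (BlockDb, persistBlock, pruneBranch) and an in-memory table standing in for the on-disk block store:

  import std/tables

  type
    Hash32 = array[32, byte]
    BlockDb = ref object
      blocks: Table[Hash32, seq[byte]]   # stands in for the block database

  proc persistBlock(db: BlockDb, hash: Hash32, rlpBlock: seq[byte]) =
    # a validated block is written immediately, outside any aristo TxFrame
    db.blocks[hash] = rlpBlock

  proc pruneBranch(db: BlockDb, orphaned: openArray[Hash32]) =
    # when ForkedChain drops a branch, its blocks are deleted directly;
    # contract code is deliberately not handled here since it may be shared
    for h in orphaned:
      db.blocks.del(h)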

The thing that makes this difficult right now is the contract code - code is shared between accounts, but in the case of reorgs we should not write code from orphaned branches - for this to be "simple", code could still follow the dag model while blocks could follow a simpler, non-layered approach.

@bhartnett (Contributor, Author) commented:

> The thing that makes this difficult right now is the contract code - code is shared between accounts, but in the case of reorgs we should not write code from orphaned branches - for this to be "simple", code could still follow the dag model while blocks could follow a simpler, non-layered approach.

Yes, contract code is shared, so you wouldn't be able to clean up the code written by an orphaned branch when pruning - it might still be used by another branch, or by an account in the state.
