
Conversation

@bhartnett (Contributor) commented Oct 14, 2025

This PR updates the KVT txFrame API to support multiple column families, and introduces and uses separate column families for contract code and witnesses.

For performance reasons, the KVTs are stored in an array indexed by the column family enum type. The KvtCfs enum type is renamed to KvtType and moved so that it can be exposed as part of the CoreDb API in the base module. When no KvtType is specified, the default KvtType.Generic is used.
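As a minimal sketch of the shape of this API (the KvtRef/TxFrameRef types, the enum members other than Generic, and the in-memory table standing in for the RocksDB backend are all illustrative assumptions, not the actual CoreDb code):

  import std/tables

  type
    KvtType = enum
      Generic   # the default when no KvtType is specified
      CtxCode   # contract code (member name assumed for illustration)
      Witness   # witnesses (member name assumed for illustration)

    KvtRef = ref object
      data: Table[seq[byte], seq[byte]]   # stand-in for a RocksDB column family

    TxFrameRef = ref object
      kvts: array[KvtType, KvtRef]        # one KVT per column family, O(1) access

  proc newTxFrame(): TxFrameRef =
    result = TxFrameRef()
    for t in KvtType.low .. KvtType.high:
      result.kvts[t] = KvtRef()

  proc put(frame: TxFrameRef, key, val: seq[byte], kvtType = KvtType.Generic) =
    frame.kvts[kvtType].data[key] = val

  proc get(frame: TxFrameRef, key: seq[byte], kvtType = KvtType.Generic): seq[byte] =
    frame.kvts[kvtType].data.getOrDefault(key)

  let frame = newTxFrame()
  frame.put(@[0xAB'u8], @[1'u8, 2, 3], KvtType.CtxCode)
  echo frame.get(@[0xAB'u8], KvtType.CtxCode)   # -> @[1, 2, 3]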

When using column families, the data is partitioned into separate key spaces, so the DBKeyKind key mapping functions in the storage_types module are no longer required for contract code and witness reads and writes.

This changes the structure of the database on disk and is therefore not backwards compatible: once this change is merged, all nodes will require a full re-sync.

@bhartnett (Contributor, Author) commented:

I ran a block import benchmark on the first 10 million mainnet blocks. Here are the results:

master.csv vs cfs.csv
                        bps_x     bps_y      tps_x      tps_y    time_x    time_y    bpsd    tpsd   timed
block_number                                                                                             
(499713, 1555300]    6,182.15  6,284.72  21,221.80  21,462.51     3m12s     3m10s   1.88%   1.88%  -0.95%
(1555300, 2610888]   3,044.54  3,039.45  22,303.01  22,282.89    21m44s    21m47s   0.27%   0.27%   0.21%
(2610888, 3666475]   2,829.84  2,833.75  25,983.09  25,963.67    13m49s    13m55s   0.13%   0.13%   0.24%
(3666475, 4722063]     423.39    419.01  22,901.83  22,699.26     58m0s    58m25s  -0.85%  -0.85%   0.87%
(4722063, 5777650]     129.25    127.46  17,984.94  17,746.14  2h18m12s   2h20m7s  -1.37%  -1.37%   1.40%
(5777650, 6833238]     125.00    124.14  12,418.48  12,334.55   2h22m5s   2h23m1s  -0.67%  -0.67%   0.67%
(6833238, 7888825]     120.04    118.71  12,230.86  12,093.94   2h27m1s  2h28m40s  -1.11%  -1.11%   1.12%
(7888825, 8944413]     110.33    109.30  12,485.44  12,366.73  2h40m55s  2h42m25s  -0.93%  -0.93%   0.94%
(8944413, 10000001]     92.63     92.08   9,825.64   9,772.01  3h14m47s  3h15m48s  -0.54%  -0.54%   0.58%

blocks: 9492096, baseline: 14h39m48s, contender: 14h47m22s
Time (total): 7m33s, 0.86%

bpsd = blocks per sec diff (+), tpsd = txs per sec diff, timed = time to process diff (-)
+ = more is better, - = less is better; _x = baseline (master), _y = contender (cfs)

Unfortunately, there doesn't appear to be any performance improvement from this change. Perhaps the overhead of the additional KVTs outweighs any potential speed-up from using the additional column families.

@arnetheduck (Member) commented Oct 18, 2025

Additional column families are expensive to manage and cause the WAL to expand - in general, "common prefixes" in keys are almost free (the prefix is stored separately), so the prefix strategy is to be preferred unless different column family options are needed, or in a few other special cases (like txframe lifetime).
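A sketch of that prefix strategy under a single column family, loosely modelled on the DBKeyKind approach in storage_types (the enum members and the subkey signature here are illustrative, not the real module's API):

  type
    DBKeyKind = enum
      genericData    # members are illustrative, not the real enum
      contractCode
      witnessData

  proc subkey(kind: DBKeyKind, data: openArray[byte]): seq[byte] =
    # the leading byte partitions the key space within a single CF; RocksDB
    # stores shared key prefixes compactly, so this costs almost nothing
    @[byte(kind)] & @data

  echo subkey(contractCode, [0xAB'u8, 0xCD])   # -> @[1, 171, 205]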

@arnetheduck (Member) commented:

Contract code is interesting in that it has slightly different lifetime properties than other transaction data: it is shared between accounts, which makes its lifetime management different from that of, for example, block data.

@bhartnett (Contributor, Author) commented Oct 20, 2025

> Additional column families are expensive to manage and cause the WAL to expand - in general, "common prefixes" in keys are almost free (the prefix is stored separately), so the prefix strategy is to be preferred unless different column family options are needed, or in a few other special cases (like txframe lifetime).

Yes, true - it wouldn't be a good idea to create a column family for every data type, due to the additional cost of maintaining them (WAL and memory usage), but I thought it might make sense to look into creating additional CFs for data whose CF options should be configured differently to maximize performance.

The initial driver for this PR/investigation was that I noticed we use a very small number of column families: one for all the account, storage, and hash data, another for all the KVT data, and one for the syncing headers. Hyperledger Besu, on the other hand, uses quite a large number. These are the relevant CFs from the KeyValueSegmentIdentifier enum (see here):

  BLOCKCHAIN(new byte[] {1}, EnumSet.allOf(DataStorageFormat.class), true, true, false),
  ACCOUNT_INFO_STATE(new byte[] {6}, EnumSet.of(BONSAI, X_BONSAI_ARCHIVE), false, true, false),
  CODE_STORAGE(new byte[] {7}, EnumSet.of(BONSAI, X_BONSAI_ARCHIVE)),
  ACCOUNT_STORAGE_STORAGE(new byte[] {8}, EnumSet.of(BONSAI, X_BONSAI_ARCHIVE), false, true, false),
  TRIE_BRANCH_STORAGE(new byte[] {9}, EnumSet.of(BONSAI, X_BONSAI_ARCHIVE), false, true, false),
  TRIE_LOG_STORAGE(new byte[] {10}, EnumSet.of(BONSAI, X_BONSAI_ARCHIVE), true, false, true),
  BACKWARD_SYNC_HEADERS(new byte[] {13}),
  BACKWARD_SYNC_BLOCKS(new byte[] {14}),
  BACKWARD_SYNC_CHAIN(new byte[] {15}),

3 for syncing, 1 for blockchain data (headers, bodies, receipts, etc.), and 5 for the Bonsai format, which contains the account state, storage, code, and state diffs. The boolean parameters configure the CF settings that may differ between the column families. Note that it is common here (in Besu) for multiple column families to have the same settings, so I wonder what the reasoning is for splitting the data across CFs.

@arnetheduck (Member) commented Oct 20, 2025

In our case, the main reason to use a separate CF for code would actually be different: to differentiate lifetime management and get rid of the hierarchy / dag of txframes on the block side.

Since ForkedChain keeps track of all head blocks, the block contents could be written to the block database outside of the aristo TxFrame, maintaining only a single block snapshot - basically, whenever a block is validated, write it to disk, and if the branch gets pruned, prune it from disk. This would save a lot of memory, especially when combined with a proper version of #3628.
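A rough sketch of that non-layered approach, with hypothetical names (BlockDb, persistBlock, pruneBranch) and an in-memory table standing in for the on-disk block store:

  import std/tables

  type
    Hash32 = array[32, byte]
    BlockDb = ref object
      blocks: Table[Hash32, seq[byte]]   # stands in for the block database

  proc persistBlock(db: BlockDb, hash: Hash32, rlpBlock: seq[byte]) =
    # a validated block is written immediately, outside any aristo TxFrame
    db.blocks[hash] = rlpBlock

  proc pruneBranch(db: BlockDb, orphaned: openArray[Hash32]) =
    # when ForkedChain drops a branch, its blocks are deleted directly;
    # contract code is deliberately not handled here since it may be shared
    for h in orphaned:
      db.blocks.del(h)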

The thing that makes this difficult right now is the contract code - code is shared between accounts, but in the case of reorgs we should not write code from orphaned branches - for this to be "simple", code could still follow the dag model while blocks could follow a simpler, non-layered approach.

@bhartnett (Contributor, Author) commented:

> The thing that makes this difficult right now is the contract code - code is shared between accounts, but in the case of reorgs we should not write code from orphaned branches - for this to be "simple", code could still follow the dag model while blocks could follow a simpler, non-layered approach.

Yes, contract code is shared, so you wouldn't be able to clean up the code written by an orphaned branch when pruning - it might still be used by another branch, or by an account in the state.
