Support column families in kvt txFrame API and use for contract code #3765
base: master
…n family for reading and storing code.
I ran a block import benchmark on the first 10 million mainnet blocks. Here are the results:
Unfortunately, this change doesn't appear to improve performance. Perhaps the overhead of the additional kvts outweighs any potential speedup from using the additional column families.
Additional column families are expensive to manage and cause the WAL to expand. In general, common prefixes in keys are almost free (the prefix is stored separately), so the prefix strategy is to be preferred unless different column family options are needed, plus a few other special cases (like txframe lifetime).
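To illustrate the prefix strategy mentioned above, here is a minimal sketch in Python (not the project's Nim code). The `KeyKind` tags, their values, and the helper names are hypothetical, and a plain dict stands in for a single column family:

```python
from enum import IntEnum


class KeyKind(IntEnum):
    # Hypothetical one-byte tags; the values are illustrative only.
    GENERIC = 0
    CONTRACT_CODE = 1
    WITNESS = 2


def prefixed_key(kind: KeyKind, key: bytes) -> bytes:
    """Prepend a one-byte kind tag so all data types share one keyspace."""
    return bytes([kind]) + key


# A single dict stands in for a single column family.
store: dict[bytes, bytes] = {}

code_hash = bytes.fromhex("ab" * 32)
store[prefixed_key(KeyKind.CONTRACT_CODE, code_hash)] = b"\x60\x00"
store[prefixed_key(KeyKind.WITNESS, code_hash)] = b"witness-bytes"


def scan(kind: KeyKind) -> dict[bytes, bytes]:
    """Return every entry of one kind, with the tag stripped from the key."""
    tag = bytes([kind])
    return {k[1:]: v for k, v in store.items() if k.startswith(tag)}
```

The same 32-byte hash is stored twice without colliding because the tag byte separates the two kinds, which is the partitioning a separate column family would otherwise provide.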
Contract code is interesting in that it has slightly different lifetime properties than other transaction data: it is shared between accounts, which makes its lifetime management slightly different from that of block data, for example.
Yes, true, it wouldn't be a good idea to create a column family for every data type because of the additional cost of maintaining them (WAL and memory usage), but I thought it might make sense for us to look into creating additional cfs for some of the data where the cf options should be configured differently to maximize performance. The initial driver for this PR/investigation was that I noticed we use a very small number of column families: only 1 for all the account, storage and hashes data, another 1 for all the kvt data, and 1 for the syncing headers data. Hyperledger Besu, on the other hand, uses quite a large number. These are the relevant cfs from the KeyValueSegmentIdentifier enum, see here:
3 for syncing, 1 for blockchain data (headers, bodies, receipts, etc.), and 5 for the Bonsai format, which contains the account state, storage, code, and state diffs. The boolean parameters here configure the cf settings that may differ between the column families. Note that it is common in Besu for multiple column families to have the same settings, so I wonder what the reasoning is for splitting the data across cfs.
In our case, the main reason to use a separate CF for code would actually be different: to differentiate lifetime management and get rid of the hierarchy / dag of txframes on the block side. The thing that makes this difficult right now is the contract code: code is shared between accounts, but in case of reorgs we should not write code from orphaned branches. For this to be "simple", code could still follow the dag model while blocks could follow a simpler, non-layered approach.
Yes, contract code is shared, so you wouldn't be able to clean up the code written by an orphaned branch when pruning, because it might also be used/needed by another branch or account in the state.
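The sharing problem described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: all names are hypothetical, SHA-256 stands in for keccak256, and plain dicts stand in for the code store and per-branch account state.

```python
import hashlib

# Shared store: code hash -> bytecode, visible to every branch.
code_store: dict[bytes, bytes] = {}


def deploy(branch_accounts: dict[str, bytes], address: str, bytecode: bytes) -> bytes:
    """Record bytecode under its hash and point the account at that hash."""
    code_hash = hashlib.sha256(bytecode).digest()  # stand-in for keccak256
    code_store[code_hash] = bytecode
    branch_accounts[address] = code_hash
    return code_hash


def naive_prune(branch_accounts: dict[str, bytes]) -> None:
    """Delete every code entry the pruned branch references (unsafe)."""
    for code_hash in branch_accounts.values():
        code_store.pop(code_hash, None)


def safe_to_delete(code_hash: bytes, live_branches: list[dict[str, bytes]]) -> bool:
    """Code may only go away if no surviving branch still references it."""
    return all(code_hash not in accounts.values() for accounts in live_branches)


canonical: dict[str, bytes] = {}
orphan: dict[str, bytes] = {}
runtime = bytes.fromhex("6000")

# Both branches deploy identical bytecode, so they share one store entry.
h_canonical = deploy(canonical, "0xA", runtime)
h_orphan = deploy(orphan, "0xB", runtime)
```

Calling `naive_prune(orphan)` here would remove the entry that `canonical` still depends on, whereas `safe_to_delete(h_orphan, [canonical])` reports that the entry must stay.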
This PR updates the KVT txFrame API to support multiple column families and also introduces and uses separate column families for contract code and witnesses.
For performance reasons the kvts are stored in an array indexed by the column family enum type. The `KvtCfs` enum type is renamed to `KvtType` and moved so that it can be exposed as part of the CoreDb API in the base module. When a `KvtType` is not specified, the default `KvtType.Generic` is used.

When using column families the data is partitioned into a separate space, so the `DBKeyKind` key mapping functions in the `storage_types` module are no longer required for the contract code and witness reads and writes.

This changes the structure of the database on disk and is therefore not backwards compatible, meaning that once this change is merged all nodes will require a full re-sync.