BIP: UtxoCommitBucket Title: Bucketed Utxo Commitment Author: Tomas van der Wansem <tomas@bitcrust.org> Created: 2017-10-02 License: PD
This BIP describes the construction of a 32-byte hash that is uniquely defined for a UTXO set, that is, the set of all unspent transaction outputs that exist at a certain block in the blockchain. This BIP does not describe inclusion of such hash in the coinbase, or changes to block verification.
When added to blocks, a UTXO commitment can be used to:
- Fast initial loading of a full node by downloading and verifying the UTXO set instead of the full blockchain.
- Downloading and verifying a part of the UTXO set that matches certain outputs.
- Proving existence of a UTXO at a certain block.
- Proving non-existence of a UTXO at a certain block.
Either we construct a storage format for the UTXO set which allows extraction of a deterministic, unique hash. The challege is to define the format such, that it does not have significantly worse perfomance charactistics than a generalized key/value store.
Or we can define a UTXO commitment which must be maintained alongside the UTXO database. The challenge is to ensure that maintaining the commitment does not impose to much additional burden.
This proposal takes the latter approach. We use Elliptic Curve Multisets to maintain sets of UTXO's and hash these into a prefix tree, with a fixed branch size of 16.
We specify the construction of a tree that contains a set of elements, each of which is a sequence of at least 32 bytes.
We define:
- A node is either a branch node or a leaf node. Each node has a unique bit sequence called the prefix
- The root node is a branch node with an empty prefix.
- For each branch node, we define 16 child nodes numbered 0 to 15, each of which has the same prefix as the parent, extended by 4 bits encoding its number. A leaf node has no child nodes.
- A node contains an element if the element starts with the prefix.
- The hash of a leaf node is the 32-byte ECMH multiset hash of each of the elements it contains.
- The hash of a branch node is the 32-byte double-SHA256 of the concatenation of the 16 hashes of its child nodes.
- Each branch node is either the root node, or contains at least 2001 elements.
- Each leaf node contains at most 2000 elements.
We define a UTXO element as the byte sequence contructed from an unspent output using the serialization:
Field Size | Description | Data type | Comments |
---|---|---|---|
32 | txid | char[32] | The id of the transaction |
4 | index | uint32_t | The index of the output within the transaction |
4 | height, is_coinbase | uint32_t | The least significant bit is set if this is a coinbase output. The other 31 bits represent the height |
8 | value | int64_t | The amount in satoshi |
? | pk_script | uchar[] | The Bitcoin script that defines the conditions to claim this output. |
Note that the serialization is not compressed, but compression can be used in transferring UTXO sets and proofs.
The UTXO commitment for a certain block is the commitment of all UTXO elements that are unspent in the blockchain after that block.
The tree as specified requires the elements to approximate a random distribution. The UTXO elements are unfortunately not, as a single transaction with lots of outputs all share the same first 32 bytes. There are several solutions from which we must choose:
- We can use the serializion H(txid+index)+value+pk_script. This would also require the use of H(txid+index) for the coins database which could harm performance.
- We can replace the threshold for splitting a branch node to state that a leaf node must contain 2000 elements or its parent must contain at most 16000 elements
- We can add UTXO elements as specified but *count* only distinct transaction ids.
Proposed implementation of the Elliptic Curve Multiset: bitcoin-core/secp256k1#477
Proposed implementation of the UTXO commitment: https://github.com/tomasvdw/bitcoin-abc/tree/utxocommit/src/utxocommit
Implementation results (T400, Core i7 2 core 2.4ghz):
- Initial load 50 million elements: 310 secs.
- Memory 50 million elements: 2.2mb
- Proof size: ~100kb
- Add block (10,000 utxo-ops): ~80ms
- GetHash : ~180ms
This document is placed in the public domain.