diff --git a/state-network.md b/state-network.md index 8bd7eb6..47af12d 100644 --- a/state-network.md +++ b/state-network.md @@ -9,7 +9,7 @@ This document is the specification for the sub-protocol that supports on-demand The execution state network is a [Kademlia](https://pdos.csail.mit.edu/~petar/papers/maymounkov-kademlia-lncs.pdf) DHT that uses the [Portal Wire Protocol](./portal-wire-protocol.md) to establish an overlay network on top of the [Discovery v5](https://github.com/ethereum/devp2p/blob/master/discv5/discv5-wire.md) protocol. -State data from the execution chain consists of all account data from the main storage trie, all contract storage data from all of the individual contract storage tries, and the individul bytecodes for all contracts. +State data from the execution chain consists of all account data from the main storage trie, all contract storage data from all of the individual contract storage tries, and the individul bytecodes for all contracts across all historical state roots. This is traditionally referred to as an "archive node". ### Data @@ -40,33 +40,7 @@ The state network uses the stock XOR distance metric defined in the portal wire ### Content ID Derivation Function -The derivation function for Content Id values are defined separately for each data type. - -#### Path Aware Content ID Derivation - -The state network is primarily concerned with storing state data which is -cryptographically anchored into trie structure. A naive approach to -distribution of this data in the network would be to address trie nodes by -their node-hash. This approach is effective, however, it negatively impacts -the efficiency of lookups. The production patricia merkle trie for ethereum -accounts tends to be at least 7 layers deep, which in a production portal -network environment would mean nodes would have to perform seven (7) recursive -lookups in the DHT to find a single piece of state. - -For this reason, the state network implements a custom scheme that uses the -trie path for the high bits in the content-id and fills the remaining low bits -from the node hash. The result of this is as follows. - -- Content that is very *shallow* in the trie such as the state root node is - relatively randomly distributed around the DHT regardless of the path - component. -- Content that is very deep in the trie such as leaf nodes, or deep - intermediate nodes are clustered in the DHT address space with the most bits - in common with the path to those nodes in the trie. - -This leads to a beneficial situation where leaves that are *close* to each -other in the trie will also be likely to be close to each other in the DHT, -which allows for more efficient network retrieval. +The history network uses the SHA256 Content ID derivation function from the portal wire protocol specification. ### Wire Protocol @@ -137,161 +111,50 @@ MPTWitness := List(witness: WitnessNode, max_length=32) #### Paths (Nibbles) -We define nibbles as a sequence of single hex values +A naive approach to storage of trie nodes would be to simply use the `node_hash` value of the trie node for storage. This scheme however results in stored data not being tied in any direct way to it's location in the trie. In a situation where a participant in the DHT wished to re-gossip data that they have stored, they would need to reconstruct a valid trie proof for that data in order to construct the appropriate OFFER/ACCEPT payload. We include the `path` metadata in state network content keys so that it is possible to reconstruct this proof. + +We define path as a sequences of "nibbles" which represent the path through the MPT to reach the trie node. + ``` -nibble := {0,1,2,3,4,5,6,7,8,9,a,b,c,d,e,f} // 16 possible values +nibble := {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f} // 16 possible values NibblePair := Byte // 2 nibbles tightly packed into a single byte -Nibbles := Vector(NibblePair, length=8) // fixed path length of 8 bytes +Nibbles := List(NibblePair, max_length=32) // fixed path length of 8 bytes ``` + #### Helper Functions We define these helper functions. -##### `combine_path_and_node_hash(path: Nibbles, node_hash: Bytes32) -> Bytes32` +##### `encode_path(path: Nibbles) -> ???` -This function is designed to combine the "nibbles" which define the path of a -node in the hexary merkle trie with the `node_hash` of that node in the trie -into a 32 byte content-id value. This function is designed such that the trie -path occupies the *highest* bits of the content-id and the remaining bits are -sourced from the `node_hash`. The result of this is that trie nodes which are -close to eacch other in the trie will be statistically likely to also be close -to each other when compared using the XOR distance metric in the DHT. +> TODO: what is the most efficient way to encode nibbles and place them in an SSZ container... -One thing to notice about this scheme is that for intermediate trie nodes, the -`path` component will often be short, or even empty in the case of the state -root. However, for leaf nodes, the path component passed into this function -MUST be the full 32 byte path in the trie where the account leaf lives. This -ensures that the content key and id for a leaf node of the trie doesn't change -as the trie around it changes. +- for state roots, the path is the empty list. +- for intermediate nodes we expect between 1-9 nibbles values, aka roughly 3-6 bytes when tightly packed and using terminator byte +- for leaf nodes we can use the 20-byte address pre-image to compress the actual 32-byte path in the trie. + +In an SSZ context we end up with 4-byte length elements for anything variable length which suggests we should potentially do something like: ``` -# The maximum number of bytes in the resulting content-id that may be sourced -# from the trie path -MAX_PATH_BYTES = 8 - -# The allowed values for a 'nibble' -NIBBLES_VALUES = { - 0x00, 0x01, 0x02, 0x03, - 0x04, 0x05, 0x06, 0x07, - 0x08, 0x09, 0x0a, 0x0b, - 0x0c, 0x0d, 0x0e, 0x0f, -} - -def tightly_pack_nibbles(path: Nibbles) -> bytes: - """ - Take an even length bytestring of loosely packed nibbles values and return - them tightly packed as a byte string - - >>> pack_nibbles(b'\x0a\x01\x03\x0f') - b'\xa1\x3f' - """ - assert len(path) % 2 == 0 - ipath = iter(path) - packed_values = [] - for _ in range(len(path) // 2): - high, low = next(ipath), next(ipath) - packed = high << 4 | low - packed_values.append(packed) - - return bytes(packed_values) - -def construct_trie_node_content_id(path: Nibbles, node_hash: Bytes32) -> Bytes32: - # `node_hash` must be a 32 byte value - assert len(node_hash) == 32 - # `path` may not exceed 64 nibbles which is maximum possible trie depth - assert len(path) <= 64 - # `path` must be valid `nibbles` values - assert set(path).issubset(NIBBLES_VALUES) - - trimmed_path = path[:2 * MAX_PATH_BYTES] - - if len(trimmed_path) % 2 == 0: # path length is "even" - # - # path = (0xa, 0xb, 0xc, 0x1, 0x2, 0x3) - # node_hash = 0xdeadbeefdeadbeef.... - # content_id = 0xabc123efdeadbeef.... - # - # +============+======+======+======+======+======+=====+ - # | byte # | #0 | #1 | #2 | #3 | #4 | ... | - # +============+======+======+======+======+======+=====+ - # | path | 0xab | 0xc1 | 0x23 | | | | - # +------------+------+------+------+------+------+-----+ - # | content_id | ^^ | | ^^ | | | | - # | content_id | 0xab | 0xc1 | 0x23 | 0xef | 0xde | ... | - # | content_id | | | | vv | vv | | - # +------------+------+------+------+------+------+-----+ - # | node_hash | 0xde | 0xad | 0xbe | 0xef | 0xde | ... | - # +------------+------+------+------+------+------+-----+ - # - packed_path = tightly_pack_nibbles(trimmed_path) - node_hash_low_part = node_hash[len(packed_path):] - return packed_path + node_hash_low_part - else: # path length is "odd" - # - # path = (0xa, 0xb, 0xc, 0x1, 0x2) - # node_hash = 0xdeadbeefdeadbeef.... - # content_id = 0xabc12eefdeadbeef.... - # - # +============+======+======+======+======+======+=====+ - # | byte # | #0 | #1 | #2 | #3 | #4 | ... | - # +============+======+======+======+======+======+=====+ - # | path | 0xab | 0xc1 | 0x2 | | | | - # +------------+------+------+------+------+------+-----+ - # | content_id | ^^ | | ^ | | | | - # | content_id | 0xab | 0xc1 | 0x2e | 0xef | 0xde | ... | - # | content_id | | | v | vv | vv | | - # +------------+------+------+------+------+------+-----+ - # | node_hash | 0xde | 0xad | 0xbe | 0xef | 0xde | ... | - # +------------+------+------+------+------+------+-----+ - # - packed_path_high_part = tightly_pack_nibbles(trimmed_path[:-1]) - - # Since the nibbles value in this case is odd length, then we must - # combine the last nibble with the lower 4 bits of the byte in that - # position from the node_hash value. - middle_high_part = trimmed_path[-1] << 4 - middle_low_part = node_hash[len(packed_path_high_part)] & 0x0f - middle_byte = middle_high_part | middle_low_part - - node_hash_low_part = node_hash[len(packed_path_high_part) + 1:] - - return packed_path_high_part + bytes((middle_byte,)) + node_hash_low_part +AddressPath := Bytes20 +max_possible_nibbles_depth := ??? # What is the maximum depth we should account for.... +IntermediatePath := Vector(Byte, length=max_possible_nibbles_depth) +EncodedPath := Union(AddressPath, IntermediatePath) ``` -#### `rotate(address: Bytes20, location: Bytes32) -> Bytes32` - -This function is used to rotate the content-id values of contract storage trie -nodes in the DHT space, in order to ensure that contract storage tries are -evenly distributed accross the DHT key space. - - -```python -U256_MAX = 2**256 - 1 - -def rotate(address: Bytes20, location: Bytes32) -> Bytes32: - """ - Location is a 32-byte value representation a location in the DHT key space. - """ - rotation_amount = keccak256(address) - rotated_location = rotation_amount + location - - # The modulo here ensures that the resulting location is constrained - # appropriately to the 32-byte DHT key space. - return rotated_location % U256_MAX -``` #### Account Trie Node ``` -account_trie_node_key := Container(path: Nibbles, node_hash: Bytes32) +account_trie_node_key := Container(encoded_path: EncodedPath, node_hash: Bytes32) selector := 0x20 content_for_offer := Container(proof: MPTWitness, block_hash: Bytes32) content_for_retrieval := Container(node: WitnessNode) -content_id := construct_trie_node_content_id(path: Nibbles, node_hash: Bytes32) content_key := selector + SSZ.serialize(account_trie_node_key) +content_id := sha256(content_key) ``` #### Contract Trie Node * @@ -303,8 +166,8 @@ selector := 0x21 content_for_offer := Container(account_proof: MPTWitness, storage_proof: MPTWitness, block_hash: Bytes32) content_for_retrieval := Container(node: WitnessNode) -content_id := rotate(storage_trie_node_key.address: Address, construct_trie_node_content_id(storage_trie_node_key.path, storage_trie_node_key.node_hash)) content_key := selector + SSZ.serialize(storage_trie_node_key) +content_id := sha256(content_key) ``` @@ -319,8 +182,8 @@ selector := 0x22 content_for_offer := Container(code: ByteList, account_proof: MPTWitness, block_hash: Bytes32) content_for_retrieval := Container(code: ByteList) -content_id := sha256(contract_code_key.address + contract_code_key.code_hash) content_key := selector + SSZ.serialize(contract_code_key) +content_id := sha256(content_key) ```