Skip to content

Conversation

@ernestognw
Copy link
Member

@ernestognw ernestognw commented Jul 31, 2025

Follow up from #5680

PR Checklist

  • Tests
  • Documentation
  • Changeset entry (run npx changeset add)

Co-authored-by: Ernesto García <ernestognw@gmail.com>
@ernestognw
Copy link
Member Author

I went ahead with this comment and removed the getHash function in favor of a Memory.equal for slices.

Copy link
Member Author

@ernestognw ernestognw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, but I think we need to discuss one last thing.

There's this PoC that shows it's possible to construct 2 RLP items such that keccak256(rlp.encode(r1)) == bytes32(rlp.encode(r2)) being r1 a long item (length >= 33) and r2 a short item (length < 32). This means that a proof can be forged for r1 if r2 exists and viceversa. Although the attack is very unlikely since it apparently requires a branch node close to the leaf's key, it may justify that we expose a variant of verify called secureVerify that rehashes the key. This is what the SecureMerkleTrie library from optimism does

function secureVerify(
    bytes memory value,
    bytes32 root,
    bytes memory key,
    bytes[] memory proof
) internal pure returns (bool) {
    return verify(value, root, abi.encodePacked(keccak256(key)), proof);
}

As far as I understand, both the verify and secureVerify variants would be useful since the former would serve to verify against a transactionRoot or a receiptsRoot, but storageRoot and stateRoot use the secure variant (i.e. this is why we hash the storage address and the slot in the tests), so it seems worth keeping both.

@Amxx
Copy link
Collaborator

Amxx commented Dec 12, 2025

My understanding is that the choice of key is not ours to make. Its really the tree building process that decides what key is used, and the verifyer must follow the same mechanism otherwize they won't be able to generate a proof.

  • When verifying an account details (against the blocks's stateRoot), the key is the keccak256 hash of the account's address
  • When verifying a storage slot (against the account's storageHash), the key is the keccak256 hash of the slot
  • When verifying a transaction (against the blocks's transactionsRoot), the key is the index of the transaction in the block ?
  • When verifying an event (against the blocks's receiptsRoot), the key is the index of the event in the block ?

In all cases its the same function that is used, just with different input (that you prepare differently). I'm not sure I see the point in having wrappers that do that preparation. If the preparation is not done correctly, there should not be a proof, and the verification shoud always fail.

If having a secureVerify only does the preparation of the key, and nothing else, I don't think its any more (or less) secure than the "simple" verify. I'd rather not have it than create the impression that one is secure (and should be used) when the other is not.

@ernestognw
Copy link
Member Author

To be clear, I initially thought the SecureMerkleTrie from Optimism was deliberately rehashing the key in a similar fashion to how we prevent a second preimage attack in our merkle tree typescript library, so I considered relevant to have the "secure" variant.

However, I found some research suggesting that the keys are hashed for storage proofs and account proofs to avoid arbitrary disk I/O, so rehashing doesn't really concern to on-chain verification. Still, for transaction or events proof I understand RLP forgery is theoretically possible, just they keys are too low and sequential.

In all cases its the same function that is used, just with different input (that you prepare differently). I'm not sure I see the point in having wrappers that do that preparation.

Yeah I was trying to avoid users making a mistake rather than "preparing" the verification, but on a second thought I agree there's no point if there's no relevant security consideration.

If having a secureVerify only does the preparation of the key, and nothing else, I don't think its any more (or less) secure than the "simple" verify. I'd rather not have it than create the impression that one is secure (and should be used) when the other is not.

Having specific verify functions may be a good idea but I only see benefit in documenting how to obtain and process each type of proof. For that case I'd rather make a guide in the docs 🤔

@Amxx
Copy link
Collaborator

Amxx commented Dec 12, 2025

However, I found some research suggesting that the keys are hashed for storage proofs and account proofs to avoid arbitrary disk I/O, so rehashing doesn't really concern to on-chain verification.

For the transaction and receipt tries, the keys are just the index in the block (starting at 0). That makes for overall "shorters" (as in not too deep) trees. We know all values from 0 to N are populated, so there is no serious "unbalancing of the tree here"

For the storage (within account), the slots are controlled by the "user" (the compiler actually). Its easy to populate many consecutive slot at a pseudo-random location. That is what appens when you write a long string to storage. The hashing of the storage slot randomizes all that in the tree, making it more balanced.

For the state tree, I guess the logic is similar to the storage tree, though I don't really understand why. Addresses are hashes already, so I'm not sure what this is preventing. Many its for dealing with consecutive/close addresses that are the result of vanity generation, but an attacker could just as well use a vanity gen that makes the images of addresses "close" to try to unbalance the tree. Still doing the hash makes the state tree have a depth of 32 bytes instead of 20 bytes, which feels counterproductive. Maybe a coredev could explain us why its don't that way.

Having specific verify functions may be a good idea but I only see benefit in documenting how to obtain and process each type of proof. For that case I'd rather make a guide in the docs 🤔

We don't even know how to generate the proofs for transactions and events. AFAIK the easiest way might be to fetch the entier block (with all txs and events) and rebuild the trees from that. I feel like whoever is going to use this function to prove events likelly knows more about this process than we do, so having mid-documentation (as in documentation that is not perfect) is probably useless. I'd just point to the EIP that stnadardize that (or the yellow paper)

@socket-security
Copy link

socket-security bot commented Dec 12, 2025

Review the following changes in direct dependencies. Learn more about Socket for GitHub.

Diff Package Supply Chain
Security
Vulnerability Quality Maintenance License
Addednpm/​@​ethereumjs/​mpt@​10.1.0851001009270

View full report

Amxx and others added 2 commits December 12, 2025 17:45
Co-authored-by: Ernesto García <ernestognw@gmail.com>
ernestognw and others added 3 commits December 12, 2025 13:47
Co-authored-by: Hadrien Croubois <hadrien.croubois@gmail.com>
@ernestognw ernestognw merged commit 0e19973 into OpenZeppelin:master Dec 12, 2025
28 of 29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants