Thread &dyn Crypto through all keccak call sites in block validation#6365
Thread &dyn Crypto through all keccak call sites in block validation#6365
Conversation
…ion path ZisK zkVM team reported ethrex produces only 1,168 keccak precompile ops vs Reth's 28,228 for the same block, causing a 3.3x RISC-V cycle penalty. Root cause: only ~2 of ~86 keccak call sites went through the Crypto trait that zkVM teams can override. This change passes &dyn Crypto as a function parameter through every function in the keccak call chain — from trie node hashing (NodeHash, LeafNode, ExtensionNode, BranchNode), through Trie root computation, block validation (transactions/receipts/withdrawals roots, bloom, block hash), account hashing (code_hash, storage_root), to the guest program execution entry point (execute_blocks). Non-guest callers pass &NativeCrypto (zero-sized struct that the compiler can devirtualize and inline for zero overhead). The GuestProgramStateWrapper stores Arc<dyn Crypto + Send + Sync> to bridge the VmDatabase trait (which cannot take crypto params) with the underlying methods that now need crypto for hash_address/hash_key/get_valid_storage_trie.
1. Receipt::encode_inner_with_bloom hardcoded NativeCrypto for bloom computation. Every receipt's log addresses and topics were hashed with NativeCrypto during validate_receipts_root, bypassing the zkVM accelerator. Now takes crypto parameter. 2. GuestProgramState initialization (from ExecutionWitness) hashed all state trie nodes, storage trie nodes, and contract code with NativeCrypto. These are the most expensive keccak operations in block validation. Replaced TryFrom with from_witness(witness, crypto). 3. get_first_invalid_block_hash and get_block_hash called header.hash() which falls back to NativeCrypto. Now take crypto and use compute_block_hash(crypto) directly, removing the fragile ordering dependency on initialize_block_header_hashes.
🤖 Kimi Code ReviewReview SummaryThis PR introduces a significant architectural change by making cryptographic operations pluggable through a Key Observations
Minor Issues
Recommendations
Overall, this is a solid architectural improvement that maintains correctness while adding flexibility. The changes are extensive but appear to be implemented correctly throughout the codebase. Automated review by Kimi (Moonshot AI) · custom prompt |
🤖 Codex Code Review
Overall: most changes are consistent mechanical propagation, but the trie/hash-cache/provider interactions above are correctness-critical and should be fixed before merge. Automated review by OpenAI Codex · custom prompt |
Lines of code reportTotal lines added: Detailed view |
🤖 Claude Code ReviewNow I have a comprehensive view of the PR. Let me compile the review. PR Review: Thread
|
There was a problem hiding this comment.
Pull request overview
Threads &dyn Crypto through trie hashing and block/receipt root computation so zkVM targets can override keccak (avoiding native tiny_keccak fallbacks) while keeping native callers on NativeCrypto.
Changes:
- Adds
&dyn Cryptoparameters across trie hashing APIs (NodeHash/Node/Trie) and block validation/root helpers. - Updates guest execution to carry
Arc<dyn Crypto + Send + Sync>through witness/state initialization and validation. - Adjusts native callers (RPC/P2P/storage/CLI/LEVM) to pass
&NativeCryptoat updated call sites.
Reviewed changes
Copilot reviewed 41 out of 41 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| crates/vm/witness_db.rs | Store crypto in wrapper; pass into DB calls |
| crates/vm/levm/src/utils.rs | Use VM crypto for Code::from_bytecode |
| crates/vm/levm/src/execution_handlers.rs | Use VM crypto for contract code hashing |
| crates/vm/levm/runner/src/main.rs | Runner uses NativeCrypto for code hashing |
| crates/vm/levm/runner/src/input.rs | Runner input uses NativeCrypto for code hashing |
| crates/vm/backends/levm/mod.rs | Native backend uses NativeCrypto for code hashing |
| crates/storage/store.rs | Trie hashing now passes NativeCrypto |
| crates/networking/rpc/types/receipt.rs | RPC bloom uses NativeCrypto |
| crates/networking/rpc/types/payload.rs | RPC payload roots use NativeCrypto |
| crates/networking/rpc/eth/block.rs | Receipt encoding uses NativeCrypto |
| crates/networking/p2p/sync/healing/storage.rs | Trie node hash checks use NativeCrypto |
| crates/networking/p2p/sync/healing/state.rs | Trie node hash checks use NativeCrypto |
| crates/networking/p2p/snap/client.rs | Snap node hash checks use NativeCrypto |
| crates/networking/p2p/peer_handler.rs | Block body validation accepts crypto param |
| crates/l2/utils/state_reconstruct.rs | State root computation passes NativeCrypto |
| crates/l2/sequencer/l1_committer.rs | State root computation passes NativeCrypto |
| crates/guest-program/src/l2/program.rs | Thread crypto arc into execution entry |
| crates/guest-program/src/l1/program.rs | Thread crypto arc into execution entry |
| crates/guest-program/src/common/execution.rs | execute_blocks now carries crypto arc |
| crates/common/validation.rs | Receipts root validation takes crypto param |
| crates/common/types/transaction.rs | Update tests to pass NativeCrypto |
| crates/common/types/receipt.rs | Bloom/receipt encoding threaded with crypto |
| crates/common/types/genesis.rs | Genesis roots computed with NativeCrypto |
| crates/common/types/block_execution_witness.rs | Witness→state init + hashing now takes crypto |
| crates/common/types/block.rs | Roots + header hash compute now takes crypto |
| crates/common/types/account.rs | Code/storage hashing now takes crypto |
| crates/common/trie/verify_range.rs | Trie hash calls now pass NativeCrypto |
| crates/common/trie/trie_sorted.rs | Trie-sorted hashing passes NativeCrypto |
| crates/common/trie/trie.rs | Trie commit/hash APIs require crypto param |
| crates/common/trie/rlp.rs | RLP encoding consults hashes via NativeCrypto |
| crates/common/trie/node_hash.rs | NodeHash hashing/finalize now takes crypto |
| crates/common/trie/node/leaf.rs | Leaf hashing now takes crypto (tests updated) |
| crates/common/trie/node/extension.rs | Extension hashing now takes crypto (tests updated) |
| crates/common/trie/node/branch.rs | Branch hashing now takes crypto (tests updated) |
| crates/common/trie/node.rs | NodeRef/Node hashing & commit now takes crypto |
| crates/common/trie/logger.rs | Logger hashes via NativeCrypto |
| crates/common/trie/db.rs | DB embedding commit now passes NativeCrypto |
| crates/common/common.rs | Re-export NativeCrypto from common crate |
| crates/blockchain/payload.rs | Payload roots/bloom computed with NativeCrypto |
| crates/blockchain/blockchain.rs | Receipt root validation + trie hashing uses NativeCrypto |
| cmd/ethrex/cli.rs | CLI import validation passes NativeCrypto |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| pub fn hash(&mut self, crypto: &dyn Crypto) -> Result<H256, TrieError> { | ||
| self.commit(crypto)?; | ||
| Ok(self.hash_no_commit(crypto)) |
| pub fn apply_account_updates( | ||
| &mut self, | ||
| account_updates: &[AccountUpdate], | ||
| crypto: &dyn Crypto, | ||
| ) -> Result<(), GuestProgramStateError> { | ||
| self.lock_mutex()?.apply_account_updates(account_updates) | ||
| self.lock_mutex()? | ||
| .apply_account_updates(account_updates, crypto) | ||
| } | ||
|
|
||
| pub fn state_trie_root(&self) -> Result<H256, GuestProgramStateError> { | ||
| self.lock_mutex()?.state_trie_root() | ||
| pub fn state_trie_root(&self, crypto: &dyn Crypto) -> Result<H256, GuestProgramStateError> { | ||
| self.lock_mutex()?.state_trie_root(crypto) | ||
| } | ||
|
|
||
| pub fn get_first_invalid_block_hash(&self) -> Result<Option<u64>, GuestProgramStateError> { | ||
| self.lock_mutex()?.get_first_invalid_block_hash() | ||
| pub fn get_first_invalid_block_hash( | ||
| &self, | ||
| crypto: &dyn Crypto, | ||
| ) -> Result<Option<u64>, GuestProgramStateError> { | ||
| self.lock_mutex()?.get_first_invalid_block_hash(crypto) |
Greptile SummaryThis PR addresses a significant zkVM performance regression: only ~2 of ~86 keccak call sites were routed through the Key changes and points to note:
Confidence Score: 4/5
|
| Filename | Overview |
|---|---|
| crates/guest-program/src/common/execution.rs | Core guest execution entry point — threads Arc<dyn Crypto + Send + Sync> through the full execution pipeline; all crypto calls consistently use the same Arc instance. |
| crates/vm/witness_db.rs | GuestProgramStateWrapper stores self.crypto (Arc) for VmDatabase trait methods, but also exposes separate explicit crypto: &dyn Crypto params on apply_account_updates, state_trie_root, get_first_invalid_block_hash, and initialize_block_header_hashes — creating a dual-path design that is redundant but consistent in practice. |
| crates/common/trie/node.rs | Most methods correctly thread &dyn Crypto, but get_node_checked (hash-verification on DB load) still hardcodes NativeCrypto, caching the hash without going through the injected circuit accelerator. |
| crates/common/trie/rlp.rs | BranchNode and ExtensionNode RLP encoding hardcodes NativeCrypto for child hash refs. Safe in the normal commit() path (children already memoized), but would bypass the injected crypto if encode() is called before commit(). |
| crates/common/types/block_execution_witness.rs | Converts from TryFrom impl to GuestProgramState::from_witness(value, crypto); all hash operations (address hashing, code hashing, storage trie hashing) correctly use the injected crypto. |
| crates/common/trie/node_hash.rs | from_encoded and finalize now take &dyn Crypto; correctly delegates all keccak operations to the injected trait. |
| crates/common/trie/trie.rs | hash, hash_no_commit, commit, collect_changes_since_last_hash, and compute_hash_from_unsorted_iter all now thread &dyn Crypto; residual NativeCrypto uses are in error-message paths where the hash is already cached. |
| crates/common/types/block.rs | compute_transactions_root, compute_receipts_root, compute_withdrawals_root, validate_block_body, and BlockHeader::compute_block_hash now accept &dyn Crypto; BlockHeader::hash() still memoizes with NativeCrypto (intentional safe path). |
| crates/blockchain/blockchain.rs | All trie commit/hash calls updated to pass &NativeCrypto consistently; no zkVM usage here so NativeCrypto is correct. |
Sequence Diagram
sequenceDiagram
participant EP as execute_blocks<br/>(guest-program)
participant GW as GuestProgramStateWrapper
participant GPS as GuestProgramState
participant Trie as Trie / NodeRef
participant NH as NodeHash
participant C as &dyn Crypto<br/>(zkVM circuit)
EP->>GPS: from_witness(witness, crypto)
GPS->>Trie: hash_no_commit(crypto)
Trie->>Trie: memoize_hashes(buf, crypto)
Trie->>NH: from_encoded(buf, crypto)
NH->>C: keccak256(encoded_node)
C-->>NH: hash[32]
NH-->>Trie: NodeHash (cached in OnceLock)
EP->>GW: new(state, crypto.clone())
EP->>GW: initialize_block_header_hashes(blocks, crypto)
EP->>GW: validate_block_body(..., crypto)
EP->>GW: apply_account_updates(updates, crypto)
GW->>GPS: apply_account_updates(updates, crypto)
GPS->>C: keccak256(address) — hash_address
GPS->>C: keccak256(key) — hash_key
GPS->>Trie: hash_no_commit(crypto)
Trie->>C: keccak256(encoded_node)
EP->>GW: state_trie_root(crypto)
GW->>GPS: state_trie_root(crypto)
GPS->>Trie: hash_no_commit(crypto)
Trie-->>EP: H256 (final state root)
Prompt To Fix All With AI
This is a comment left during a code review.
Path: crates/common/trie/node.rs
Line: 87-88
Comment:
**`get_node_checked` still uses `NativeCrypto` — missed keccak call site**
`get_node_checked` verifies a loaded node's hash with `NativeCrypto` before returning it. Two consequences:
1. **Performance**: if this branch is hit during zkVM execution (e.g. an incomplete witness that needs a DB lookup), the keccak is computed outside the zkVM's circuit accelerator, defeating the purpose of this PR for that node.
2. **Cache poisoning**: `node.compute_hash(&NativeCrypto)` calls `memoize_hashes` internally and writes the result into the node's `OnceLock`. If the same `NodeRef::Node` is later `commit()`-ed with the injected crypto, the `OnceLock` is already set (from the `NativeCrypto` path) and the injected crypto is never used for that node.
Both crypto implementations produce the same Keccak-256 output, so there is **no correctness bug** in normal operation. But if a properly-formed witness is incomplete (any hash-referenced node must be fetched), the zkVM won't count those hashes against its circuit, under-reporting keccak usage — the same root cause the PR was written to fix.
Consider threading `crypto: &dyn Crypto` into `get_node_checked` (mirroring the treatment of `commit` / `compute_hash_no_alloc`), or adding a comment explaining why native keccak is intentional here (e.g. "only reachable outside the guest-program path").
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: crates/vm/witness_db.rs
Line: 32-68
Comment:
**Redundant `crypto` parameters alongside stored `self.crypto`**
`GuestProgramStateWrapper` stores `self.crypto: Arc<dyn Crypto + Send + Sync>` so that `VmDatabase` trait methods (`get_account_state`, `get_block_hash`, `get_storage_slot`) can use it without receiving it as a parameter. But the four wrapper-specific methods below also accept an explicit `crypto: &dyn Crypto` parameter and just forward it straight through:
```rust
pub fn apply_account_updates(&mut self, ..., crypto: &dyn Crypto) { ... }
pub fn state_trie_root(&self, crypto: &dyn Crypto) { ... }
pub fn get_first_invalid_block_hash(&self, crypto: &dyn Crypto) { ... }
pub fn initialize_block_header_hashes(&self, ..., crypto: &dyn Crypto) { ... }
```
In `execution.rs` every caller passes `crypto.as_ref()` from the same `Arc` that was stored as `self.crypto`, so there is no functional difference. However, the split API creates two risks:
1. A future caller could pass a **different** `crypto` to `apply_account_updates` than the one stored in `self.crypto`. The `account_hashes_by_address` cache inside `GuestProgramState` would then contain entries keyed by one crypto's hash but later queried via the `VmDatabase` path (which always uses `self.crypto`). Since all correct Keccak-256 implementations produce the same output this does not cause a hash mismatch today, but it is a fragile invariant.
2. The call sites in `execution.rs` are harder to read — the reader has to verify that `crypto.as_ref()` is indeed the same object as `self.crypto`.
Consider replacing the explicit parameters with `self.crypto.as_ref()` inside the wrapper methods, removing the ambiguity entirely.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: crates/common/trie/rlp.rs
Line: 18-67
Comment:
**`NativeCrypto` in RLP encoding is safe in the `commit()` path but undocumented edge-case risk**
The PR description correctly notes this is "safe due to memoization": by the time `BranchNode::encode` / `ExtensionNode::encode` run inside `NodeRef::commit()`, all children have already been committed recursively and their hashes are cached in their `OnceLock` fields. The `compute_hash_ref(&NativeCrypto)` calls therefore return cached values and never actually invoke keccak.
However, the same `RLPEncode` impl is also called via `Node::encode_to_vec()` (used directly from `put_batch_no_alloc` in `db.rs`), and potentially from other non-`commit` paths. If `encode()` is invoked on a branch or extension node whose children have **not** yet been committed/memoized, `NativeCrypto::keccak256` is called, the result is stored in the child's `OnceLock`, and any subsequent `commit(injected_crypto)` will find the lock already set and skip the injected crypto for that child.
A defensive guard and/or a `// Safety: children are always memoized before this method is called from commit()` comment would make the invariant explicit and prevent accidental misuse.
How can I resolve this? If you propose a fix, please make it concise.Last reviewed commit: c4b8503
| Ok(node) => (node.compute_hash(&NativeCrypto) == *hash) | ||
| .then_some(Ok(Arc::new(node))), |
There was a problem hiding this comment.
get_node_checked still uses NativeCrypto — missed keccak call site
get_node_checked verifies a loaded node's hash with NativeCrypto before returning it. Two consequences:
- Performance: if this branch is hit during zkVM execution (e.g. an incomplete witness that needs a DB lookup), the keccak is computed outside the zkVM's circuit accelerator, defeating the purpose of this PR for that node.
- Cache poisoning:
node.compute_hash(&NativeCrypto)callsmemoize_hashesinternally and writes the result into the node'sOnceLock. If the sameNodeRef::Nodeis latercommit()-ed with the injected crypto, theOnceLockis already set (from theNativeCryptopath) and the injected crypto is never used for that node.
Both crypto implementations produce the same Keccak-256 output, so there is no correctness bug in normal operation. But if a properly-formed witness is incomplete (any hash-referenced node must be fetched), the zkVM won't count those hashes against its circuit, under-reporting keccak usage — the same root cause the PR was written to fix.
Consider threading crypto: &dyn Crypto into get_node_checked (mirroring the treatment of commit / compute_hash_no_alloc), or adding a comment explaining why native keccak is intentional here (e.g. "only reachable outside the guest-program path").
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/common/trie/node.rs
Line: 87-88
Comment:
**`get_node_checked` still uses `NativeCrypto` — missed keccak call site**
`get_node_checked` verifies a loaded node's hash with `NativeCrypto` before returning it. Two consequences:
1. **Performance**: if this branch is hit during zkVM execution (e.g. an incomplete witness that needs a DB lookup), the keccak is computed outside the zkVM's circuit accelerator, defeating the purpose of this PR for that node.
2. **Cache poisoning**: `node.compute_hash(&NativeCrypto)` calls `memoize_hashes` internally and writes the result into the node's `OnceLock`. If the same `NodeRef::Node` is later `commit()`-ed with the injected crypto, the `OnceLock` is already set (from the `NativeCrypto` path) and the injected crypto is never used for that node.
Both crypto implementations produce the same Keccak-256 output, so there is **no correctness bug** in normal operation. But if a properly-formed witness is incomplete (any hash-referenced node must be fetched), the zkVM won't count those hashes against its circuit, under-reporting keccak usage — the same root cause the PR was written to fix.
Consider threading `crypto: &dyn Crypto` into `get_node_checked` (mirroring the treatment of `commit` / `compute_hash_no_alloc`), or adding a comment explaining why native keccak is intentional here (e.g. "only reachable outside the guest-program path").
How can I resolve this? If you propose a fix, please make it concise.| @@ -53,8 +61,10 @@ impl GuestProgramStateWrapper { | |||
| pub fn initialize_block_header_hashes( | |||
| &self, | |||
| blocks: &[Block], | |||
| crypto: &dyn Crypto, | |||
| ) -> Result<(), GuestProgramStateError> { | |||
| self.lock_mutex()?.initialize_block_header_hashes(blocks) | |||
| self.lock_mutex()? | |||
| .initialize_block_header_hashes(blocks, crypto) | |||
| } | |||
There was a problem hiding this comment.
Redundant crypto parameters alongside stored self.crypto
GuestProgramStateWrapper stores self.crypto: Arc<dyn Crypto + Send + Sync> so that VmDatabase trait methods (get_account_state, get_block_hash, get_storage_slot) can use it without receiving it as a parameter. But the four wrapper-specific methods below also accept an explicit crypto: &dyn Crypto parameter and just forward it straight through:
pub fn apply_account_updates(&mut self, ..., crypto: &dyn Crypto) { ... }
pub fn state_trie_root(&self, crypto: &dyn Crypto) { ... }
pub fn get_first_invalid_block_hash(&self, crypto: &dyn Crypto) { ... }
pub fn initialize_block_header_hashes(&self, ..., crypto: &dyn Crypto) { ... }In execution.rs every caller passes crypto.as_ref() from the same Arc that was stored as self.crypto, so there is no functional difference. However, the split API creates two risks:
- A future caller could pass a different
cryptotoapply_account_updatesthan the one stored inself.crypto. Theaccount_hashes_by_addresscache insideGuestProgramStatewould then contain entries keyed by one crypto's hash but later queried via theVmDatabasepath (which always usesself.crypto). Since all correct Keccak-256 implementations produce the same output this does not cause a hash mismatch today, but it is a fragile invariant. - The call sites in
execution.rsare harder to read — the reader has to verify thatcrypto.as_ref()is indeed the same object asself.crypto.
Consider replacing the explicit parameters with self.crypto.as_ref() inside the wrapper methods, removing the ambiguity entirely.
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/vm/witness_db.rs
Line: 32-68
Comment:
**Redundant `crypto` parameters alongside stored `self.crypto`**
`GuestProgramStateWrapper` stores `self.crypto: Arc<dyn Crypto + Send + Sync>` so that `VmDatabase` trait methods (`get_account_state`, `get_block_hash`, `get_storage_slot`) can use it without receiving it as a parameter. But the four wrapper-specific methods below also accept an explicit `crypto: &dyn Crypto` parameter and just forward it straight through:
```rust
pub fn apply_account_updates(&mut self, ..., crypto: &dyn Crypto) { ... }
pub fn state_trie_root(&self, crypto: &dyn Crypto) { ... }
pub fn get_first_invalid_block_hash(&self, crypto: &dyn Crypto) { ... }
pub fn initialize_block_header_hashes(&self, ..., crypto: &dyn Crypto) { ... }
```
In `execution.rs` every caller passes `crypto.as_ref()` from the same `Arc` that was stored as `self.crypto`, so there is no functional difference. However, the split API creates two risks:
1. A future caller could pass a **different** `crypto` to `apply_account_updates` than the one stored in `self.crypto`. The `account_hashes_by_address` cache inside `GuestProgramState` would then contain entries keyed by one crypto's hash but later queried via the `VmDatabase` path (which always uses `self.crypto`). Since all correct Keccak-256 implementations produce the same output this does not cause a hash mismatch today, but it is a fragile invariant.
2. The call sites in `execution.rs` are harder to read — the reader has to verify that `crypto.as_ref()` is indeed the same object as `self.crypto`.
Consider replacing the explicit parameters with `self.crypto.as_ref()` inside the wrapper methods, removing the ambiguity entirely.
How can I resolve this? If you propose a fix, please make it concise.| @@ -35,15 +37,15 @@ impl RLPEncode for BranchNode { | |||
| fn encode_to_vec(&self) -> Vec<u8> { | |||
| let value_len = <[u8] as RLPEncode>::length(&self.value); | |||
| let choices_len = self.choices.iter().fold(0, |acc, child| { | |||
| acc + RLPEncode::length(child.compute_hash_ref()) | |||
| acc + RLPEncode::length(child.compute_hash_ref(&NativeCrypto)) | |||
| }); | |||
| let payload_len = choices_len + value_len; | |||
|
|
|||
| let mut buf: Vec<u8> = Vec::with_capacity(payload_len + 3); // 3 byte prefix headroom | |||
|
|
|||
| encode_length(payload_len, &mut buf); | |||
| for child in self.choices.iter() { | |||
| match child.compute_hash_ref() { | |||
| match child.compute_hash_ref(&NativeCrypto) { | |||
| NodeHash::Hashed(hash) => hash.0.encode(&mut buf), | |||
| NodeHash::Inline((_, 0)) => buf.push(RLP_NULL), | |||
| NodeHash::Inline((encoded, len)) => { | |||
| @@ -60,7 +62,7 @@ impl RLPEncode for BranchNode { | |||
| impl RLPEncode for ExtensionNode { | |||
| fn encode(&self, buf: &mut dyn bytes::BufMut) { | |||
| let mut encoder = Encoder::new(buf).encode_bytes(&self.prefix.encode_compact()); | |||
| encoder = self.child.compute_hash().encode(encoder); | |||
| encoder = self.child.compute_hash(&NativeCrypto).encode(encoder); | |||
| encoder.finish(); | |||
| } | |||
There was a problem hiding this comment.
NativeCrypto in RLP encoding is safe in the commit() path but undocumented edge-case risk
The PR description correctly notes this is "safe due to memoization": by the time BranchNode::encode / ExtensionNode::encode run inside NodeRef::commit(), all children have already been committed recursively and their hashes are cached in their OnceLock fields. The compute_hash_ref(&NativeCrypto) calls therefore return cached values and never actually invoke keccak.
However, the same RLPEncode impl is also called via Node::encode_to_vec() (used directly from put_batch_no_alloc in db.rs), and potentially from other non-commit paths. If encode() is invoked on a branch or extension node whose children have not yet been committed/memoized, NativeCrypto::keccak256 is called, the result is stored in the child's OnceLock, and any subsequent commit(injected_crypto) will find the lock already set and skip the injected crypto for that child.
A defensive guard and/or a // Safety: children are always memoized before this method is called from commit() comment would make the invariant explicit and prevent accidental misuse.
Prompt To Fix With AI
This is a comment left during a code review.
Path: crates/common/trie/rlp.rs
Line: 18-67
Comment:
**`NativeCrypto` in RLP encoding is safe in the `commit()` path but undocumented edge-case risk**
The PR description correctly notes this is "safe due to memoization": by the time `BranchNode::encode` / `ExtensionNode::encode` run inside `NodeRef::commit()`, all children have already been committed recursively and their hashes are cached in their `OnceLock` fields. The `compute_hash_ref(&NativeCrypto)` calls therefore return cached values and never actually invoke keccak.
However, the same `RLPEncode` impl is also called via `Node::encode_to_vec()` (used directly from `put_batch_no_alloc` in `db.rs`), and potentially from other non-`commit` paths. If `encode()` is invoked on a branch or extension node whose children have **not** yet been committed/memoized, `NativeCrypto::keccak256` is called, the result is stored in the child's `OnceLock`, and any subsequent `commit(injected_crypto)` will find the lock already set and skip the injected crypto for that child.
A defensive guard and/or a `// Safety: children are always memoized before this method is called from commit()` comment would make the invariant explicit and prevent accidental misuse.
How can I resolve this? If you propose a fix, please make it concise.
Motivation
ZisK zkVM team reported ethrex produces only 1,168 keccak precompile ops vs Reth's 28,228 for the same block (block 24640819), causing a 3.3x RISC-V cycle penalty (286M vs 87M cycles). Root cause: only ~2 of ~86 keccak call sites went through the
Cryptotrait that zkVM teams can override — the rest calledkeccak_hash()directly, which on RISC-V falls back totiny_keccakinstead of using the zkVM's keccak circuit accelerator.Description
Passes
&dyn Cryptoas a function parameter through every function in the keccak call chain, bottom-up from leaf functions:NodeHash::from_encoded,NodeHash::finalize,{Leaf,Extension,Branch}Node::compute_hash_no_alloc,NodeRef::commit,Trie::hash_no_commit,Trie::compute_hash_from_unsorted_iter,Trie::collect_changes_since_last_hashcompute_transactions_root,compute_receipts_root,compute_withdrawals_root,validate_block_body,BlockHeader::compute_block_hash,bloom_from_logs,Receipt::encode_inner_with_bloomCode::from_bytecode,code_hash,compute_storage_roothash_address,hash_key,apply_account_updates,state_trie_root,initialize_block_header_hashes,get_first_invalid_block_hash,get_block_hash,get_account_state,get_storage_slot,get_valid_storage_trie,GuestProgramState::from_witnessexecute_blocksreceivesArc<dyn Crypto + Send + Sync>and threads it through all validation, trie, and state operationsNon-guest callers (blockchain, storage, P2P, RPC, CLI) pass
&NativeCrypto— a zero-sized struct the compiler devirtualizes to zero overhead.GuestProgramStateWrapperstoresArc<dyn Crypto + Send + Sync>to bridge theVmDatabasetrait (which can't take crypto params) with underlying methods that need crypto forhash_address/hash_key/get_valid_storage_trie/get_block_hash/get_account_state/get_storage_slot.How to Test
cargo check --workspacepassescargo test --package ethrex-common --package ethrex-trie— all 112 tests passkeccak_hash(incrates/guest-program/— only ECRECOVER crypto internals remain (expected, these are inside theCryptotrait impl itself)NativeCryptoincrates/common/andcrates/vm/levm/— only in error formatting paths, tests, RLP encoding (safe due to memoization), and native-only code (genesis, trie_sorted, verify_range, logger)Closes
N/A — addresses ZisK team feedback on keccak precompile utilization