
perf(l1): decoded TrieLayerCache: skip Node::decode() on cache hits #6348

Draft
Arkenan wants to merge 6 commits into main from decoded_trie_layer_cache

Conversation


Arkenan (Collaborator) commented Mar 9, 2026

Decoded TrieLayerCache: skip Node::decode() on cache hits

Summary

  • Store decoded Arc<Node> in the trie layer cache instead of RLP-encoded bytes, eliminating ~100ns Node::decode() per cache hit (~200K lookups/block)
  • Split the single flat TrieLayer map into separate account_nodes, storage_nodes, and leaf_values maps with a new CachedTrieEntry { node: Arc<Node>, encoded: Vec<u8> } type
  • Add get_node() method to the TrieDB trait (default no-op) so NodeRef::get_node/get_node_mut/get_node_checked can retrieve decoded nodes directly from cache before falling back to get() + decode
  • Introduce TrieCommitEntry enum to carry both decoded Arc<Node> and RLP encoding through the commit pipeline, replacing flat (Nibbles, Vec<u8>) tuples

Motivation

The TrieLayerCache was the only cache in ethrex storing RLP-encoded bytes — every other cache (CodeCache, GeneralizedDatabase, etc.) stores decoded Rust structs. On every cache hit, the caller still had to call Node::decode(&rlp), wasting CPU on deserialization of data we just serialized a few blocks ago. Additionally, account and
storage trie nodes were mixed in a single FxHashMap using a nibble-prefix hack with an invalid separator byte (0x11).

Design

TrieCommitEntry — A structured enum produced by NodeRef::commit():

  • Node { path, node: Arc<Node>, encoded: Vec<u8> } — trie nodes with both forms
  • LeafValue { path, value: Vec<u8> } — FKV leaf values (application-level data, not decoded at trie level)

This flows through collect_changes_since_last_hash() → AccountUpdatesList → TrieLayerCache::put_batch(). The into_rlp_pair() method provides backward-compatible (Nibbles, Vec<u8>) conversion for DB writes and Trie::commit().

TrieDB::get_node() — New trait method with default Ok(None). TrieWrapper implements it to return Arc<Node> from the layer cache. NodeRef::get_node/get_node_mut/get_node_checked try this first, falling back to the existing get() + Node::decode() path only on cache miss. BackendTrieDB and InMemoryTrieDB inherit the
default (always go to DB).

Split TrieLayer — Three separate maps instead of one:

  • account_nodes: FxHashMap<Vec<u8>, CachedTrieEntry> — unprefixed nibble paths
  • storage_nodes: FxHashMap<Vec<u8>, CachedTrieEntry> — prefixed nibble paths (existing format)
  • leaf_values: FxHashMap<Vec<u8>, Vec<u8>> — FKV entries (both account and storage)

The single bloom filter is retained across all three maps. commit() recombines entries into the same flat Vec<(Vec<u8>, Vec<u8>)> format so the disk dispatch logic in apply_trie_updates is unchanged.

put_batch() signature change — Takes separate account_entries and storage_entries instead of a pre-flattened Vec<(Nibbles, Vec<u8>)>. The prefix application for storage entries moves inside the cache, simplifying apply_trie_updates.


github-actions bot commented Mar 9, 2026

🤖 Kimi Code Review

Review Summary

This PR introduces TrieCommitEntry to replace the raw (Nibbles, Vec<u8>) tuple used for trie updates, separating trie nodes from leaf values and enabling caching of decoded nodes. The changes are extensive but generally well-structured.

Critical Issues

  1. Potential panic in collect_trie (blockchain.rs:2806): The retain call uses entry.path() but TrieCommitEntry::Node variant doesn't expose the path through a method. This will panic at runtime.

  2. Storage node prefix handling inconsistency: In layering.rs, storage nodes are prefixed with account_nibbles ++ 0x11 ++ path, but the prefix construction in apply_prefix uses account_nibbles ++ path without the 0x11 separator. This mismatch will cause cache misses.

  3. Race condition in get_node implementations: The new get_node method in TrieLogger and TrieWrapper doesn't properly handle concurrent access to the witness lock or cache updates.

Security Concerns

  1. Hash verification bypass in get_node: The cached node path in get_node implementations bypasses hash verification that exists in the non-cached path, potentially allowing stale or malicious cached data.

  2. State cycle detection: The panic on state cycles (layering.rs:120, 143, 166) could be triggered by malformed data, causing DoS.

Performance & Correctness

  1. Memory leak in TrieLayerCache: The CachedTrieEntry stores both Arc<Node> and Vec<u8> encoded data, potentially doubling memory usage for large nodes.

  2. Inefficient bloom filter usage: The bloom filter is checked three separate times for account nodes, storage nodes, and leaf values instead of once per key.

  3. Missing cache invalidation: No mechanism to invalidate cached nodes when underlying data changes.

Code Quality Issues

  1. Redundant path extraction: Multiple places convert Nibbles to Vec<u8> and back, which could be optimized.

  2. Incomplete error handling: Several unwrap_or_clone calls could panic if Arc is still referenced elsewhere.

  3. Naming inconsistency: TrieCommitEntry vs TrieNode naming is confusing - consider TrieUpdateEntry or similar.

Specific Recommendations

  1. Fix the collect_trie panic by properly accessing the path:

    nodes.retain(|entry| match entry {
        TrieCommitEntry::Node { path, .. } => path.as_ref().first() == Some(&index),
        TrieCommitEntry::LeafValue { path, .. } => path.as_ref().first() == Some(&index),
    });
  2. Add proper prefix handling for storage nodes in apply_prefix or adjust the cache key construction.

  3. Implement proper synchronization for cache access in get_node methods.

  4. Consider adding a validate_hash parameter to get_node to optionally verify cached node hashes.

  5. Add metrics for cache hit/miss rates to monitor performance impact.

The PR introduces valuable caching improvements but needs attention to the critical issues above before merging.


Automated review by Kimi (Moonshot AI) · custom prompt


github-actions bot commented Mar 9, 2026

Lines of code report

Total lines added: 210
Total lines removed: 0
Total lines changed: 210

Detailed view
+------------------------------------------------------+-------+------+
| File                                                 | Lines | Diff |
+------------------------------------------------------+-------+------+
| ethrex/crates/blockchain/blockchain.rs               | 2226  | +15  |
+------------------------------------------------------+-------+------+
| ethrex/crates/common/trie/db.rs                      | 101   | +6   |
+------------------------------------------------------+-------+------+
| ethrex/crates/common/trie/logger.rs                  | 53    | +8   |
+------------------------------------------------------+-------+------+
| ethrex/crates/common/trie/node.rs                    | 371   | +19  |
+------------------------------------------------------+-------+------+
| ethrex/crates/common/trie/trie.rs                    | 564   | +34  |
+------------------------------------------------------+-------+------+
| ethrex/crates/networking/p2p/sync/healing/storage.rs | 635   | +8   |
+------------------------------------------------------+-------+------+
| ethrex/crates/storage/layering.rs                    | 319   | +118 |
+------------------------------------------------------+-------+------+
| ethrex/crates/storage/store.rs                       | 2556  | +2   |
+------------------------------------------------------+-------+------+


github-actions bot commented Mar 9, 2026

🤖 Codex Code Review

  1. High: get_node_checked no longer validates hash when cache hits

    • In crates/common/trie/node.rs:87 and crates/common/trie/node.rs:89, NodeRef::get_node_checked now returns db.get_node(...) directly with the comment “cache contents are trusted”.
    • This weakens the contract of a checked read: a stale/wrong cached node at the same path will be accepted without compute_hash() == expected_hash.
    • In consensus-critical trie traversal/proof paths, this can mask corruption and produce invalid proofs/reads.
    • Suggested fix: keep the cache fast path, but verify hash before returning:
      • if hash matches: return node
      • else: return Ok(None) (or error) and optionally fall back to raw DB path.
  2. Medium: memory amplification risk in layer cache

I could not run cargo check in this environment because rustup attempted a toolchain sync and failed with filesystem permissions in /home/runner/.rustup/tmp, so runtime validation is still pending.


Automated review by OpenAI Codex · custom prompt


github-actions bot commented Mar 9, 2026

🤖 Claude Code Review

Now I have enough context to write a thorough review. Here's my analysis:


Code Review: Decoded TrieLayerCache — skip Node::decode() on cache hits

The overall approach is sound and well-motivated. Storing decoded Arc<Node> in the layer cache avoids repeated RLP decoding on cache hits, and splitting the flat map into typed sub-maps is a meaningful cleanup. Below are the issues worth addressing.


Correctness

1. get_node_checked silently drops hash verification on cache hits

node.rs:87-100: get_node_checked exists specifically to verify the fetched node matches the NodeHash in the NodeRef. The new code bypasses this check for cache hits:

NodeRef::Hash(hash @ NodeHash::Hashed(_)) => {
    // Try decoded cache first (cache contents are trusted)
    if let Some(node) = db.get_node(path.clone())? {
        return Ok(Some(node));   // <-- hash never verified
    }
    ...
    .and_then(|rlp| match Node::decode(&rlp) {
        Ok(node) => (node.compute_hash() == *hash).then_some(Ok(Arc::new(node))),

For nodes produced locally during block execution this is low risk, but nodes arriving via snapsync/healing (storage.rs:444-449) are inserted directly into TrieCommitEntry::Node from peer-supplied data. If a cache entry is ever stale or corrupted (e.g., a bug elsewhere writes a node under the wrong path), get_node_checked would return the wrong node silently. The hash check is cheap (compute_hash only recomputes if OnceLock is empty) and provides meaningful defense-in-depth. Consider adding it back:

if let Some(node) = db.get_node(path.clone())? {
    if node.compute_hash() == *hash {
        return Ok(Some(node));
    }
    // cache has wrong node for this hash — fall through to DB
}

2. TrieWrapper::get() no longer covers cached node raw bytes

layering.rs:415-425 — The fallback path in NodeRef::get_node (for Hash(_) when get_node() returns None) calls db.get(path). TrieWrapper::get() now only checks leaf_values, not account_nodes/storage_nodes. Previously the single flat nodes map covered everything. If get_node() returns None for a node that is present in account_nodes (e.g., a future bloom filter refactor or a call site that doesn't go through get_node), the fallback silently skips the in-memory encoded bytes and reads from disk. This is a behavioral narrowing of the fallback contract that isn't documented, and it means the two paths (get_node and get) are no longer symmetric for trie nodes.


Memory

3. CachedTrieEntry stores both decoded and encoded form — doubles per-entry memory

layering.rs:472-475:

pub struct CachedTrieEntry {
    pub node: Arc<Node>,
    pub encoded: Vec<u8>,
}

The encoded field is only consumed by commit() when a layer is flushed to disk (every ~128 blocks for regular sync). It's wasted memory for all the layers in front of the commit point. At ~200K nodes/block and 128-layer commit threshold, the encoded bytes could represent meaningful working-set pressure. Lazy re-encoding at commit time (calling node.encode() once during commit()) would halve per-entry allocation at the cost of one encoding pass per 128-block flush — a favorable trade.


Code cleanliness

4. TrieNode type alias not removed

trie.rs:53:

pub type TrieNode = (Nibbles, NodeRLP);

This is still publicly exported alongside TrieCommitEntry. The PR description says TrieNode is being replaced, but the alias remains. If it's no longer used externally, remove it. If it's kept for transitional reasons, add a #[deprecated] annotation or a TODO comment.

5. Minor: path.clone() before cache lookup

node.rs:63, 89, 123: path.clone() is called before db.get_node(path.clone()), then the original path is also passed to db.get(path) in the fallback. If Nibbles::clone is cheap (e.g., inline SSO), this is fine, but if not it's avoidable by restructuring so the clone only happens on the hit path. At minimum the pattern is slightly surprising.


Clarification on the automated review

The Kimi bot raised two "critical" findings that are incorrect:

  • "retain call will panic": TrieCommitEntry::path() is properly implemented for both variants (trie.rs:74-80), so entry.path() compiles and works correctly.
  • "prefix mismatch in apply_prefix": prefix_nibbles is constructed in TrieWrapper::new as Nibbles::from_bytes(p.as_bytes()).append_new(17) (layering.rs:368), which already includes the 0x11 separator. Thus prefix.concat(&key) produces the same key format as apply_prefix(Some(account_hash), path). There is no mismatch.

Summary

The change is well-structured and the motivation (eliminating ~100ns decode per hit across ~200K lookups/block) is compelling. The main things to address before merging:

  • Restore hash verification in get_node_checked for cache hits (Item 1 — defense-in-depth, especially for snapsync paths)
  • Document or fix the asymmetry in the get()/get_node() fallback contract (Item 2)
  • Consider lazy re-encoding in commit() to reduce memory pressure (Item 3)
  • Clean up or deprecate the TrieNode alias (Item 4)

Automated review by Claude (Anthropic) · custom prompt

@Arkenan Arkenan changed the title Decoded TrieLayerCache: skip Node::decode() on cache hits perf(l1): decoded TrieLayerCache: skip Node::decode() on cache hits Mar 9, 2026
@github-actions github-actions bot added L1 Ethereum client performance Block execution throughput and performance in general labels Mar 9, 2026

github-actions bot commented Mar 9, 2026

Benchmark Block Execution Results Comparison Against Main

+---------+----------------+---------+---------+-------------+
| Command | Mean [s]       | Min [s] | Max [s] | Relative    |
+---------+----------------+---------+---------+-------------+
| base    | 61.412 ± 0.311 | 61.191  | 62.247  | 1.00 ± 0.01 |
| head    | 61.147 ± 0.103 | 60.976  | 61.299  | 1.00        |
+---------+----------------+---------+---------+-------------+

@Arkenan Arkenan force-pushed the decoded_trie_layer_cache branch from a03408a to fdb6071 Compare March 12, 2026 14:02