refactor(trie): remove proof task manager #18934

yongkangc · 2025-10-10T10:21:50Z

Context:

As part of our performance work to reduce overhead and improve scheduling, we added worker pooling for multiproof generation.
This PR cleans up by removing the `ProofTaskManager` abstraction, since we can now dispatch proof jobs directly to the workers.

Impact:

| Change | Impact |
| --- | --- |
| Remove `run()` loop thread | -1 thread, -1 channel hop |
| Direct channel sends | ~some time saved per task |
| Eliminate enum wrapping | ~2 allocations saved per task |
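
To make the "direct channel sends" row concrete, here is a minimal sketch of a handle that pushes jobs straight onto a queue shared by the workers, with no routing thread in between. It uses only the standard library; `ProofHandle`, `StorageProofJob`, the `u64` payloads and the doubling "proof" are hypothetical stand-ins, not the types from this PR. Only the method name `queue_storage_proof` is taken from the diff.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical job: the input plus the channel used to send the result back.
struct StorageProofJob {
    input: u64,
    result_tx: Sender<u64>,
}

/// Hypothetical handle that owns the sending half of the worker queue.
struct ProofHandle {
    storage_tx: Sender<StorageProofJob>,
}

impl ProofHandle {
    /// Spawns `worker_count` workers that all pull from one shared queue.
    fn new(worker_count: usize) -> Self {
        let (storage_tx, storage_rx) = channel::<StorageProofJob>();
        // std's receiver is single-consumer, so the workers share it behind a mutex.
        let storage_rx = Arc::new(Mutex::new(storage_rx));
        for _ in 0..worker_count {
            let rx = Arc::clone(&storage_rx);
            thread::spawn(move || loop {
                // The lock is held while waiting for a job, not while processing it.
                let job = match rx.lock().unwrap().recv() {
                    Ok(job) => job,
                    Err(_) => break, // every sender was dropped
                };
                // Placeholder for the real proof computation.
                let _ = job.result_tx.send(job.input * 2);
            });
        }
        Self { storage_tx }
    }

    /// Sends the job directly to the worker queue and returns the result receiver.
    fn queue_storage_proof(&self, input: u64) -> Receiver<u64> {
        let (result_tx, result_rx) = channel();
        self.storage_tx
            .send(StorageProofJob { input, result_tx })
            .expect("worker pool is alive");
        result_rx
    }
}
```

A caller would create the handle once, call `queue_storage_proof(input)`, and block on the returned receiver when it needs the result. Each request then crosses one channel rather than two.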

reference PRs:

- Removed the Factory type parameter from the ParallelProof struct, streamlining its definition and implementation.
- Updated the constructor and related methods to reflect this change, enhancing code clarity and maintainability.
- Eliminated unused PhantomData field, reducing complexity in the struct's design.

- Eliminated the Factory type definition from the proof tests, simplifying the code structure.
- This change contributes to improved clarity and maintainability of the test implementation.

- Removed the generic Factory type from MultiProofConfig and related structs, streamlining their definitions and improving code clarity.
- Updated methods to reflect the removal of the Factory type, enhancing maintainability.
- Adjusted the implementation of PendingMultiproofTask and its associated methods to eliminate unnecessary type parameters, simplifying the codebase.

- Replaced the ProofTaskManager with a new spawn_proof_workers function for better clarity and maintainability.
- Updated related code to utilize the new function, simplifying the worker spawning process.
- Enhanced metrics tracking for storage and account proof requests, ensuring thread-safe operations.
- Improved error handling and code structure across proof task implementations.

- Added a constant `MIN_WORKER_COUNT` to enforce a minimum number of workers for storage and account proof tasks.
- Updated `default_storage_worker_count` and `default_account_worker_count` functions to utilize the new minimum constraint.
- Enhanced setter methods in `TreeConfig` to ensure worker counts do not fall below the minimum.
- Modified command-line argument parsing to validate worker counts against the minimum requirement.

- Added a debug assertion to ensure active_handles does not underflow when dropping a ProofTaskManagerHandle.
- Implemented metrics recording to flush before exit when the last handle is dropped, enhancing monitoring capabilities.

Copilot

Pull Request Overview

This PR refactors the proof task management by removing the ProofTaskManager abstraction and replacing it with direct worker pool spawning. The change eliminates the routing thread overhead by providing direct channel access to storage and account worker pools, simplifying the architecture while maintaining the same worker pool functionality.

Key changes:

- Replaced ProofTaskManager with spawn_proof_workers function for direct worker spawning
- Converted ProofTaskManagerHandle to provide type-safe queue methods with direct channel access
- Updated metrics to use lock-free atomic counters for thread-safe operations

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| crates/trie/parallel/src/proof_task_metrics.rs | Converts metrics fields to atomic counters for lock-free thread safety |
| crates/trie/parallel/src/proof_task.rs | Replaces ProofTaskManager with spawn_proof_workers function and updates handle interface |
| crates/trie/parallel/src/proof.rs | Updates proof generation to use new direct queue methods |
| crates/node/core/src/args/engine.rs | Adds minimum worker count validation to CLI arguments |
| crates/engine/tree/src/tree/payload_validator.rs | Updates error message to reflect new spawning approach |
| crates/engine/tree/src/tree/payload_processor/multiproof.rs | Updates multiproof manager to use new queue methods |
| crates/engine/tree/src/tree/payload_processor/mod.rs | Replaces ProofTaskManager instantiation with spawn_proof_workers |
| crates/engine/primitives/src/config.rs | Adds MIN_WORKER_COUNT constant and enforces minimum worker limits |

Copilot

Pull Request Overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

- Introduced helper functions to streamline error conversion from ProviderError and channel receive errors to SparseTrieError.
- Enhanced readability and maintainability of the trie_node method by reducing repetitive error handling code.

- Updated the error conversion helper function in ProofTaskTrieNodeProvider to directly wrap the ProviderError, enhancing clarity and maintainability.
- This change simplifies the error handling logic within the trie_node method.

yongkangc · 2025-10-10T12:43:52Z

@shekhirin thanks for the review, just addressed all your comments in the commits

shekhirin · 2025-10-10T12:55:20Z

crates/trie/parallel/src/proof_task.rs

+        /// Helper to convert `ProviderError` to `SparseTrieError`
+        fn provider_err_to_trie_err(e: ProviderError) -> SparseTrieError {
+            SparseTrieErrorKind::Other(Box::new(e)).into()
+        }
+
+        /// Helper to convert channel recv error to `SparseTrieError`
+        fn recv_err_to_trie_err(_: std::sync::mpsc::RecvError) -> SparseTrieError {
+            SparseTrieErrorKind::Other(Box::new(std::io::Error::other("channel closed"))).into()
+        }


can we just do this without helper functions?

reth/crates/trie/trie/src/proof/trie_node.rs

Line 96 in f5840fc

.map_err(|error| SparseTrieErrorKind::Other(Box::new(error)))?;

RecvError already implements Error, so should work?

yup we could, initially wanted helper because it looked ugly with all the map_err

done in 2698cc8
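
A minimal sketch of the inline form suggested in this thread, assuming an error enum with an `Other` variant that holds a boxed error. `TrieErrorKind` and `recv_node` are hypothetical stand-ins for illustration, not reth's actual `SparseTrieErrorKind` or `trie_node`. The point is only that `RecvError` implements `std::error::Error`, so it can be boxed directly inside `map_err` without a helper function.

```rust
use std::error::Error;
use std::sync::mpsc::Receiver;

/// Hypothetical stand-in for the real error kind; only `Other` is modeled here.
#[derive(Debug)]
enum TrieErrorKind {
    Other(Box<dyn Error + Send + Sync>),
}

/// Receives one encoded node, mapping a closed channel to `TrieErrorKind::Other`.
fn recv_node(rx: &Receiver<Vec<u8>>) -> Result<Vec<u8>, TrieErrorKind> {
    // `RecvError` already implements `Error`, so it is boxed as-is.
    rx.recv().map_err(|error| TrieErrorKind::Other(Box::new(error)))
}
```

The same closure shape works for any other error type that implements `Error + Send + Sync`, which is why the two helper functions in the diff above became unnecessary.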

shekhirin · 2025-10-10T12:59:20Z

crates/engine/primitives/src/config.rs

+/// Clamps the worker count to the minimum allowed value.
+///
+/// Ensures that the worker count is at least [`MIN_WORKER_COUNT`].
+const fn clamp_worker_count(count: usize) -> usize {


is this just .max(MIN_WORKER_COUNT)? Let's move this to with_*_worker_count functions, no need to have a separate helper fn for this

much more concise, done in c68eb7a
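
As an illustration of the suggestion, the sketch below enforces the minimum inside the setters themselves with `.max(MIN_WORKER_COUNT)`. The value of `MIN_WORKER_COUNT`, the field names, and this cut-down `TreeConfig` are assumptions made for the example; the setter names only follow the `with_*_worker_count` pattern mentioned in the comment.

```rust
/// Assumed value for illustration only; the real constant is defined in the PR.
const MIN_WORKER_COUNT: usize = 2;

/// Cut-down, hypothetical version of the config struct.
#[derive(Debug, Default)]
struct TreeConfig {
    storage_worker_count: usize,
    account_worker_count: usize,
}

impl TreeConfig {
    // Plain `fn` rather than `const fn`: `Ord::max` is a trait method and is
    // not const-callable on stable Rust at the time of writing.
    fn with_storage_worker_count(mut self, count: usize) -> Self {
        self.storage_worker_count = count.max(MIN_WORKER_COUNT);
        self
    }

    fn with_account_worker_count(mut self, count: usize) -> Self {
        self.account_worker_count = count.max(MIN_WORKER_COUNT);
        self
    }
}
```

With this shape, `TreeConfig::default().with_storage_worker_count(0)` ends up with `MIN_WORKER_COUNT` workers, while any larger request passes through unchanged.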

- Removed redundant helper functions for error conversion in ProofTaskTrieNodeProvider.
- Simplified error handling by directly mapping errors to SparseTrieError, improving code clarity and maintainability.

- Removed the `clamp_worker_count` function and replaced its logic with direct usage of `max` in the setter methods for storage and account worker counts.
- This change enhances code clarity and reduces unnecessary function overhead while ensuring the minimum worker count is enforced.

yongkangc · 2025-10-13T02:12:19Z

@mediocregopher @shekhirin bumping up for final review again :)

mediocregopher · 2025-10-13T08:45:06Z

crates/trie/parallel/src/proof_task_metrics.rs

-    /// Count of blinded storage node requests.
-    pub storage_nodes: usize,
+    /// Count of storage proof requests (lock-free).
+    pub storage_proofs: Arc<AtomicU64>,


The purpose of ProofTaskMetrics is to allow us to lazily record metrics at the end of block processing, so that during block processing we can just increment raw ints and not incur synchronization overhead. By using atomics here we are negating that.

If the issue is in sharing the ProofTaskMetrics across workers then we could clone the ProofTaskMetrics into each worker on startup, and have each worker record prior to termination.

good point on the synchronization overhead, addressed here bc4ecf5
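
The alternative described in this thread can be sketched roughly as follows: each worker gets its own clone of the metrics struct, increments plain integers while it works, and records once just before it terminates. The `Sender<usize>` sink is a hypothetical stand-in for whatever metrics recorder is actually used, and `spawn_worker` / `run_workers` are illustrative names, not functions from the PR.

```rust
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical cut-down metrics struct with a plain (non-atomic) counter.
#[derive(Clone, Debug, Default)]
struct ProofTaskMetrics {
    storage_nodes: usize,
}

impl ProofTaskMetrics {
    /// Records the accumulated count once; the channel stands in for a metrics backend.
    fn record(&self, sink: &Sender<usize>) {
        let _ = sink.send(self.storage_nodes);
    }
}

/// Spawns a worker that owns its metrics and records them before exiting.
fn spawn_worker(jobs: Vec<u64>, mut metrics: ProofTaskMetrics, sink: Sender<usize>) -> JoinHandle<()> {
    thread::spawn(move || {
        for _job in jobs {
            // Plain integer increment: no atomics or locks on the hot path.
            metrics.storage_nodes += 1;
        }
        // The only synchronized step happens once, at termination.
        metrics.record(&sink);
    })
}

/// Runs `worker_count` workers and returns the total recorded across all of them.
fn run_workers(worker_count: usize, jobs_per_worker: usize) -> usize {
    let (sink, totals) = channel();
    let template = ProofTaskMetrics::default();
    let handles: Vec<_> = (0..worker_count)
        .map(|_| spawn_worker(vec![0; jobs_per_worker], template.clone(), sink.clone()))
        .collect();
    drop(sink); // keep only the workers' clones alive
    for handle in handles {
        handle.join().unwrap();
    }
    totals.iter().sum()
}
```

The trade-off is that counts only become visible when a worker finishes, which matches the stated goal of recording lazily at the end of block processing.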

mediocregopher · 2025-10-13T08:48:30Z

crates/trie/parallel/src/proof_task.rs

+/// - `task_ctx`: Shared context with trie updates and prefix sets
+/// - `storage_worker_count`: Number of storage workers to spawn
+/// - `account_worker_count`: Number of account workers to spawn
+pub fn spawn_proof_workers<Factory>(


nit: this would make more sense as ProofTaskManagerHandle::new I think.

that's a much nicer API, addressed in f9e167e

- Updated the code to utilize ProofTaskManagerHandle for spawning proof workers instead of the deprecated spawn_proof_workers function.
- This change enhances code clarity and maintainability by consolidating the worker management logic within the ProofTaskManagerHandle struct.

- Removed lock-free atomic counters from ProofTaskMetrics and replaced them with direct method calls for recording blinded node counts.
- Updated storage and account worker loops to utilize the new metrics recording methods, enhancing clarity and maintainability.
- Simplified the ProofTaskManagerHandle by removing unnecessary metrics fields, streamlining the overall structure.

yongkangc

@mediocregopher just addressed your comments

Copilot

Pull Request Overview

Copilot reviewed 7 out of 7 changed files in this pull request and generated no new comments.

mattsse

lgtm

left some suggestions, iirc additional changes to the worker pool are planned anyway and we could tackle those separately

mattsse · 2025-10-14T09:37:12Z

crates/trie/parallel/src/proof.rs

-        let _ = self.proof_task_handle.queue_task(ProofTaskKind::StorageProof(input, sender));
-        receiver
+        self.proof_task_handle
+            .queue_storage_proof(input)


can we rename these after we merge? I find `queue` very confusing here because this only sends

that's true, we should rename it to send_task

mattsse · 2025-10-14T09:44:28Z

crates/trie/parallel/src/proof_task.rs

+        for worker_id in 0..storage_worker_count {
+            let provider_ro = view.provider_ro()?;


I believe this is still something we pay for upfront, meaning this isn't done in the background

this does mean it currently takes more time to set this up if we bump the worker count?

I think ideally we return the channels right away and do this setup in the background so that we don't block here:

reth/crates/engine/tree/src/tree/payload_processor/mod.rs

Line 206 in 1619408

let proof_handle = match ProofTaskManagerHandle::new(

this makes sense, good idea for us to do in bg
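
One possible shape for the background setup discussed here, as a sketch only: the function creates the channel, hands the sender back immediately, and performs the per-worker setup on a separate thread. `open_provider` is a hypothetical stand-in for an expensive step such as `view.provider_ro()`, and `Job` and `spawn_workers_in_background` are illustrative names. This is not necessarily how the follow-up change implements it.

```rust
use std::sync::mpsc::{channel, Sender};
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical job: the input plus the channel used to send the result back.
struct Job {
    input: u64,
    result_tx: Sender<u64>,
}

/// Hypothetical stand-in for an expensive per-worker setup step.
fn open_provider(worker_id: usize) -> u64 {
    thread::sleep(Duration::from_millis(5));
    worker_id as u64
}

/// Returns the job sender right away; workers are created on a background thread.
fn spawn_workers_in_background(worker_count: usize) -> Sender<Job> {
    let (job_tx, job_rx) = channel::<Job>();
    let job_rx = Arc::new(Mutex::new(job_rx));
    thread::spawn(move || {
        for worker_id in 0..worker_count {
            // The setup cost is paid here, off the caller's thread.
            let _provider = open_provider(worker_id);
            let rx = Arc::clone(&job_rx);
            thread::spawn(move || loop {
                let job = match rx.lock().unwrap().recv() {
                    Ok(job) => job,
                    Err(_) => break, // every sender was dropped
                };
                // Placeholder for the real proof computation.
                let _ = job.result_tx.send(job.input + 1);
            });
        }
    });
    job_tx
}
```

Jobs sent before any worker is ready simply wait in the unbounded channel. A cost of this design is that setup errors can no longer be returned to the caller with `?` and would need to be reported some other way.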

mattsse · 2025-10-14T09:44:55Z

crates/trie/parallel/src/proof_task.rs

+        for worker_id in 0..account_worker_count {
+            let provider_ro = view.provider_ro()?;


yongkangc added 8 commits October 10, 2025 05:57

removing from test

d5fedd8

refactor: remove unused Factory type from proof tests

444b24c

clippy

47da382

fmt

0a88a18

refactor: yeet proof task manager

ed45ebd

github-project-automation bot added this to Reth Tracker Oct 10, 2025

github-project-automation bot moved this to Backlog in Reth Tracker Oct 10, 2025

yongkangc self-assigned this Oct 10, 2025

yongkangc moved this from Backlog to In Progress in Reth Tracker Oct 10, 2025

yongkangc added 3 commits October 10, 2025 10:25

fix comment

d44180d

yongkangc requested a review from Copilot October 10, 2025 10:36

yongkangc changed the title ~~refactor: remove proof task manager~~ refactor(trie): remove proof task manager Oct 10, 2025

Copilot AI reviewed Oct 10, 2025

yongkangc requested a review from Copilot October 10, 2025 10:38

yongkangc added 2 commits October 10, 2025 10:39

clippy

8e00a4a

fmt

2b90133

Copilot AI reviewed Oct 10, 2025

yongkangc force-pushed the yk/pool_clean branch from 2c025cd to 2b90133 Compare October 10, 2025 10:40

yongkangc mentioned this pull request Oct 10, 2025

perf(tree): worker pooling for account proofs #18901

Open

yongkangc requested a review from Copilot October 10, 2025 10:51

yongkangc marked this pull request as ready for review October 10, 2025 10:51

yongkangc requested review from Rjected, rkrasiuk and shekhirin as code owners October 10, 2025 10:51

refactor: improve error handling in trie_node method

c02a68d

shekhirin reviewed Oct 10, 2025

yongkangc added 2 commits October 13, 2025 02:11

refactor: enhance error handling in trie_node method

2698cc8

fix clippy

6a4766c

yongkangc requested a review from klkvr as a code owner October 13, 2025 05:07

Base automatically changed from refactor/remove-factory-generic to yk/worker_pool_acc October 13, 2025 05:13

mediocregopher requested changes Oct 13, 2025

yongkangc requested a review from shekhirin October 13, 2025 08:53

yongkangc added 3 commits October 13, 2025 09:19

Merge branch 'yk/worker_pool_acc' into yk/pool_clean

45765ac

yongkangc commented Oct 13, 2025

yongkangc requested a review from mediocregopher October 13, 2025 09:33

remove wrapper

d8c8559

yongkangc requested a review from Copilot October 13, 2025 09:58

Copilot AI reviewed Oct 13, 2025

yongkangc added 2 commits October 13, 2025 10:01

remove active handle

1e9874b

fmt

1619408

mediocregopher approved these changes Oct 13, 2025

mattsse approved these changes Oct 14, 2025

yongkangc commented Oct 14, 2025

yongkangc commented Oct 14, 2025

yongkangc added 2 commits October 14, 2025 18:15

Apply suggestion from @yongkangc

7041e88

Apply suggestion from @yongkangc

8f0aa64

yongkangc mentioned this pull request Oct 14, 2025

perf: spawning worker in bg instead #18995

Open

Rjected approved these changes Oct 14, 2025
