miner: pipelined state root computation (PoC) #2180
pratikspatil024 wants to merge 6 commits into delay_src
Conversation
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Code Review: found 6 issues (4 bugs and 2 security concerns).
Code Review: found 5 issues in miner/worker.go and miner/pipeline.go. Checked for bugs and CLAUDE.md compliance.
1. Bug: writeElapsed always ~0ns (miner/worker.go L1116-L1123). `writeElapsed := time.Since(writeStart)` is computed immediately after `writeStart := time.Now()`, before either WriteBlockAndSetHeadPipelined or WriteBlockAndSetHead executes. writeBlockAndSetHeadTimer therefore always records ~0, and workerMgaspsTimer (line 1148) reports inflated MGas/s. Fix: move `writeElapsed := time.Since(writeStart)` to after the if/else block.
2. Bug: nil pointer dereference (miner/pipeline.go L379-L384). When chainHead is nil, the `||` short-circuits into the if-body, but `chainHead.Number.Uint64()` in the log.Error call dereferences nil and panics. Per CLAUDE.md: no panics in consensus, sync, or block production paths. Fix: split into two if-checks.
3. Bug: unchecked type assertion (miner/pipeline.go L335-L341). `borEngine, _ := w.engine.(*bor.Bor)` discards the ok boolean. If w.engine is not *bor.Bor, borEngine is nil and `borEngine.AssembleBlock(...)` panics. The same assertion at line 96 correctly checks ok. Fix: check ok and return early.
4. Bug: goroutine leak on 5 return paths (miner/pipeline.go L293-L345). The goroutine behind the initialFillDone channel (line 293) is not drained on the return paths at lines 345, 357, 371, 373, and 383. Only the WaitForSRC error path (line 331) and the happy path (line 390) drain it. Fix: defer the drain right after line 293.
5. Bug: trie DB race after SpawnSRCGoroutine (miner/pipeline.go L206-L229). SpawnSRCGoroutine, called at line 213, launches a goroutine doing CommitWithUpdate. If StateAtWithFlatDiff fails (line 219) or GetHeader returns nil (line 228), fallbackToSequential runs IntermediateRoot inline on the same parent root concurrently. The comments at lines 206-211 identify this as causing "missing trie node" / "layer stale" errors but only guard the Prepare() case. Fix: call WaitForSRC() before fallbackToSequential, or move the spawn after the preconditions.
Codecov Report
❌ Your patch check has failed because the patch coverage (16.47%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files:
@@ Coverage Diff @@
## delay_src #2180 +/- ##
============================================
Coverage ? 51.56%
============================================
Files ? 886
Lines ? 156742
Branches ? 0
============================================
Hits ? 80826
Misses ? 70680
Partials ? 5236
I am okay with the idea of removing the remaining 100ms. We already reduced this buffer from 500ms to 100ms in v2.7.1, and from what we have seen so far, this remaining time looks small enough that removing it seems reasonable. My main concern is not the removal of the 100ms itself. My concern is the cost of pipelining SRC with the next block production. In other words: by doing SRC in parallel with block building, how much do we impact SRC time itself? Do we expect SRC to remain roughly the same, or does it become meaningfully slower because it is now competing with the next block production? That is the part I would like to understand better. I think this is basically a TPS vs finality question:
So I am supportive of the direction, but I think the key question is still: how much TPS do we gain, and how much finality do we lose, if any, by making SRC fully pipelined with block production? If the impact on SRC time is only slight, then the tradeoff is probably clearly worth it. But if SRC time increases materially once it is pipelined with block production, then we should make that tradeoff explicit.
I think SRC will stay roughly the same, because the time-consuming part, trie node prefetching, already runs concurrently with tx execution today, and this PR doesn't change that behavior.
// state resets for pipelined SRC. This avoids import cycles between txpool
// and legacypool packages.
type SpeculativeResetter interface {
	ResetSpeculativeState(newHead *types.Header, statedb *state.StateDB)
In terms of naming, I would simply name it SpeculativeSetter and SetSpeculativeState. The "reset" seems redundant.
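The import-cycle-breaking pattern this interface uses can be shown in a self-contained sketch. Header, StateDB, and legacyPool below are hypothetical stand-ins for the real types; the point is that the caller depends only on the interface, never on the concrete pool package.

```go
package main

import "fmt"

// Hypothetical stand-ins for types.Header and state.StateDB.
type Header struct{ Number uint64 }
type StateDB struct{}

// SpeculativeResetter mirrors the PR's interface: it is declared on the
// caller's side, so the txpool package never imports legacypool directly.
type SpeculativeResetter interface {
	ResetSpeculativeState(newHead *Header, statedb *StateDB)
}

// legacyPool is a hypothetical concrete pool satisfying the interface.
type legacyPool struct{ head uint64 }

func (p *legacyPool) ResetSpeculativeState(newHead *Header, statedb *StateDB) {
	p.head = newHead.Number
}

func main() {
	var pool interface{} = &legacyPool{}
	// The caller type-asserts against the locally declared interface,
	// which is what breaks the import cycle between the two packages.
	if r, ok := pool.(SpeculativeResetter); ok {
		r.ResetSpeculativeState(&Header{Number: 42}, &StateDB{})
	}
	fmt.Println(pool.(*legacyPool).head) // 42
}
```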
// The state commit is handled separately by the SRC goroutine that already
// called CommitWithUpdate. This avoids the "layer stale" error that occurs
// when two CommitWithUpdate calls diverge from the same parent root.
func (bc *BlockChain) WriteBlockAndSetHeadPipelined(block *types.Block, receipts []*types.Receipt, logs []*types.Log, statedb *state.StateDB, emitHeadEvent bool, witnessBytes []byte) (WriteStatus, error) {
There is some shared code between this and WriteBlockAndSetHead. Could we refactor and dedupe it?
// This is used by the txpool and RPC layer to get correct state when the chain
// head was produced via the pipeline (where the committed trie root may lag
// behind the actual post-execution state).
func (bc *BlockChain) PostExecutionStateAt(header *types.Header) (*state.StateDB, error) {
Nitpick: PostExecutionStateAt -> PostExecState to make it simpler.
// speculatively using the FlatDiff overlay, then waits for SRC(N) to complete,
// assembles block N, and sends it for sealing. Then it finalizes N+1 and
// seals it as well.
func (w *worker) commitSpeculativeWork(req *speculativeWorkReq) {
This is a huge function with 500+ lines. Can we decompose it into smaller functions for maintainability?
var coinbase common.Address
if w.chainConfig.Bor != nil && w.chainConfig.Bor.IsRio(new(big.Int).SetUint64(nextBlockNumber)) {
	coinbase = common.HexToAddress(w.chainConfig.Bor.CalculateCoinbase(nextBlockNumber))
}
if coinbase == (common.Address{}) {
	coinbase = w.etherbase()
}

specHeader := &types.Header{
	ParentHash: placeholder,
	Number:     new(big.Int).SetUint64(nextBlockNumber),
	GasLimit:   core.CalcGasLimit(blockNHeader.GasLimit, w.config.GasCeil),
	Time:       blockNHeader.Time + w.chainConfig.Bor.CalculatePeriod(nextBlockNumber),
	Coinbase:   coinbase,
}
if w.chainConfig.IsLondon(specHeader.Number) {
	specHeader.BaseFee = eip1559.CalcBaseFee(w.chainConfig, blockNHeader)
}

// Call Prepare() via the speculative chain reader with waitOnPrepare=false.
// This sets Difficulty, Extra (validator bytes at sprint boundary), and timestamp
// but does NOT sleep. The timing wait is deferred until after the abort check
// to avoid wasting a full block period if the speculative block is discarded.
// NOTE: Prepare() will zero out specHeader.Coinbase. The real coinbase
// is preserved in the local `coinbase` variable above.
if err := w.engine.Prepare(specReader, specHeader, false); err != nil {
	log.Warn("Pipelined SRC: speculative Prepare failed, falling back", "err", err)
	w.fallbackToSequential(req)
	return
}
This duplicates a few things with makeHeader in worker.go. Maybe worth unifying.
	w.fallbackToSequential(req)
	return
}
specState.StartPrefetcher("miner-speculative", nil, nil)
Regarding "layer stale" errors from the prefetcher, I think we can delay the prefetching of N+1 until SRC for block N has completed. I asked Claude about this idea and this is what it suggested:
The existing getStateObject/GetCommittedState code already calls prefetcher.prefetch() during execution, which queues tasks
and records what was accessed. The problem is that subfetcher.loop() immediately calls openTrie() and starts resolving —
hitting the stale layer. If we just delay the resolution, the queueing and dedup logic stays untouched.
The change:
1. trie_prefetcher.go (~30 lines) — add a gate channel to subfetcher:
type subfetcher struct {
// ... existing fields ...
gate chan struct{} // If non-nil, loop blocks until closed
}
func (sf *subfetcher) loop() {
defer close(sf.term)
// Wait for gate to open before touching the trie
if sf.gate != nil {
select {
case <-sf.gate:
case <-sf.stop:
return
}
}
if err := sf.openTrie(); err != nil {
return
}
// ... existing loop unchanged ...
}
Add Resume() to triePrefetcher:
func (p *triePrefetcher) Resume() {
p.lock.Lock()
defer p.lock.Unlock()
for _, f := range p.fetchers {
if f.gate != nil {
close(f.gate)
// Re-signal wake since signals were dropped while gated
select {
case f.wake <- struct{}{}:
default:
}
}
}
}
Wire the gate through: newSubfetcher accepts a gate channel, triePrefetcher stores a gated bool, and prefetch() passes the
gate when creating subfetchers.
2. statedb.go (~10 lines) — expose resume:
func (s *StateDB) ResumePrefetcher() {
if s.prefetcher != nil {
s.prefetcher.Resume()
}
}
3. pipeline.go (~5 lines) — start gated, resume after SRC:
// Before execution (line 225):
specState.StartPrefetcherGated("miner-speculative", nil, nil)
// After WaitForSRC returns (line 339):
specState.ResumePrefetcher()
The one tricky bit is the wake signal: schedule() has select { case sf.wake <- struct{}{}: default: } — if the loop isn't
listening (gated), the signal is dropped. The Resume() method handles this by re-signaling wake after opening the gate. Any
subfetcher with queued tasks will pick them up.
That's it. No changes to pathdb, no changes to the hot execution path (getStateObject/GetCommittedState), no changes to the
trie layer. The prefetcher's existing dedup tracking (seenReadAddr, seenReadSlot) means repeated accesses during execution
are collapsed — when the gate opens, only unique trie paths get resolved.
In the loop iterations (lines 620-652), the same pattern applies — the fill goroutine runs with a gated prefetcher, and
Resume() is called after the iteration's WaitForSRC returns.
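The gate-and-resume mechanics proposed above can be exercised in isolation. This is a minimal, self-contained sketch of the pattern only; gatedFetcher and its fields are illustrative stand-ins, not the real subfetcher or its API. Tasks queue while the gate is closed, the dropped-wake-signal problem is reproduced, and resume() re-signals after opening the gate.

```go
package main

import (
	"fmt"
	"sync"
)

// gatedFetcher queues tasks while its gate channel is open (not closed)
// and only resolves them after resume() closes the gate.
type gatedFetcher struct {
	gate chan struct{} // loop blocks until this is closed
	stop chan struct{}
	wake chan struct{} // buffered(1); extra signals are dropped
	term chan struct{}

	mu      sync.Mutex
	pending []string
	done    []string
}

func newGatedFetcher() *gatedFetcher {
	f := &gatedFetcher{
		gate: make(chan struct{}),
		stop: make(chan struct{}),
		wake: make(chan struct{}, 1),
		term: make(chan struct{}),
	}
	go f.loop()
	return f
}

// schedule queues a task and signals the loop; the signal is dropped
// if the buffer is already full (the same pattern as schedule()).
func (f *gatedFetcher) schedule(task string) {
	f.mu.Lock()
	f.pending = append(f.pending, task)
	f.mu.Unlock()
	select {
	case f.wake <- struct{}{}:
	default:
	}
}

// resume opens the gate and re-signals wake, since wake signals may
// have been dropped while the loop was still gated.
func (f *gatedFetcher) resume() {
	close(f.gate)
	select {
	case f.wake <- struct{}{}:
	default:
	}
}

func (f *gatedFetcher) loop() {
	defer close(f.term)
	select { // wait for the gate before touching any work
	case <-f.gate:
	case <-f.stop:
		return
	}
	for {
		select {
		case <-f.wake:
			f.mu.Lock()
			f.done = append(f.done, f.pending...)
			f.pending = nil
			f.mu.Unlock()
		case <-f.stop:
			return
		}
	}
}

func (f *gatedFetcher) resolved() int {
	f.mu.Lock()
	defer f.mu.Unlock()
	return len(f.done)
}

func main() {
	f := newGatedFetcher()
	f.schedule("path-a")
	f.schedule("path-b")
	fmt.Println(f.resolved()) // 0: gate still closed, tasks only queued
	f.resume()
	for f.resolved() != 2 { // spin until the loop drains the queue
	}
	fmt.Println(f.resolved()) // 2
	close(f.stop)
	<-f.term
}
```

Using close() on the gate as a broadcast means every gated subfetcher wakes at once, which matches the proposed Resume() iterating over all fetchers.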
@@ -0,0 +1,933 @@
package miner
Nice job on isolating the new logic in a new file!


Description
This is built on top of the delayed SRC PoC and takes the approach further: instead of just deferring SRC, it pipelines SRC with the next block's work.
How it works
After producing block N, the miner:
Config
--miner.pipelined-src: enable/disable (default: enabled)
--miner.pipelined-src-logs: verbose pipeline logging (default: enabled)
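A usage sketch for these flags. The `bor server --mine` invocation is an assumption about the usual client entrypoint and is not part of this PR; only the two flag names come from the description above.

```shell
# Hypothetical invocation: mine with pipelined SRC enabled (the default)
# but with verbose pipeline logging turned off.
bor server --mine \
  --miner.pipelined-src \
  --miner.pipelined-src-logs=false
```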