miner: pipelined state root computation (PoC) #2180
pratikspatil024 wants to merge 6 commits into delay_src
Conversation
Claude Code Review
This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.
Code Review: found 6 issues (4 bugs and 2 security concerns).
Code Review: found 5 issues in miner/worker.go and miner/pipeline.go. Checked for bugs and CLAUDE.md compliance.
1. Bug: writeElapsed always ~0ns (miner/worker.go L1116-L1123). `writeElapsed := time.Since(writeStart)` is computed immediately after `writeStart := time.Now()`, before either WriteBlockAndSetHeadPipelined or WriteBlockAndSetHead executes. writeBlockAndSetHeadTimer therefore always records ~0, and workerMgaspsTimer (line 1148) reports inflated MGas/s. Fix: move `writeElapsed := time.Since(writeStart)` to after the if/else block.
2. Bug: nil pointer dereference (miner/pipeline.go L379-L384). When chainHead is nil, the `||` short-circuits into the if-body, but `chainHead.Number.Uint64()` in the log.Error call dereferences nil and panics. Per CLAUDE.md: no panics in consensus, sync, or block production paths. Fix: split into two if-checks.
3. Bug: unchecked type assertion (miner/pipeline.go L335-L341). `borEngine, _ := w.engine.(*bor.Bor)` discards the ok boolean. If w.engine is not *bor.Bor, borEngine is nil and `borEngine.AssembleBlock(...)` panics. The same assertion at line 96 correctly checks ok. Fix: check ok and return early.
4. Bug: goroutine leak on 5 return paths (miner/pipeline.go L293-L345). The goroutine behind the initialFillDone channel (line 293) is not drained on the return paths at lines 345, 357, 371, 373, and 383. Only the WaitForSRC error path (line 331) and the happy path (line 390) drain it. Fix: defer the drain right after line 293.
5. Bug: trie DB race after SpawnSRCGoroutine (miner/pipeline.go L206-L229). SpawnSRCGoroutine, called at line 213, launches a goroutine doing CommitWithUpdate. If StateAtWithFlatDiff fails (line 219) or GetHeader returns nil (line 228), fallbackToSequential runs IntermediateRoot inline on the same parent root concurrently. The comments at lines 206-211 identify this as causing "missing trie node" / "layer stale" errors but only guard the Prepare() case. Fix: call WaitForSRC() before fallbackToSequential, or move the spawn after the preconditions.
Codecov Report
❌ Your patch check has failed because the patch coverage (16.47%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files:
@@ Coverage Diff @@
## delay_src #2180 +/- ##
============================================
Coverage ? 51.56%
============================================
Files ? 886
Lines ? 156742
Branches ? 0
============================================
Hits ? 80826
Misses ? 70680
Partials ? 5236
I am okay with the idea of removing the remaining 100ms. We already reduced this buffer from 500ms to 100ms in v2.7.1, and from what we have seen so far, this remaining time looks small enough that removing it seems reasonable. My main concern is not the removal of the 100ms itself. My concern is the cost of pipelining SRC with the next block production. In other words: by doing SRC in parallel with block building, how much do we impact SRC time itself? Do we expect SRC to remain roughly the same, or does it become meaningfully slower because it is now competing with the next block production? That is the part I would like to understand better. I think this is basically a TPS vs finality question:
So I am supportive of the direction, but I think the key question is still: how much TPS do we gain, and how much finality do we lose, if any, by making SRC fully pipelined with block production? If the impact on SRC time is only slight, then the tradeoff is probably clearly worth it. But if SRC time increases materially once it is pipelined with block production, then we should make that tradeoff explicit.
I think SRC will stay roughly the same, because the time-consuming part, trie node prefetching, already runs concurrently with tx execution today, and this PR doesn't change that behavior.
// state resets for pipelined SRC. This avoids import cycles between txpool
// and legacypool packages.
type SpeculativeResetter interface {
	ResetSpeculativeState(newHead *types.Header, statedb *state.StateDB)
In terms of naming, I would simply name it SpeculativeSetter and SetSpeculativeState. The "reset" seems redundant.
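The import-cycle-breaking pattern this interface uses can be shown in a self-contained sketch. Header, StateDB, and legacyPool below are hypothetical stand-ins for the real types; the point is that the caller depends only on the interface, never on the concrete pool package.

```go
package main

import "fmt"

// Hypothetical stand-ins for types.Header and state.StateDB.
type Header struct{ Number uint64 }
type StateDB struct{}

// SpeculativeResetter mirrors the PR's interface: it is declared on the
// caller's side, so the txpool package never imports legacypool directly.
type SpeculativeResetter interface {
	ResetSpeculativeState(newHead *Header, statedb *StateDB)
}

// legacyPool is a hypothetical concrete pool satisfying the interface.
type legacyPool struct{ head uint64 }

func (p *legacyPool) ResetSpeculativeState(newHead *Header, statedb *StateDB) {
	p.head = newHead.Number
}

func main() {
	var pool interface{} = &legacyPool{}
	// The caller type-asserts against the locally declared interface,
	// which is what breaks the import cycle between the two packages.
	if r, ok := pool.(SpeculativeResetter); ok {
		r.ResetSpeculativeState(&Header{Number: 42}, &StateDB{})
	}
	fmt.Println(pool.(*legacyPool).head) // 42
}
```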
// The state commit is handled separately by the SRC goroutine that already
// called CommitWithUpdate. This avoids the "layer stale" error that occurs
// when two CommitWithUpdate calls diverge from the same parent root.
func (bc *BlockChain) WriteBlockAndSetHeadPipelined(block *types.Block, receipts []*types.Receipt, logs []*types.Log, statedb *state.StateDB, emitHeadEvent bool, witnessBytes []byte) (WriteStatus, error) {
There is some shared code between this and WriteBlockAndSetHead. Could we refactor and dedupe it?
// This is used by the txpool and RPC layer to get correct state when the chain
// head was produced via the pipeline (where the committed trie root may lag
// behind the actual post-execution state).
func (bc *BlockChain) PostExecutionStateAt(header *types.Header) (*state.StateDB, error) {
Nitpick: PostExecutionStateAt -> PostExecState to make it simpler.
// speculatively using the FlatDiff overlay, then waits for SRC(N) to complete,
// assembles block N, and sends it for sealing. Then it finalizes N+1 and
// seals it as well.
func (w *worker) commitSpeculativeWork(req *speculativeWorkReq) {
This is a huge function with 500+ lines. Can we decompose it into smaller functions for maintainability?
var coinbase common.Address
if w.chainConfig.Bor != nil && w.chainConfig.Bor.IsRio(new(big.Int).SetUint64(nextBlockNumber)) {
	coinbase = common.HexToAddress(w.chainConfig.Bor.CalculateCoinbase(nextBlockNumber))
}
if coinbase == (common.Address{}) {
	coinbase = w.etherbase()
}

specHeader := &types.Header{
	ParentHash: placeholder,
	Number:     new(big.Int).SetUint64(nextBlockNumber),
	GasLimit:   core.CalcGasLimit(blockNHeader.GasLimit, w.config.GasCeil),
	Time:       blockNHeader.Time + w.chainConfig.Bor.CalculatePeriod(nextBlockNumber),
	Coinbase:   coinbase,
}
if w.chainConfig.IsLondon(specHeader.Number) {
	specHeader.BaseFee = eip1559.CalcBaseFee(w.chainConfig, blockNHeader)
}

// Call Prepare() via the speculative chain reader with waitOnPrepare=false.
// This sets Difficulty, Extra (validator bytes at sprint boundary), and timestamp
// but does NOT sleep. The timing wait is deferred until after the abort check
// to avoid wasting a full block period if the speculative block is discarded.
// NOTE: Prepare() will zero out specHeader.Coinbase. The real coinbase
// is preserved in the local `coinbase` variable above.
if err := w.engine.Prepare(specReader, specHeader, false); err != nil {
	log.Warn("Pipelined SRC: speculative Prepare failed, falling back", "err", err)
	w.fallbackToSequential(req)
	return
}
This duplicates a few things with makeHeader in worker.go. Maybe worth unifying.
	w.fallbackToSequential(req)
	return
}
specState.StartPrefetcher("miner-speculative", nil, nil)
Regarding "layer stale" errors from the prefetcher, I think we can delay the prefetching of N+1 until SRC for block N has completed. I asked Claude about this idea and this is what it suggested:
The existing getStateObject/GetCommittedState code already calls prefetcher.prefetch() during execution, which queues tasks
and records what was accessed. The problem is that subfetcher.loop() immediately calls openTrie() and starts resolving —
hitting the stale layer. If we just delay the resolution, the queueing and dedup logic stays untouched.
The change:
1. trie_prefetcher.go (~30 lines) — add a gate channel to subfetcher:
type subfetcher struct {
// ... existing fields ...
gate chan struct{} // If non-nil, loop blocks until closed
}
func (sf *subfetcher) loop() {
defer close(sf.term)
// Wait for gate to open before touching the trie
if sf.gate != nil {
select {
case <-sf.gate:
case <-sf.stop:
return
}
}
if err := sf.openTrie(); err != nil {
return
}
// ... existing loop unchanged ...
}
Add Resume() to triePrefetcher:
func (p *triePrefetcher) Resume() {
p.lock.Lock()
defer p.lock.Unlock()
for _, f := range p.fetchers {
if f.gate != nil {
close(f.gate)
// Re-signal wake since signals were dropped while gated
select {
case f.wake <- struct{}{}:
default:
}
}
}
}
Wire the gate through: newSubfetcher accepts a gate channel, triePrefetcher stores a gated bool, and prefetch() passes the
gate when creating subfetchers.
2. statedb.go (~10 lines) — expose resume:
func (s *StateDB) ResumePrefetcher() {
if s.prefetcher != nil {
s.prefetcher.Resume()
}
}
3. pipeline.go (~5 lines) — start gated, resume after SRC:
// Before execution (line 225):
specState.StartPrefetcherGated("miner-speculative", nil, nil)
// After WaitForSRC returns (line 339):
specState.ResumePrefetcher()
The one tricky bit is the wake signal: schedule() has select { case sf.wake <- struct{}{}: default: } — if the loop isn't
listening (gated), the signal is dropped. The Resume() method handles this by re-signaling wake after opening the gate. Any
subfetcher with queued tasks will pick them up.
That's it. No changes to pathdb, no changes to the hot execution path (getStateObject/GetCommittedState), no changes to the
trie layer. The prefetcher's existing dedup tracking (seenReadAddr, seenReadSlot) means repeated accesses during execution
are collapsed — when the gate opens, only unique trie paths get resolved.
In the loop iterations (lines 620-652), the same pattern applies — the fill goroutine runs with a gated prefetcher, and
Resume() is called after the iteration's WaitForSRC returns.
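The gate-and-resume mechanics proposed above can be exercised in isolation. This is a minimal, self-contained sketch of the pattern only; gatedFetcher and its fields are illustrative stand-ins, not the real subfetcher or its API. Tasks queue while the gate is closed, the dropped-wake-signal problem is reproduced, and resume() re-signals after opening the gate.

```go
package main

import (
	"fmt"
	"sync"
)

// gatedFetcher queues tasks while its gate channel is open (not closed)
// and only resolves them after resume() closes the gate.
type gatedFetcher struct {
	gate chan struct{} // loop blocks until this is closed
	stop chan struct{}
	wake chan struct{} // buffered(1); extra signals are dropped
	term chan struct{}

	mu      sync.Mutex
	pending []string
	done    []string
}

func newGatedFetcher() *gatedFetcher {
	f := &gatedFetcher{
		gate: make(chan struct{}),
		stop: make(chan struct{}),
		wake: make(chan struct{}, 1),
		term: make(chan struct{}),
	}
	go f.loop()
	return f
}

// schedule queues a task and signals the loop; the signal is dropped
// if the buffer is already full (the same pattern as schedule()).
func (f *gatedFetcher) schedule(task string) {
	f.mu.Lock()
	f.pending = append(f.pending, task)
	f.mu.Unlock()
	select {
	case f.wake <- struct{}{}:
	default:
	}
}

// resume opens the gate and re-signals wake, since wake signals may
// have been dropped while the loop was still gated.
func (f *gatedFetcher) resume() {
	close(f.gate)
	select {
	case f.wake <- struct{}{}:
	default:
	}
}

func (f *gatedFetcher) loop() {
	defer close(f.term)
	select { // wait for the gate before touching any work
	case <-f.gate:
	case <-f.stop:
		return
	}
	for {
		select {
		case <-f.wake:
			f.mu.Lock()
			f.done = append(f.done, f.pending...)
			f.pending = nil
			f.mu.Unlock()
		case <-f.stop:
			return
		}
	}
}

func (f *gatedFetcher) resolved() int {
	f.mu.Lock()
	defer f.mu.Unlock()
	return len(f.done)
}

func main() {
	f := newGatedFetcher()
	f.schedule("path-a")
	f.schedule("path-b")
	fmt.Println(f.resolved()) // 0: gate still closed, tasks only queued
	f.resume()
	for f.resolved() != 2 { // spin until the loop drains the queue
	}
	fmt.Println(f.resolved()) // 2
	close(f.stop)
	<-f.term
}
```

Using close() on the gate as a broadcast means every gated subfetcher wakes at once, which matches the proposed Resume() iterating over all fetchers.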
@@ -0,0 +1,933 @@
package miner
Nice job on isolating the new logic in a new file!


Description
This is built on top of the delayed SRC PoC and takes the approach further: instead of just deferring SRC, it pipelines SRC with the next block's work.
How it works
After producing block N, the miner:
Config
--miner.pipelined-src: enable/disable (default: enabled)
--miner.pipelined-src-logs: verbose pipeline logging (default: enabled)
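A usage sketch for these flags. The `bor server --mine` invocation is an assumption about the usual client entrypoint and is not part of this PR; only the two flag names come from the description above.

```shell
# Hypothetical invocation: mine with pipelined SRC enabled (the default)
# but with verbose pipeline logging turned off.
bor server --mine \
  --miner.pipelined-src \
  --miner.pipelined-src-logs=false
```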