feat: implement hash join spill for memory pressure handling #23795

aunjgr wants to merge 13 commits into matrixorigin:main from
Conversation
Implement spill-to-disk functionality for hash join operations to handle memory pressure by partitioning data into 32 buckets and writing to disk when the memory threshold is exceeded.

Key changes:
- Add SpillThreshold configuration to HashBuild and HashJoin operators
- Implement xxhash-based partitioning for consistent bucket distribution
- Add buffering (8192 rows) before flushing to disk to reduce I/O overhead
- Reuse existing probe logic for processing spilled buckets
- Track spill statistics (SpillSize, SpillRows) in the operator analyzer
- Properly clean up spill files and batches to prevent memory leaks

Implementation details:
- HashBuild spills all batches when the threshold is exceeded and sends an empty JoinMap with spill metadata to HashJoin
- HashJoin creates matching probe spill files and partitions probe batches
- After the normal join completes, spilled buckets are processed one at a time by rebuilding the hashmap and probing in memory
- Uses consistent xxhash computation across build and probe sides
- Spill files use the format [count][size][batch_data][magic] for validation

Performance optimizations:
- Buffer 8192 rows per bucket before flushing to reduce syscalls
- Reuse probe logic to avoid code duplication
- Clean up batches immediately after processing to minimize memory usage

This enables hash joins to handle datasets larger than available memory without OOM errors, with graceful degradation to disk-based processing.
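The key invariant behind the xxhash-based partitioning is that the build and probe sides apply the same hash, so matching keys always land in the same bucket. A minimal sketch of that idea follows; it uses the stdlib `hash/fnv` as a stand-in for xxhash so it is dependency-free, and `bucketFor` is an illustrative name, not the PR's actual API.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const numBuckets = 32 // per the PR: data is partitioned into 32 spill buckets

// bucketFor maps a join key's bytes to one of the 32 spill buckets.
// The PR uses xxhash; hash/fnv stands in here to keep the sketch
// self-contained. What matters is that build and probe sides apply
// the *same* hash so matching keys land in the same bucket.
func bucketFor(key []byte) uint64 {
	h := fnv.New64a()
	h.Write(key)
	return h.Sum64() % numBuckets
}

func main() {
	// The same key hashes to the same bucket on both sides.
	b := bucketFor([]byte("order_id=42")) // build side
	p := bucketFor([]byte("order_id=42")) // probe side
	fmt.Println(b == p, b < numBuckets)   // prints: true true
}
```

Because the mapping is deterministic, bucket i of the build spill only ever needs to be joined against bucket i of the probe spill, never against the other 31.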
Review Summary by Qodo

Implement hash join spill-to-disk for memory pressure handling

Walkthrough:
- Implement spill-to-disk for hash join operations handling memory pressure
- Partition data into 32 buckets using xxhash with 8192-row buffering
- Process spilled buckets by rebuilding the hashmap and probing in memory
- Track spill statistics (SpillSize, SpillRows) in the operator analyzer
- Add join_spill_mem system variable for a configurable memory threshold

Diagram:

```mermaid
flowchart LR
    A["Build Phase"] -->|"Memory Threshold Exceeded"| B["Create 32 Spill Buckets"]
    B -->|"Partition by xxhash"| C["Write to Disk with Buffering"]
    C -->|"Send Empty JoinMap"| D["HashJoin Receives Spill Info"]
    D -->|"Partition Probe Batches"| E["Create Probe Spill Files"]
    E -->|"Process Each Bucket"| F["Rebuild Hashmap + Probe"]
    F -->|"Output Results"| G["Final Join Output"]
```
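The last two diagram steps, rebuilding the hashmap and probing per bucket, can be sketched as a loop that holds only one bucket's build side in memory at a time. Types, loader callbacks, and `joinSpilledBuckets` are all illustrative names under assumed signatures, not the operator's real interface.

```go
package main

import "fmt"

// After the in-memory join finishes, spilled buckets are processed one
// at a time: load bucket i's build rows, rebuild a hash table, then
// stream bucket i's probe rows through it. Only one bucket is resident
// at once, which bounds peak memory.
type row struct {
	key string
	val int
}

func joinSpilledBuckets(loadBuild, loadProbe func(bucket int) []row, numBuckets int) [][2]int {
	var out [][2]int
	for i := 0; i < numBuckets; i++ {
		// Rebuild the hashmap for this bucket only.
		ht := make(map[string][]int)
		for _, r := range loadBuild(i) {
			ht[r.key] = append(ht[r.key], r.val)
		}
		// Probe with the matching probe-side bucket; keys from other
		// buckets cannot match because both sides used the same hash.
		for _, p := range loadProbe(i) {
			for _, v := range ht[p.key] {
				out = append(out, [2]int{v, p.val})
			}
		}
		// ht goes out of scope here, releasing this bucket's memory.
	}
	return out
}

func main() {
	build := [][]row{{{"a", 1}}, {{"b", 2}}}
	probe := [][]row{{{"a", 10}}, {{"b", 20}, {"c", 30}}}
	res := joinSpilledBuckets(
		func(i int) []row { return build[i] },
		func(i int) []row { return probe[i] },
		2)
	fmt.Println(res) // prints: [[1 10] [2 20]]
}
```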
File Changes:
1. pkg/sql/colexec/hashbuild/spill.go
What type of PR is this?
Which issue(s) this PR fixes:
issue #3433 #23353
What this PR does / why we need it:
Implement spill-to-disk functionality for hash join operations to handle memory pressure by partitioning data into 32 buckets and writing to disk when memory threshold is exceeded.
This enables hash joins to handle datasets larger than available memory without OOM errors, with graceful degradation to disk-based processing.
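The 8192-row buffering mentioned above can be sketched as a per-bucket buffer that defers the disk write until it fills, turning thousands of tiny writes into a few large ones. `spillBucket` and `flushFn` are illustrative names, not the PR's actual types.

```go
package main

import "fmt"

const flushThreshold = 8192 // per the PR: buffer 8192 rows per bucket before flushing

// spillBucket accumulates rows and flushes only when the buffer fills,
// trading a bounded amount of memory for far fewer write syscalls.
// flushFn stands in for the real spill-file write.
type spillBucket struct {
	rows    [][]byte
	flushes int
	flushFn func(rows [][]byte)
}

func (b *spillBucket) add(row []byte) {
	b.rows = append(b.rows, row)
	if len(b.rows) >= flushThreshold {
		b.flush()
	}
}

func (b *spillBucket) flush() {
	if len(b.rows) == 0 {
		return
	}
	b.flushFn(b.rows)
	b.flushes++
	b.rows = b.rows[:0] // keep the backing array for reuse
}

func main() {
	bkt := &spillBucket{flushFn: func(rows [][]byte) { /* write to spill file */ }}
	for i := 0; i < 20000; i++ {
		bkt.add([]byte("row"))
	}
	bkt.flush() // final partial flush when the bucket is closed
	fmt.Println(bkt.flushes) // prints: 3 (at 8192, at 16384, and the final 3616)
}
```

Reusing the buffer's backing array after each flush also matches the PR's goal of cleaning up batches promptly to minimize memory usage.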