feat: implement hash join spill for memory pressure handling#23795

Open
aunjgr wants to merge 13 commits into matrixorigin:main from aunjgr:spill_join

Conversation


@aunjgr aunjgr commented Mar 3, 2026

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issues #3433 and #23353

What this PR does / why we need it:

Implement spill-to-disk functionality for hash join operations to handle memory pressure by partitioning data into 32 buckets and writing to disk when memory threshold is exceeded.

Key changes:

  • Add SpillThreshold configuration to HashBuild and HashJoin operators
  • Implement xxhash-based partitioning for consistent bucket distribution
  • Add buffering (8192 rows) before flushing to disk to reduce I/O overhead
  • Reuse existing probe logic for processing spilled buckets
  • Track spill statistics (SpillSize, SpillRows) in operator analyzer
  • Properly clean up spill files and batches to prevent memory leaks
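The partitioning step described above can be sketched as follows. This is a free-standing illustration, not the PR's code: the PR hashes with xxhash over the serialized join key, while stdlib `hash/fnv` stands in here, and `bucketOf` is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

const spillNumBuckets = 32

// bucketOf maps a serialized join key to one of the 32 spill buckets.
// The PR uses xxhash for this; FNV-1a from the stdlib stands in here.
func bucketOf(key []byte) int {
	h := fnv.New64a()
	h.Write(key)
	return int(h.Sum64() % spillNumBuckets)
}

func main() {
	// The same key always lands in the same bucket, so the build and probe
	// sides agree on the partitioning as long as they hash identically.
	b1 := bucketOf([]byte("order_id=42"))
	b2 := bucketOf([]byte("order_id=42"))
	fmt.Println(b1 == b2, b1 >= 0 && b1 < spillNumBuckets)
}
```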

Implementation details:

  • HashBuild spills all batches when threshold exceeded, sends empty JoinMap with spill metadata to HashJoin
  • HashJoin creates matching probe spill files and partitions probe batches
  • After normal join completes, processes spilled buckets one at a time by rebuilding hashmap and probing in memory
  • Uses consistent xxhash computation across build and probe sides
  • Spill files use format: [count][size][batch_data][magic] for validation

Performance optimizations:

  • Buffer 8192 rows per bucket before flushing to reduce syscalls
  • Reuse probe logic to avoid code duplication
  • Clean up batches immediately after processing to minimize memory usage
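The buffering idea, as a minimal sketch: a hypothetical `bucketBuffer` over row blobs (the real buffer holds column batches) that turns per-row writes into one bulk write per 8192 rows.

```go
package main

import "fmt"

const bufferRowLimit = 8192 // rows buffered per bucket before a flush

// bucketBuffer accumulates rows for one spill bucket and flushes in bulk,
// trading a little memory for far fewer write syscalls.
type bucketBuffer struct {
	rows [][]byte
}

func (b *bucketBuffer) add(row []byte, flush func([][]byte)) {
	b.rows = append(b.rows, row)
	if len(b.rows) >= bufferRowLimit {
		flush(b.rows)
		b.rows = b.rows[:0]
	}
}

func main() {
	buf := &bucketBuffer{}
	writes := 0
	flush := func(rows [][]byte) { writes++ } // stands in for the disk write
	for i := 0; i < 3*bufferRowLimit; i++ {
		buf.add([]byte("row"), flush)
	}
	fmt.Println(writes) // prints 3: one bulk write per 8192 rows
}
```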

This enables hash joins to handle datasets larger than available memory without OOM errors, with graceful degradation to disk-based processing.

@qodo-code-review

Review Summary by Qodo

Implement hash join spill-to-disk for memory pressure handling

✨ Enhancement


Walkthroughs

Description
• Implement spill-to-disk for hash join operations handling memory pressure
• Partition data into 32 buckets using xxhash with 8192-row buffering
• Process spilled buckets by rebuilding hashmap and probing in memory
• Track spill statistics (SpillSize, SpillRows) in operator analyzer
• Add join_spill_mem system variable for configurable memory threshold
Diagram
flowchart LR
  A["Build Phase"] -->|"Memory Threshold Exceeded"| B["Create 32 Spill Buckets"]
  B -->|"Partition by xxhash"| C["Write to Disk with Buffering"]
  C -->|"Send Empty JoinMap"| D["HashJoin Receives Spill Info"]
  D -->|"Partition Probe Batches"| E["Create Probe Spill Files"]
  E -->|"Process Each Bucket"| F["Rebuild Hashmap + Probe"]
  F -->|"Output Results"| G["Final Join Output"]

File Changes

1. pkg/sql/colexec/hashbuild/spill.go ✨ Enhancement +320/-0

Implement build-side spill-to-disk functionality

pkg/sql/colexec/hashbuild/spill.go


2. pkg/sql/colexec/hashjoin/spill.go ✨ Enhancement +483/-0

Implement probe-side spill and bucket processing

pkg/sql/colexec/hashjoin/spill.go


3. pkg/sql/colexec/hashbuild/hashmap.go ✨ Enhancement +202/-1

Add incremental hashmap building and finalization methods

pkg/sql/colexec/hashbuild/hashmap.go


4. pkg/sql/colexec/hashbuild/build.go ✨ Enhancement +96/-12

Integrate spill logic into build phase with threshold detection

pkg/sql/colexec/hashbuild/build.go


5. pkg/sql/colexec/hashbuild/types.go ✨ Enhancement +21/-2

Add spill state tracking and SpillThreshold configuration

pkg/sql/colexec/hashbuild/types.go


6. pkg/sql/colexec/hashjoin/types.go ✨ Enhancement +52/-0

Add spill bucket tracking and ProcessSpilled state

pkg/sql/colexec/hashjoin/types.go


7. pkg/sql/colexec/hashjoin/join.go ✨ Enhancement +65/-1

Handle spilled join processing and probe-side spilling

pkg/sql/colexec/hashjoin/join.go


8. pkg/vm/message/joinMapMsg.go ✨ Enhancement +35/-7

Add spill metadata to JoinMap message structure

pkg/vm/message/joinMapMsg.go


9. pkg/vm/process/operator_analyzer.go ✨ Enhancement +22/-8

Add SpillRows tracking and fix negative duration handling

pkg/vm/process/operator_analyzer.go


10. pkg/sql/compile/operator.go ✨ Enhancement +20/-1

Set SpillThreshold for hash join and build operators

pkg/sql/compile/operator.go


11. pkg/sql/plan/query_builder.go ✨ Enhancement +12/-0

Add join_spill_mem variable resolution and propagation

pkg/sql/plan/query_builder.go


12. pkg/sql/plan/types.go ✨ Enhancement +3/-0

Add joinSpillMem field to QueryBuilder

pkg/sql/plan/types.go


13. pkg/frontend/variables.go ⚙️ Configuration changes +8/-0

Define join_spill_mem system variable configuration

pkg/frontend/variables.go


14. pkg/sql/plan/join_order.go ✨ Enhancement +4/-0

Propagate SpillMem to join nodes in query planning

pkg/sql/plan/join_order.go


15. pkg/sql/plan/flatten_subquery.go ✨ Enhancement +2/-0

Set SpillMem for join nodes in subquery flattening

pkg/sql/plan/flatten_subquery.go


16. pkg/sql/compile/compile.go ✨ Enhancement +2/-2

Pass plan node to constructShuffleHashBuild for spill config

pkg/sql/compile/compile.go


17. pkg/sql/compile/analyze_module.go ✨ Enhancement +1/-0

Track SpillRows in node analysis information

pkg/sql/compile/analyze_module.go


18. pkg/sql/plan/explain/explain_node.go 📝 Documentation +2/-1

Display SpillRows in query explain output

pkg/sql/plan/explain/explain_node.go


19. pkg/sql/colexec/hashbuild/hashmap_test.go Miscellaneous +1/-16

Update package name from hashmap_util to hashbuild

pkg/sql/colexec/hashbuild/hashmap_test.go


20. proto/plan.proto ⚙️ Configuration changes +8/-7

Add spill_rows field to AnalyzeInfo message

proto/plan.proto


21. go.mod Dependencies +2/-2

Update bytedance/sonic to v1.15.0

go.mod


22. go.sum Dependencies +4/-4

Update bytedance/sonic and loader checksums

go.sum


23. pkg/pb/plan/plan.pb.go Additional files +637/-600

...

pkg/pb/plan/plan.pb.go



@qodo-code-review

qodo-code-review bot commented Mar 3, 2026

Code Review by Qodo

🐞 Bugs (8) 📘 Rule violations (0) 📎 Requirement gaps (2)



Action required

1. shouldSpillBatches blocks broadcast spill 📎 Requirement gap ⛯ Reliability
Description
The new spill decision explicitly returns false when hashBuild.IsShuffle is false, so
non-shuffle/broadcast hash joins can exceed the memory budget without spilling. This can lead to
unbounded memory growth and potential OOMs instead of spilling-to-disk.
Code

pkg/sql/colexec/hashbuild/spill.go[R186-189]

+func (hashBuild *HashBuild) shouldSpillBatches() bool {
+	if !hashBuild.IsShuffle || hashBuild.SpillThreshold <= 0 {
+		return false
+	}
Evidence
PR Compliance ID 1 requires hash join to enforce a memory budget by spilling when usage exceeds the
budget. The added shouldSpillBatches() logic disables spilling unless IsShuffle is enabled,
while the compiler constructs broadcast hash-build operators with IsShuffle = false, meaning those
hash joins will never spill even under memory pressure.

Hash join and hash aggregation track memory usage and spill when exceeding budget
pkg/sql/colexec/hashbuild/spill.go[186-189]
pkg/sql/compile/operator.go[1549-1552]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Spilling is currently disabled unless `hashBuild.IsShuffle` is true, which prevents broadcast/non-shuffle hash joins from spilling when they exceed the memory budget.
## Issue Context
Compliance requires hash join to enforce a memory budget by spilling-to-disk when memory usage exceeds the configured/allowed budget. The compiler creates broadcast hash-build operators with `IsShuffle = false`, so the current gating means those operators will not spill.
## Fix Focus Areas
- pkg/sql/colexec/hashbuild/spill.go[186-195]
- pkg/sql/compile/operator.go[1549-1560]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


2. Spill drops remaining rows 🐞 Bug ✓ Correctness
Description
processSpilledJoin clears ctr.leftBat after a single ctr.probe call, even though ctr.probe can
return early when the result batch fills; this loses the remaining portion of the current probe
batch and produces incorrect join output.
Code

pkg/sql/colexec/hashjoin/spill.go[R351-394]

+	// Set up container state for regular probe
+	ctr.leftBat = ctr.bucketProbeBatches[ctr.bucketProbeIdx]
+	ctr.bucketProbeIdx++
+	ctr.mp = ctr.bucketJoinMap
+	ctr.rightBats = ctr.mp.GetBatches()
+	ctr.rightRowCnt = ctr.mp.GetRowCount()
+	if hashJoin.IsRightJoin {
+		if ctr.rightRowCnt > 0 {
+			ctr.rightRowsMatched = &bitmap.Bitmap{}
+			ctr.rightRowsMatched.InitWithSize(ctr.rightRowCnt)
+		}
+	}
+
+	// Reset iterator for new probe batch
+	ctr.itr = nil
+
+	// Prepare fresh result batch
+	ctr.resBat = batch.NewWithSize(len(hashJoin.ResultCols))
+	for i, rp := range hashJoin.ResultCols {
+		if rp.Rel == 0 {
+			ctr.resBat.Vecs[i] = vector.NewVec(hashJoin.LeftTypes[rp.Pos])
+		} else {
+			ctr.resBat.Vecs[i] = vector.NewVec(hashJoin.RightTypes[rp.Pos])
+		}
+	}
+
+	// Reset probe state
+	ctr.lastIdx = 0
+	ctr.probeState = psNextBatch
+
+	// Use regular probe logic
+	var result vm.CallResult
+	if err := ctr.probe(hashJoin, proc, &result); err != nil {
+		return nil, err
+	}
+
+	// Clean up for next call
+	ctr.leftBat = nil
+
+	if result.Batch == nil || result.Batch.RowCount() == 0 {
+		return hashJoin.processSpilledJoin(proc, analyzer)
+	}
+
+	return result.Batch, nil
Evidence
In the normal probe loop, ctr.probe may return after filling one output batch and expects
ctr.leftBat/ctr.lastIdx to remain set so the next Call() continues from the same input batch. In
spill mode, processSpilledJoin always sets ctr.leftBat=nil after ctr.probe returns, discarding
continuation state and skipping remaining rows.

pkg/sql/colexec/hashjoin/spill.go[351-395]
pkg/sql/colexec/hashjoin/join.go[360-399]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
In spill-mode processing, `processSpilledJoin` calls the regular `ctr.probe` once and then unconditionally clears `ctr.leftBat`. This breaks the invariant used by the normal hash join probe state machine: `ctr.probe` may return early when the result batch reaches `DefaultBatchSize`, and expects the caller to invoke it again with the same `ctr.leftBat` and an advanced `ctr.lastIdx`.
### Issue Context
This causes dropped join output for large probe batches in spill mode.
### Fix Focus Areas
- pkg/sql/colexec/hashjoin/spill.go[351-394]
- pkg/sql/colexec/hashjoin/join.go[360-399]
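The continuation invariant can be shown with a toy driver, a hypothetical stand-in for the real probe state machine: keep calling the step with the same input batch until the cursor reaches the end, and only then clear the batch state.

```go
package main

import "fmt"

const maxOut = 4 // stands in for DefaultBatchSize

// probeStep emits at most maxOut rows per call and returns the new cursor,
// mimicking how ctr.probe can return early with lastIdx mid-batch.
func probeStep(input []int, lastIdx int) (out []int, newIdx int) {
	for lastIdx < len(input) && len(out) < maxOut {
		out = append(out, input[lastIdx])
		lastIdx++
	}
	return out, lastIdx
}

func main() {
	input := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	lastIdx, emitted := 0, 0
	// The fix sketched here: loop until the whole batch is consumed before
	// clearing the "leftBat" state, instead of clearing after one call.
	for lastIdx < len(input) {
		var out []int
		out, lastIdx = probeStep(input, lastIdx)
		emitted += len(out)
	}
	fmt.Println(emitted) // prints 10; clearing after one call would drop 6 rows
}
```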



3. Right joins wrong in spill 🐞 Bug ✓ Correctness
Description
The spill path bypasses the SyncBitmap/Finalize phases and also reinitializes rightRowsMatched per
probe batch, so right outer/right semi/right anti joins cannot correctly emit matched/unmatched
right-side rows.
Code

pkg/sql/colexec/hashjoin/join.go[R125-143]

+			// Check if we need to process spilled data
+			if ctr.mp != nil && ctr.mp.IsSpilled() {
+				ctr.state = ProcessSpilled
+			} else {
+				ctr.state = Probe
+			}
+
+		case ProcessSpilled:
+			result.Batch, err = hashJoin.processSpilledJoin(proc, analyzer)
+			if err != nil {
+				return result, err
+			}
+			if result.Batch == nil {
+				ctr.state = End
+			}
+			return result, nil
case Probe:
if ctr.leftBat == nil {
Evidence
Normal right-join handling transitions to SyncBitmap/Finalize after probe input is exhausted to emit
right-side rows based on rightRowsMatched accumulated over all probe batches. In spill mode,
HashJoin switches to ProcessSpilled and then directly to End when buckets are done, skipping
SyncBitmap/Finalize entirely. Additionally, spill processing recreates rightRowsMatched for each
probe batch, losing match history across batches within the same bucket.

pkg/sql/colexec/hashjoin/join.go[125-156]
pkg/sql/colexec/hashjoin/spill.go[351-362]
pkg/sql/colexec/hashjoin/types.go[50-101]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Spill-mode execution short-circuits the normal `Probe -> SyncBitmap -> Finalize` flow that right joins rely on, and it resets `rightRowsMatched` per probe batch.
### Issue Context
Right join family operators need consistent match tracking over the entire probe stream and a finalization step to output right-side rows.
### Fix Focus Areas
- pkg/sql/colexec/hashjoin/join.go[125-156]
- pkg/sql/colexec/hashjoin/spill.go[331-395]
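The required match tracking can be sketched as follows, a hypothetical simplification using `[]bool` in place of the real bitmap: allocate the matched set once per bucket, accumulate across every probe batch, and emit unmatched build rows only after the bucket's probe stream ends.

```go
package main

import "fmt"

// rightJoinBucket accumulates matches across all probe batches of one
// spilled bucket, then reports unmatched build-side rows, as a right outer
// join's finalize step must.
func rightJoinBucket(buildKeys []int, probeBatches [][]int) (unmatched []int) {
	matched := make([]bool, len(buildKeys)) // once per bucket, not per batch
	index := map[int]int{}
	for i, k := range buildKeys {
		index[k] = i
	}
	for _, batch := range probeBatches {
		for _, k := range batch {
			if i, ok := index[k]; ok {
				matched[i] = true
			}
		}
	}
	for i, k := range buildKeys {
		if !matched[i] {
			unmatched = append(unmatched, k)
		}
	}
	return unmatched
}

func main() {
	// Matches arrive in different probe batches; resetting the matched set
	// per batch would wrongly report 1 and 2 as unmatched.
	fmt.Println(rightJoinBucket([]int{1, 2, 3}, [][]int{{1}, {2}}))
}
```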



4. Nil JoinMap panic 🐞 Bug ⛯ Reliability
Description
HashJoin.build dereferences ctr.mp even though ReceiveJoinMap can return nil when the build side
sends a nil JoinMapPtr (e.g., empty build input), which will panic at runtime.
Code

pkg/sql/colexec/hashjoin/join.go[R271-324]

if ctr.mp != nil {
ctr.maxAllocSize = max(ctr.maxAllocSize, ctr.mp.Size())
+
+		// Handle spilled build side
+		if ctr.mp.IsSpilled() {
+			ctr.spilledBuildBuckets, ctr.spilledBuildRowCnts = ctr.mp.GetSpillBuckets()
+			ctr.currentBucketIdx = 0
+
+			// Create spill files for probe side
+			spilledProbeBuckets, spillProbeFiles, err := createProbeSpillFiles(proc)
+			if err != nil {
+				return err
+			}
+			ctr.spilledProbeBuckets = spilledProbeBuckets
+
+			spillBuffers := make([]*bucketBuffer, spillNumBuckets)
+			for i := range spillBuffers {
+				spillBuffers[i] = &bucketBuffer{}
+			}
+
+			defer func() {
+				// Flush remaining buffered data
+				for i, buf := range spillBuffers {
+					flushBucketBuffer(proc, buf, spillProbeFiles[i], analyzer)
+				}
+				for _, f := range spillProbeFiles {
+					if f != nil {
+						f.Close()
+					}
+				}
+			}()
+
+			// Spill each probe batch as it arrives
+			for {
+				input, err := vm.ChildrenCall(hashJoin.GetChildren(0), proc, analyzer)
+				if err != nil {
+					return err
+				}
+				if input.Batch == nil {
+					break
+				}
+				if !input.Batch.IsEmpty() {
+					if err := appendProbeBatchToSpillFiles(proc, input.Batch, spillProbeFiles, spillBuffers, hashJoin.EqConds[0], ctr.eqCondExecs, analyzer); err != nil {
+						return err
+					}
+				}
+			}
+
+			fmt.Println("spilled probe rows:", analyzer.GetOpStats().SpillRows)
+		}
}
ctr.rightBats = ctr.mp.GetBatches()
ctr.rightRowCnt = ctr.mp.GetRowCount()
Evidence
ReceiveJoinMap explicitly returns (nil, nil) if the received message has JoinMapPtr==nil.
HashJoin.build then unconditionally calls ctr.mp.GetBatches()/GetRowCount after the spill-handling
if-block, which will panic when ctr.mp is nil.

pkg/vm/message/joinMapMsg.go[286-314]
pkg/sql/colexec/hashjoin/join.go[262-325]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`HashJoin.build` assumes `ctr.mp` is non-nil after `ReceiveJoinMap`, but `ReceiveJoinMap` may return nil.
### Issue Context
Empty build side is a valid scenario.
### Fix Focus Areas
- pkg/sql/colexec/hashjoin/join.go[262-325]
- pkg/vm/message/joinMapMsg.go[286-314]
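The guard is straightforward; a minimal sketch with a stand-in type (the real type is `message.JoinMap`): treat a nil map as "no build rows" instead of dereferencing.

```go
package main

import "fmt"

type joinMap struct{ rows int }

// rowCount tolerates a nil JoinMap: ReceiveJoinMap may legitimately return
// nil for an empty build side, so nil means zero rows rather than a panic.
func rowCount(mp *joinMap) int {
	if mp == nil {
		return 0 // empty build side: nothing to probe against
	}
	return mp.rows
}

func main() {
	fmt.Println(rowCount(nil), rowCount(&joinMap{rows: 7}))
}
```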



5. Spill JoinMap lifecycle bug 🐞 Bug ⛯ Reliability
Description
processSpilledJoin overwrites ctr.mp with ctr.bucketJoinMap and frees bucketJoinMap without clearing
ctr.mp, which can leak the original JoinMap and lead to double-free/use-after-free during cleanup.
Code

pkg/sql/colexec/hashjoin/spill.go[R331-356]

+	// Process one probe batch
+	if ctr.bucketProbeIdx >= len(ctr.bucketProbeBatches) {
+		// Done with this bucket, clean up and move to next
+		ctr.bucketJoinMap.Free()
+		ctr.bucketJoinMap = nil
+		for i := range ctr.bucketBuildBatches {
+			if ctr.bucketBuildBatches[i] != nil {
+				ctr.bucketBuildBatches[i].Clean(proc.Mp())
+			}
+		}
+		ctr.bucketBuildBatches = nil
+		for i := range ctr.bucketProbeBatches {
+			if ctr.bucketProbeBatches[i] != nil {
+				ctr.bucketProbeBatches[i].Clean(proc.Mp())
+			}
+		}
+		ctr.bucketProbeBatches = nil
+		return hashJoin.processSpilledJoin(proc, analyzer)
+	}
+
+	// Set up container state for regular probe
+	ctr.leftBat = ctr.bucketProbeBatches[ctr.bucketProbeIdx]
+	ctr.bucketProbeIdx++
+	ctr.mp = ctr.bucketJoinMap
+	ctr.rightBats = ctr.mp.GetBatches()
+	ctr.rightRowCnt = ctr.mp.GetRowCount()
Evidence
In spill processing, ctr.mp is repointed to bucketJoinMap, then bucketJoinMap is freed when the
bucket completes. ctr.mp is not reset, and later cleanup calls ctr.mp.Free() again. Additionally,
the original JoinMap received in build() is never freed once ctr.mp is overwritten.

pkg/sql/colexec/hashjoin/spill.go[332-356]
pkg/sql/colexec/hashjoin/types.go[264-268]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Spill-mode code overwrites `ctr.mp` with the bucket JoinMap, then frees the bucket JoinMap without clearing `ctr.mp`, which can trigger double-free during cleanup and leaks the originally received JoinMap.
### Issue Context
`ctr.mp` is treated as the primary JoinMap lifetime-managed by `cleanHashMap()`.
### Fix Focus Areas
- pkg/sql/colexec/hashjoin/spill.go[331-356]
- pkg/sql/colexec/hashjoin/types.go[264-268]



6. Orphan spill files on error 🐞 Bug ⛯ Reliability
Description
HashBuild only records spilled bucket names into ctr.spilledBuckets at the end of build(); if an
error occurs after creating/writing spill files but before that assignment, Reset/Free cleanup
cannot delete those files, leaking disk space.
Code

pkg/sql/colexec/hashbuild/build.go[R175-223]

+		// Check if we should enter spill mode based on batch memory size
+		if hashBuild.NeedHashMap && hashBuild.shouldSpillBatches() {
+			spillMode = true
+			// Create spill files once
+			spilledBuckets, spillFiles, err = createSpillFiles(proc)
+			if err != nil {
+				return err
+			}
+			spilledRowCnts = make([]int64, spillNumBuckets)
+			spillBuffers = make([]*bucketBuffer, spillNumBuckets)
+			for i := range spillBuffers {
+				spillBuffers[i] = &bucketBuffer{}
+			}
+
+			// Spill all batches collected so far
+			for _, bat := range ctr.hashmapBuilder.Batches.Buf {
+				rowCnts, err := appendBatchToSpillFiles(proc, bat, spillFiles, spillBuffers, hashBuild.HashOnPK, hashBuild.Conditions, analyzer)
+				if err != nil {
+					return err
+				}
+				for i := range rowCnts {
+					spilledRowCnts[i] += rowCnts[i]
+				}
+			}
+			// Clear batches to save memory
+			ctr.hashmapBuilder.Batches.Clean(proc.Mp())
+		}
}
-	return nil
-}
-func (ctr *container) build(hashBuild *HashBuild, proc *process.Process, analyzer process.Analyzer) error {
-	err := ctr.collectBuildBatches(hashBuild, proc, analyzer)
-	if err != nil {
-		return err
+	// Flush remaining buffered data and close spill files
+	if spillMode {
+		fmt.Println("spilled build rows:", analyzer.GetOpStats().SpillRows)
+		for i, buf := range spillBuffers {
+			if _, err := flushBucketBuffer(proc, buf, spillFiles[i], analyzer); err != nil {
+				return err
+			}
+		}
+		for _, f := range spillFiles {
+			if f != nil {
+				f.Close()
+			}
+		}
+	}
+
+	// Store spill info in container
+	if spillMode {
+		ctr.spilledBuckets = spilledBuckets
+		ctr.spilledRowCnts = spilledRowCnts
}
Evidence
cleanupSpillFiles deletes only hashBuild.ctr.spilledBuckets. In build(), spilledBuckets is a local
variable until the very end; errors during append/flush will return early, leaving created spill
files not referenced by ctr.spilledBuckets and thus not deleted.

pkg/sql/colexec/hashbuild/build.go[175-223]
pkg/sql/colexec/hashbuild/types.go[120-131]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Spill files created during HashBuild can be leaked on error because their names are only persisted to `ctr.spilledBuckets` at the end of the build.
### Issue Context
`cleanupSpillFiles` only considers `ctr.spilledBuckets`.
### Fix Focus Areas
- pkg/sql/colexec/hashbuild/build.go[175-223]
- pkg/sql/colexec/hashbuild/types.go[120-131]



7. Spill flush errors ignored 🐞 Bug ⛯ Reliability
Description
In HashJoin.build spill mode, deferred flushing of probe spill buffers ignores flush errors, so
write failures can be silently swallowed and later reads may fail or produce corrupt results.
Code

pkg/sql/colexec/hashjoin/join.go[R292-302]

+			defer func() {
+				// Flush remaining buffered data
+				for i, buf := range spillBuffers {
+					flushBucketBuffer(proc, buf, spillProbeFiles[i], analyzer)
+				}
+				for _, f := range spillProbeFiles {
+					if f != nil {
+						f.Close()
+					}
+				}
+			}()
Evidence
The deferred cleanup calls flushBucketBuffer but drops the returned error. If disk is full or the
spill file service fails, the operator proceeds as if spill succeeded, leaving inconsistent spill
files.

pkg/sql/colexec/hashjoin/join.go[292-301]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Deferred spill buffer flushing ignores errors, causing silent data loss/corruption.
### Issue Context
`flushBucketBuffer` returns an error which must be handled.
### Fix Focus Areas
- pkg/sql/colexec/hashjoin/join.go[292-302]




Remediation recommended

8. constructHashJoin missets SpillThreshold 📎 Requirement gap ⛯ Reliability
Description
constructHashJoin() computes a default spill budget (mem) but assigns `arg.SpillThreshold = node.SpillMem` instead, which is typically 0 when unset. This can unintentionally disable budget enforcement/spill behavior for the hash join operator when defaults are expected to apply.
Code

pkg/sql/compile/operator.go[R946-952]

+	// Get threshold from build side (use same as hash aggregate default)
+	mem := int64(system.MemoryTotal()) / int64(system.GoMaxProcs()) / 8
+	if node.SpillMem > 0 && mem > node.SpillMem {
+		mem = node.SpillMem
+	}
+	arg.SpillThreshold = node.SpillMem
+
Evidence
PR Compliance ID 1 requires enforcing a memory budget for hash join. In constructHashJoin(), the
code computes a default budget (mem) but then ignores it and stores node.SpillMem into
SpillThreshold, which can be 0 and therefore not represent an enforced budget.

Hash join and hash aggregation track memory usage and spill when exceeding budget
pkg/sql/compile/operator.go[946-952]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`constructHashJoin()` computes a default spill budget (`mem`) but does not apply it to `arg.SpillThreshold`, potentially leaving the threshold at `0`.
## Issue Context
To enforce a memory budget for hash join, the operator should carry a non-zero default/capped threshold when `join_spill_mem` is not explicitly set.
## Fix Focus Areas
- pkg/sql/compile/operator.go[946-952]
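The intended assignment can be sketched as a pure function (hypothetical name; the real code reads `system.MemoryTotal()` and `node.SpillMem`): compute the default budget, cap it with `join_spill_mem` when set, and assign the result rather than the raw variable.

```go
package main

import "fmt"

// effectiveSpillThreshold caps the computed default budget with the
// join_spill_mem setting when it is positive, so the threshold is never
// left at zero when defaults should apply.
func effectiveSpillThreshold(defaultBudget, spillMem int64) int64 {
	mem := defaultBudget
	if spillMem > 0 && mem > spillMem {
		mem = spillMem
	}
	return mem // assign this, not spillMem
}

func main() {
	fmt.Println(effectiveSpillThreshold(1<<30, 0))     // unset variable: default applies
	fmt.Println(effectiveSpillThreshold(1<<30, 1<<20)) // variable caps the default
}
```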



9. Stdout debug prints 🐞 Bug ✓ Correctness
Description
fmt.Println is used in HashBuild and HashJoin spill paths, which writes to stdout in production and
bypasses structured logging/verbosity controls.
Code

pkg/sql/colexec/hashbuild/build.go[R204-207]

+	// Flush remaining buffered data and close spill files
+	if spillMode {
+		fmt.Println("spilled build rows:", analyzer.GetOpStats().SpillRows)
+		for i, buf := range spillBuffers {
Evidence
These prints execute whenever spillMode triggers and can add latency and noise in server
environments. They also force fmt imports into critical operators.

pkg/sql/colexec/hashbuild/build.go[204-207]
pkg/sql/colexec/hashjoin/join.go[320-321]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Spill implementation uses `fmt.Println` in operator hot paths.
### Issue Context
Operators should use structured logging or analyzer stats, not stdout.
### Fix Focus Areas
- pkg/sql/colexec/hashbuild/build.go[204-207]
- pkg/sql/colexec/hashjoin/join.go[320-321]



10. Bucket rebuild doubles memory 🐞 Bug ➹ Performance
Description
rebuildHashmapForBucket duplicates already-loaded build batches into builder.Batches via
CopyIntoBatches, potentially doubling memory usage per bucket and undermining the purpose of
spilling.
Code

pkg/sql/colexec/hashjoin/spill.go[R461-482]

+func (hashJoin *HashJoin) rebuildHashmapForBucket(proc *process.Process, buildBatches []*batch.Batch) (*message.JoinMap, error) {
+	// Create a temporary hashmap builder
+	builder := &hashbuild.HashmapBuilder{}
+	if err := builder.Prepare(hashJoin.EqConds[1], -1, proc); err != nil {
+		return nil, err
+	}
+
+	// Copy batches into builder
+	for _, bat := range buildBatches {
+		if err := builder.Batches.CopyIntoBatches(bat, proc); err != nil {
+			return nil, err
+		}
+		builder.InputBatchRowCount += bat.RowCount()
+	}
+
+	// Build hashmap
+	if err := builder.BuildHashmap(hashJoin.HashOnPK, true, false, proc); err != nil {
+		return nil, err
+	}
+
+	return message.NewJoinMap(builder.MultiSels, builder.IntHashMap, builder.StrHashMap, nil, builder.Batches.Buf, proc.Mp()), nil
+}
Evidence
The spilled build bucket is loaded from disk into memory (buildBatches). The rebuild path then
copies those batches again into builder.Batches before building the hashmap, causing an extra full
copy of the bucket data.

pkg/sql/colexec/hashjoin/spill.go[461-475]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Per-bucket hash rebuild copies build batches again, increasing memory pressure.
### Issue Context
Spilling is meant to reduce peak memory; doubling bucket memory can still OOM.
### Fix Focus Areas
- pkg/sql/colexec/hashjoin/spill.go[461-475]





Labels

kind/feature, size/XXL (denotes a PR that changes 2000+ lines)

3 participants