Refactor hash join dynamic filtering for progressive bounds application #17632
Conversation
Force-pushed from a7da859 to fadb4f6
cc @rkrishn7
Pull Request Overview
This PR refactors DataFusion's hash join dynamic filtering to use progressive bounds application instead of waiting for all build-side partitions to complete, improving join performance by reducing probe-side scan overhead.
Key Changes:
- Eliminates barrier-based synchronization in favor of immediate progressive filtering
- Implements hash-based filter expressions to ensure correctness without coordination
- Optimizes final filter to remove hash computations when all partitions complete
Reviewed Changes
Copilot reviewed 6 out of 7 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `stream.rs` | Removes barrier synchronization; simplifies the state machine by eliminating the `WaitPartitionBoundsReport` state |
| `shared_bounds.rs` | Implements progressive filter logic with hash-based expressions and partition completion tracking |
| `physical-plan/Cargo.toml` | Adds a dependency on `datafusion-functions` for hash function access |
| `functions/lib.rs` | Exposes the new `hash` module for internal use |
| `hash.rs` | New hash function implementation using DataFusion's internal hash algorithm |
| `functions/Cargo.toml` | Adds the `ahash` dependency for consistent hashing |
```rust
Ok(Arc::new(ScalarFunctionExpr::new(
    "hash",
    hash_udf,
    self.on_right.clone(),
    Field::new("hash_result", DataType::UInt64, false).into(),
    Arc::new(ConfigOptions::default()),
)))
```
Creating a new `ConfigOptions::default()` wrapped in an `Arc` for every hash expression is inefficient. Consider caching this value as a field in `SharedBoundsAccumulator` to avoid repeated allocations.
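A minimal sketch of what that caching could look like, assuming `SharedBoundsAccumulator` can build the options once at construction (the helper method is illustrative, not the PR's code):

```rust
use std::sync::Arc;

use datafusion_common::config::ConfigOptions;

/// Illustrative subset of SharedBoundsAccumulator; only the cached field is shown.
struct SharedBoundsAccumulator {
    /// Built once; cloning the Arc is a cheap refcount bump, so every hash
    /// expression can share it instead of allocating ConfigOptions::default().
    config_options: Arc<ConfigOptions>,
}

impl SharedBoundsAccumulator {
    fn new() -> Self {
        Self {
            config_options: Arc::new(ConfigOptions::default()),
        }
    }

    /// Hand the shared options to each ScalarFunctionExpr being built.
    fn config_options(&self) -> Arc<ConfigOptions> {
        Arc::clone(&self.config_options)
    }
}
```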
datafusion/functions/src/hash.rs (outdated)
```rust
pub fn new() -> Self {
    // Use the same seed as hash joins for consistency
    let random_state =
        RandomState::with_seeds('H' as u64, 'A' as u64, 'S' as u64, 'H' as u64);
```
The hardcoded seed values ('H', 'A', 'S', 'H') should be defined as constants to ensure consistency across the codebase and make them easier to maintain. Consider defining these as module-level constants.
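For example, such constants might look like this (the names are hypothetical):

```rust
use ahash::RandomState;

// Hypothetical module-level constants so the seeds are defined in one place.
const HASH_JOIN_SEED_0: u64 = 'H' as u64;
const HASH_JOIN_SEED_1: u64 = 'A' as u64;
const HASH_JOIN_SEED_2: u64 = 'S' as u64;
const HASH_JOIN_SEED_3: u64 = 'H' as u64;

fn join_random_state() -> RandomState {
    RandomState::with_seeds(
        HASH_JOIN_SEED_0,
        HASH_JOIN_SEED_1,
        HASH_JOIN_SEED_2,
        HASH_JOIN_SEED_3,
    )
}
```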
datafusion/functions/src/hash.rs (outdated)
```rust
#[user_doc(
    doc_section(label = "Hashing Functions"),
    description = "Computes hash values for one or more arrays using DataFusion's internal hash algorithm. When multiple arrays are provided, their hash values are combined using the same logic as multi-column joins.",
```
My feeling is that long term we should just switch from using ahash to murmurhash or something so that it is (1) not machine specific (works for distributed use cases, pre-computed hash columns, etc.) and (2) we can specify the algorithm used instead of a generic "internal hash"
Yes, I think switching to something like xxhash64 would be ideal. It suits our needs, is stable/reusable for distributed cases, and I'm pretty sure it's faster than murmur (not sure about ahash). I'm aware that https://github.com/DoumanAsh/xxhash-rust is used by Polars, so looking into this repo could be a good place to start.
I think @Dandandan did some initial work on hashing a few years ago, so may remember why we went with ahash vs others
Ahash probably is faster (I didn't benchmark it against all alternatives, TBH, but it had a lot of benchmarks showing it is faster than xxhash/murmurhash, and it gave a good speedup compared to SipHash, which was used before it via `HashMap`).
Maybe https://github.com/orlp/foldhash/ is a good candidate for an alternative speed-wise (it is now used by hashbrown as the default instead of aHash).
There is also something to be said for keeping it documented as an "internal hash", as that will allow changing it without breaking the API in the future.
I also found https://github.com/hoxxep/rapidhash in the meantime.
Really great idea!
```rust
Ok(Arc::new(ScalarFunctionExpr::new(
    "hash",
    hash_udf,
    self.on_right.clone(),
```
Don't we want to specify the `RandomState` here?
The notion of the "correct" partition (and thus whether the bounds are relevant) arises downstream of repartitioning, since the build side builds hash tables and independent bounds according to those partitions.
So I would think we would want to specify the same random state as the repartition operator to ensure that the `hash(...) % n != partition_id` portion returns the right result, right? Otherwise we may evaluate incorrect bounds.
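A standalone illustration of the concern, assuming `ahash` (this hashes a single key with `hash_one` for brevity rather than going through DataFusion's actual row-hashing path):

```rust
use ahash::RandomState;

fn main() {
    let n = 4u64; // number of partitions
    let key = "some_join_key_value";

    // Seeds used by RepartitionExec vs. the custom seeds in the filter expression.
    let repartition_state = RandomState::with_seeds(0, 0, 0, 0);
    let filter_state =
        RandomState::with_seeds('H' as u64, 'A' as u64, 'S' as u64, 'H' as u64);

    let actual_partition = repartition_state.hash_one(key) % n;
    let filter_partition = filter_state.hash_one(key) % n;

    // These generally disagree, so a `hash(...) % n != partition_id` check
    // evaluated with the wrong RandomState can skip the bounds check for rows
    // that really do belong to a completed partition.
    println!("actual = {actual_partition}, filter thinks = {filter_partition}");
}
```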
Yes! I'll fix it and think about how we can add a test
This commit addresses a critical correctness issue identified in PR review where the hash function used in filter expressions had different RandomState seeds than the RepartitionExec operator, potentially causing incorrect partition assignments and filtering.

## Key Fixes

**RandomState Consistency**:
- Use RepartitionExec's RandomState (0,0,0,0) instead of custom seeds ('H','A','S','H')
- Ensures `hash(row) % num_partitions` produces the same partition assignments as actual partitioning
- Prevents false negatives in progressive filtering logic

**Performance Optimization**:
- Cache ConfigOptions in SharedBoundsAccumulator to avoid repeated allocations
- Single Arc<ConfigOptions> shared across all hash expression creations

**Code Quality**:
- Improved comment clarity, replacing outdated "HEAD" references
- Better documentation of deduplication logic

## Why This Matters

Without matching RandomState seeds, our progressive filter expressions could:
- Incorrectly filter out valid join candidates (correctness issue)
- Allow through data that doesn't belong to a partition (performance issue)
- Break the fundamental assumption that hash-based filtering matches partitioning

Resolves: apache#17632 (comment)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Force-pushed from 8e3023f to b63bb3e
cc @gabotechs
I think one thing to note with this approach is that we are running the hash function an additional time on the probe side. I wonder if we should wait until we can memoize non-volatile function calls with the same arguments within a filter? (which I think is tracked by #17599)
It shouldn't be too bad: without filters you'd still have to run the hash function once in RepartitionExec and another hash function in HashJoinExec. So we're running 2 times instead of 3 for rows that match the filter. For rows that are pruned we run it 1 time instead of 2. And that's only until all of the build sides are done; then we may run it 0 times.
But we should run some benchmarks.
```rust
}

// Combine all column predicates for this partition with AND
if column_predicates.is_empty() {
```
If we return `lit(true)` here it will nullify the filter once everything is `OR`ed together in the final phase. Maybe we can pass back an `Option` and skip the partition filters that return `None`.
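A sketch of how the `Option` approach could look (not the PR's code): `reduce` over the predicates naturally yields `None` for an empty list, which the caller can then skip instead of OR-ing in a `lit(true)`:

```rust
use std::sync::Arc;

use datafusion_expr::Operator;
use datafusion_physical_expr::expressions::BinaryExpr;
use datafusion_physical_expr::PhysicalExpr;

/// AND-combine one partition's column predicates. `None` means "no usable
/// bounds for this partition", so the caller omits it from the final OR
/// rather than inserting a filter-nullifying `true`.
fn combine_partition_predicates(
    column_predicates: Vec<Arc<dyn PhysicalExpr>>,
) -> Option<Arc<dyn PhysicalExpr>> {
    column_predicates.into_iter().reduce(|acc, pred| {
        Arc::new(BinaryExpr::new(acc, Operator::And, pred)) as Arc<dyn PhysicalExpr>
    })
}
```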
I think we should add a doc explaining how each function leads into the next. Makes it easier for reviewers or people working on this code to make changes.
Simple solution: use a
This article may be helpful in selecting the hash: https://medium.com/@tprodanov/benchmarking-non-cryptographic-hash-functions-in-rust-2e6091077d11 and this doc: https://docs.google.com/spreadsheets/d/1HmqDj-suH4wBFNg7etwE8WVBlfCufvD5-gAnIENs94k/edit?gid=1915335726#gid=1915335726
I will try and review this PR tomorrow
I'm going to rename the
```rust
/// RandomState used by RepartitionExec for consistent hash partitioning.
/// This must match the seeds used in RepartitionExec to ensure our hash-based
/// filter expressions compute the same partition assignments as the actual partitioning.
const REPARTITION_RANDOM_STATE: RandomState = RandomState::with_seeds(0, 0, 0, 0);
```
Can we make this available in the repartition physical plan module and re-use it from there?
I've done this, in part because testing this change required deterministic partitioning (i.e. given the input values, I know some will end up in each partition). That change will also allow an optimizer rule to swap out hash functions without us having to quibble about which one to choose.
```rust
}

/// Create final optimized filter from all partition bounds using OR logic
pub(crate) fn create_optimized_filter_from_partition_bounds(
```
I can see it being worthwhile to skip this step. For example, it ties in nicely with #17529 where we can just push the hash comparison physical expr as a partition dependent predicate.
IMO it's always worth having stats-based predicates, e.g. because they can be used for stats pruning of files / row groups / pages, whereas hash tables / bloom filters cannot (not sure about `IN LIST`).
Agreed I just mean the part where we remove the hash check
I see, your point is that #17632 will need the hash check anyway? I guess what we'd probably end up doing is something like `((col >= 1 and col <= 2) or (col >= 2 and col <= 5)) and (case hash(col) % n when 0 then <check hash map 0> when 1 then <check hash map 1> else true end)`,
that is, keep the hash check for the hash table lookups but remove it for the min/max stats so that those become "simple" expressions that can be used by `PruningPredicate` and such.
If it's solely for internal use at the moment I wonder if it's better to just make it a private
I made it a
This commit refactors the hash join dynamic filtering implementation to apply filters progressively as each build-side partition completes, rather than waiting for all partitions to finish. This reduces probe-side scan overhead and improves performance.

- **Immediate filtering**: Filters are applied as soon as each partition completes
- **Hash-based expressions**: Uses `(hash(cols) % num_partitions != partition_id OR bounds)` to ensure correctness without waiting for all partitions
- **Optimization**: Removes hash checks when all partitions complete
- **SharedBoundsAccumulator**: Removed barrier-based coordination, added progressive filter injection
- **HashJoinStream**: Removed WaitPartitionBoundsReport state, made bounds reporting synchronous
- **Filter expressions**: Progressive phase uses AND-combined hash filters, final phase uses OR-combined bounds

The hash-based filter expression ensures no false negatives:
- Data for completed partitions passes through via exact bounds checks
- Data for incomplete partitions passes through via hash modulo checks
- Progressive improvement in filter selectivity as more partitions complete

Maintains correctness while enabling immediate probe-side filtering.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Force-pushed from fb6a78e to db2ac80
@alamb when you review, would you mind kicking off some join benchmarks? thank you
```
@@ -0,0 +1,154 @@
// Licensed to the Apache Software Foundation (ASF) under one
```
These changes (a good chunk of this PR) are split out into #17648
```rust
drop(inner);

// Update the dynamic filter
if let Some(filter_expr) = filter_expr {
    self.dynamic_filter.update(filter_expr)?;
```
The manual `drop(guard)` before `self.dynamic_filter.update()` may create a race condition.
This opens a window where a final, optimized filter calculated by one thread could be overwritten by a filter generated in the progressive phase by another thread that wins the race to call `update()`.
```
Time  Thread 1                           Thread 2                           Thread 3
  |
  V   self.inner.lock() [ACQUIRED]
      +-> calculates filter_expr_1
      drop(guard) [RELEASED]
                                         self.inner.lock() [ACQUIRED]
                                         +-> calculates filter_expr_2
                                         drop(guard) [RELEASED]
  V   self.dynamic_filter.update(...)                                       self.inner.lock() [ACQUIRED]
      +-> acquires write lock                                               +-> calculates filter_expr_3
      +-> writes filter_1 to state                                              (the *optimal* one)
          (current state: filter_1)
      +-> releases write lock                                               drop(guard) [RELEASED]
  V   ---------------------------------------------------------------------------------------
      Thread 1's update is complete. Now Thread 2 and Thread 3 race to apply their updates.
      ---------------------------------------------------------------------------------------
                                                                            self.dynamic_filter.update(...)
                                                                            +-> acquires write lock
                                                                            +-> writes filter_3
                                                                                (current state: filter_3)
                                                                            +-> releases write lock
                                         self.dynamic_filter.update(...)
                                         +-> acquires write lock
                                         +-> writes filter_2 (stale)
                                             (current state: filter_2)
                                         +-> releases write lock
  V
```
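One standard way to close this window (a sketch under the assumption that updates can be totally ordered, not the PR's code) is to stamp each candidate filter with a generation number taken under the same lock that computed it, and refuse to install anything older than the current state:

```rust
use std::sync::Mutex;

/// Stand-in for the real physical filter expression type.
type FilterExpr = String;

struct DynamicFilter {
    /// (generation, currently installed filter)
    slot: Mutex<(u64, Option<FilterExpr>)>,
}

impl DynamicFilter {
    /// Install `filter` only if it is at least as new as what is already
    /// there. A stale progressive filter can then no longer overwrite the
    /// final optimized one, regardless of which thread wins the race.
    fn update_if_newer(&self, generation: u64, filter: FilterExpr) {
        let mut slot = self.slot.lock().unwrap();
        if generation >= slot.0 {
            *slot = (generation, Some(filter));
        }
    }
}
```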
Will try to look and address next week
Hi folks, thanks for looking at this. I need to do some work on this to make it a stacked PR on top of #17648. I also already have some feedback to address. I will be at a team offsite next week, but I hope to do some work on it Sunday at the airport.
🤖
🤖: Benchmark completed
👀
Several other improvements in the tpch benches, it looks like. Promising if true! I've been flat out at a company offsite this past week and will probably need some time next week to get back into the swing of things. But I promise I'll come back here, revisit, break it up into smaller PRs, etc. It will be nice in the end, I think.
Didn't look at the PR details yet, but why did aggregation benchmarks improve as well (clickbench)?
## Summary

This PR refactors the DataFusion hash join dynamic filtering implementation to support progressive bounds application instead of waiting for all build-side partitions to complete. This optimization reduces probe-side scan overhead and improves join performance.

## Background

Previously, the dynamic filtering system used a barrier-based approach that waited for ALL build-side partitions to complete before applying any filters. This caused unnecessary delays in probe-side filtering.

## Key Innovation

The refactored approach uses hash-based filter expressions that ensure correctness without coordination.

### Why This Works

- For rows that hash to a completed partition N, the bounds check `(col >= min_N AND col <= max_N)` correctly filters
- For rows that hash to any other partition, `(hash(cols) % num_partitions != N)` lets all potential matches pass through

## Changes Made
### Core Components

- **SharedBoundsAccumulator** (`shared_bounds.rs`): removed the `tokio::sync::Barrier` coordination, added progressive filter injection
- **HashJoinStream** (`stream.rs`): removed the `WaitPartitionBoundsReport` state, made bounds reporting synchronous
- **Hash Function** (`hash.rs`): new hash function implementation using DataFusion's internal hash algorithm

### Filter Expression Evolution

- Progressive phase: `(hash(cols) % n != partition_id OR bounds_partition)`, AND-combined across completed partitions
- Final phase: `(bounds_0 OR bounds_1 OR ...)`, with hash checks removed once all partitions complete
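A hedged sketch of how the two phases' expressions could be assembled from DataFusion physical expressions (the `hash_mod` and `bounds` arguments are assumed prebuilt; this is illustrative, not the PR's exact code):

```rust
use std::sync::Arc;

use datafusion_common::ScalarValue;
use datafusion_expr::Operator;
use datafusion_physical_expr::expressions::{BinaryExpr, Literal};
use datafusion_physical_expr::PhysicalExpr;

/// Progressive phase: `hash(cols) % n != partition_id OR bounds_partition`.
/// `hash_mod` is assumed to already compute `hash(cols) % num_partitions`.
fn progressive_partition_filter(
    hash_mod: Arc<dyn PhysicalExpr>,
    partition_id: u64,
    bounds: Arc<dyn PhysicalExpr>,
) -> Arc<dyn PhysicalExpr> {
    let pid = Arc::new(Literal::new(ScalarValue::UInt64(Some(partition_id))));
    let not_this_partition =
        Arc::new(BinaryExpr::new(hash_mod, Operator::NotEq, pid)) as Arc<dyn PhysicalExpr>;
    Arc::new(BinaryExpr::new(not_this_partition, Operator::Or, bounds))
}

/// Final phase: `bounds_0 OR bounds_1 OR ...` once every partition has
/// reported, so the per-row hash computation can be dropped entirely.
fn final_filter(all_bounds: Vec<Arc<dyn PhysicalExpr>>) -> Option<Arc<dyn PhysicalExpr>> {
    all_bounds.into_iter().reduce(|acc, b| {
        Arc::new(BinaryExpr::new(acc, Operator::Or, b)) as Arc<dyn PhysicalExpr>
    })
}
```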
## Performance Benefits

## Testing

## Test plan

🤖 Generated with Claude Code