Add Prefetching to Hash Join #35

Merged
merged 4 commits into from
Oct 29, 2023

Conversation


@wagjamin wagjamin commented Oct 22, 2023

This commit adds prefetching to our hash joins. Rather than doing a single lookup call, we now split the logic into three calls:

  1. A hash call -> Performs the key hash on the hash table
  2. A prefetch call -> Prefetches the respective slots from the hash table
  3. The actual lookup call -> Does the key lookup on the slot that was previously prefetched

This will be interesting for our vectorized backend, as we can issue many independent memory loads in short succession.

We also allow disabling the prefetch calls for JIT-compiled code. They are rather pointless there, as the lookup right after will load the data into cache anyway.
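The three-phase split can be sketched as follows. This is a minimal illustration, not the repository's actual interface: the table layout, names, and the absence of collision handling in `lookup` are all simplifying assumptions.

```cpp
// Minimal sketch of the hash -> prefetch -> lookup split on a toy
// open-addressing hash table. Simplified: no collision handling.
#include <cstdint>
#include <functional>
#include <vector>

struct HashTable {
    struct Slot { uint64_t key = 0; bool occupied = false; };
    std::vector<Slot> slots;
    explicit HashTable(size_t n) : slots(n) {}

    // 1. Hash call: compute the slot index for a key.
    uint64_t hash(uint64_t key) const {
        return std::hash<uint64_t>{}(key) % slots.size();
    }
    // 2. Prefetch call: pull the slot into cache ahead of the lookup.
    void prefetch(uint64_t slot_idx) const {
        __builtin_prefetch(&slots[slot_idx], /*rw=*/0, /*locality=*/1);
    }
    // 3. Lookup call: probe the slot that was prefetched earlier.
    const Slot* lookup(uint64_t key, uint64_t slot_idx) const {
        const Slot& s = slots[slot_idx];
        return (s.occupied && s.key == key) ? &s : nullptr;
    }
};

// Vectorized probing: run each phase over the whole batch, so many
// independent memory loads are in flight before any lookup waits.
std::vector<const HashTable::Slot*> probeBatch(
        const HashTable& ht, const std::vector<uint64_t>& keys) {
    std::vector<uint64_t> idx(keys.size());
    for (size_t i = 0; i < keys.size(); ++i) idx[i] = ht.hash(keys[i]);
    for (size_t i = 0; i < keys.size(); ++i) ht.prefetch(idx[i]);
    std::vector<const HashTable::Slot*> out(keys.size());
    for (size_t i = 0; i < keys.size(); ++i) out[i] = ht.lookup(keys[i], idx[i]);
    return out;
}
```

The key point is the phase-at-a-time loop structure in `probeBatch`: by the time the first `lookup` runs, prefetches for the entire batch have already been issued.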

This commit adds prefetching to our hash joins. Rather than doing a
single lookup call, we now split the logic into three calls:

1. A hash call -> Performs the key hash on the hash table
2. A prefetch call -> Prefetches the respective slots from the hash
   table
3. The actual lookup call -> Does the key lookup on the slot that was
   previously prefetched

This will be interesting for our vectorized backend, as we can issue
many independent memory loads in short succession.

In the next commits we will:

1. Allow disabling the prefetch calls for JIT-compiled code. They are
   rather pointless there, as the lookup right after will load the data
   into cache anyway.
2. Perform dynamic chunking in the vectorized code. At the end of the
   pipeline when we move into hash table operations, we will dynamically
   reduce the chunk size to ~256 in order to make sure the prefetching
   only fills the L1/L2 caches. If the prefetching range becomes too
   large, then we start evicting parts of the hash table from cache
   again.

After this, we should have a significantly faster vectorized execution
backend.
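The dynamic chunking described in point 2 above can be sketched like this. The constant and function names are illustrative, not the codebase's own; the assumption is that the pipeline otherwise runs on larger chunks (e.g. 1024 tuples) and only re-splits at hash-table boundaries.

```cpp
// Sketch of dynamic chunking before hash-table operations. A large
// pipeline chunk is re-split into sub-chunks of at most 256 tuples so
// the prefetched slots stay in L1/L2 until the matching lookup runs.
#include <algorithm>
#include <cstddef>

constexpr size_t kPrefetchChunk = 256;

template <typename ProbeFn>
void probeInSubChunks(size_t chunk_size, ProbeFn&& probe) {
    for (size_t begin = 0; begin < chunk_size; begin += kPrefetchChunk) {
        size_t end = std::min(begin + kPrefetchChunk, chunk_size);
        // Each sub-chunk runs hash -> prefetch -> lookup to completion
        // before the next starts, bounding the live prefetch window.
        probe(begin, end);
    }
}
```

If the window were the full 1024-tuple chunk instead, later prefetches could evict earlier prefetched slots before their lookups ever touch them, which is exactly the effect the commit message warns about.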

This commit is the next one in the chain towards a faster vectorized
backend. We can now annotate both a `Suboperator` and the
`CompilationContext` with additional optimization hints.

This allows us to mark suboperators that generate prefetching code so
that operator-fusing codegen skips them. The prefetching calls are now
only emitted for functions in the vectorized backend and generate no
code for compiled execution.

In general, prefetching is not important for operator-fusing code, as
we do a lookup on the same tuple right after, which then causes the
respective cache miss anyway. As a result, prefetching only generates
more instructions and function calls.

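The gating logic amounts to a check like the following. All names here are hypothetical stand-ins for the hint mechanism the commit describes, not the repository's actual types.

```cpp
// Sketch of hint-based codegen gating, with illustrative names:
// a suboperator carries an optimization hint, and code generation
// skips prefetch suboperators when compiling fused code.
enum class BackendMode { Vectorized, OperatorFusing };
enum class OptHint { None, Prefetch };

bool shouldGenerateCode(OptHint hint, BackendMode mode) {
    // Prefetch suboperators only produce code in the vectorized
    // backend; in operator-fusing codegen they are dead weight.
    if (hint == OptHint::Prefetch && mode == BackendMode::OperatorFusing)
        return false;
    return true;
}
```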
When building a hash table at runtime, we can apply the same tricks
used to make vectorized hash tables fast.

We split the building into batches of 256 tuples. This allows for higher
insert throughput on large hash tables.
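A batched build along those lines might look as follows, reusing the same illustrative open-addressing layout as the probe sketch earlier (again, names and layout are assumptions, not the repository's API):

```cpp
// Sketch of batched hash-table build: hash and prefetch all target
// slots of a 256-tuple batch, then perform the actual inserts.
// Collisions are resolved by linear probing.
#include <algorithm>
#include <cstdint>
#include <functional>
#include <vector>

struct BuildTable {
    struct Slot { uint64_t key = 0; bool occupied = false; };
    std::vector<Slot> slots;
    explicit BuildTable(size_t n) : slots(n) {}

    uint64_t slotOf(uint64_t key) const {
        return std::hash<uint64_t>{}(key) % slots.size();
    }
    void insertAt(uint64_t idx, uint64_t key) {
        // Linear probing starting from the prefetched slot.
        while (slots[idx].occupied) idx = (idx + 1) % slots.size();
        slots[idx] = {key, true};
    }
    void insertBatch(const std::vector<uint64_t>& keys) {
        constexpr size_t kBatch = 256;
        std::vector<uint64_t> idx(kBatch);
        for (size_t b = 0; b < keys.size(); b += kBatch) {
            size_t end = std::min(b + kBatch, keys.size());
            // Phase 1: hash the whole batch.
            for (size_t i = b; i < end; ++i) idx[i - b] = slotOf(keys[i]);
            // Phase 2: prefetch all target slots for writing.
            for (size_t i = b; i < end; ++i)
                __builtin_prefetch(&slots[idx[i - b]], /*rw=*/1, 1);
            // Phase 3: perform the inserts on warm cache lines.
            for (size_t i = b; i < end; ++i) insertAt(idx[i - b], keys[i]);
        }
    }
};
```

On hash tables larger than cache, the insert itself would otherwise stall on a cache miss per tuple; batching lets those misses overlap.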

The CI suddenly started breaking. Make it more robust by:
- Pinning Ubuntu 22.04 (this alone is not enough)
- Pinning libc++ as the C++ standard library
- Working around llvm/llvm-project#59432

The second seemed to be the actual failure. It seems like we were
calling into the include headers of libstdc++ from a system GCC
installation and that was causing build issues.

This then caused ASAN failures coming from Ubuntu packaging issues which
are fixed by running the tests with disabled ASAN alloc/dealloc mismatch
warnings.
@wagjamin wagjamin merged commit 92549bc into main Oct 29, 2023
2 checks passed