Add Prefetching to Hash Join #35

Merged
merged 4 commits into from
Oct 29, 2023

Conversation


@wagjamin wagjamin commented Oct 22, 2023

This commit adds prefetching to our hash joins. Rather than doing a single lookup call, we now split the logic into three calls:

  1. A hash call -> Performs the key hash on the hash table
  2. A prefetch call -> Prefetches the respective slots from the hash table
  3. The actual lookup call -> Does the key lookup on the slot that was previously prefetched

This will be interesting for our vectorized backend, as we can issue many independent memory loads in short succession.

We also allow disabling the prefetch calls for JIT-compiled code. They are rather pointless there, as the lookup right after will load the data into cache anyway.
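The three-phase split can be sketched as follows. This is a minimal illustration, not the repository's actual interface: the table layout, names, and the absence of collision handling in `lookup` are all simplifying assumptions.

```cpp
// Minimal sketch of the hash -> prefetch -> lookup split on a toy
// open-addressing hash table. Simplified: no collision handling.
#include <cstdint>
#include <functional>
#include <vector>

struct HashTable {
    struct Slot { uint64_t key = 0; bool occupied = false; };
    std::vector<Slot> slots;
    explicit HashTable(size_t n) : slots(n) {}

    // 1. Hash call: compute the slot index for a key.
    uint64_t hash(uint64_t key) const {
        return std::hash<uint64_t>{}(key) % slots.size();
    }
    // 2. Prefetch call: pull the slot into cache ahead of the lookup.
    void prefetch(uint64_t slot_idx) const {
        __builtin_prefetch(&slots[slot_idx], /*rw=*/0, /*locality=*/1);
    }
    // 3. Lookup call: probe the slot that was prefetched earlier.
    const Slot* lookup(uint64_t key, uint64_t slot_idx) const {
        const Slot& s = slots[slot_idx];
        return (s.occupied && s.key == key) ? &s : nullptr;
    }
};

// Vectorized probing: run each phase over the whole batch, so many
// independent memory loads are in flight before any lookup waits.
std::vector<const HashTable::Slot*> probeBatch(
        const HashTable& ht, const std::vector<uint64_t>& keys) {
    std::vector<uint64_t> idx(keys.size());
    for (size_t i = 0; i < keys.size(); ++i) idx[i] = ht.hash(keys[i]);
    for (size_t i = 0; i < keys.size(); ++i) ht.prefetch(idx[i]);
    std::vector<const HashTable::Slot*> out(keys.size());
    for (size_t i = 0; i < keys.size(); ++i) out[i] = ht.lookup(keys[i], idx[i]);
    return out;
}
```

The key point is the phase-at-a-time loop structure in `probeBatch`: by the time the first `lookup` runs, prefetches for the entire batch have already been issued.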

This commit adds prefetching to our hash joins. Rather than doing a
single lookup call, we now split the logic into three calls:

1. A hash call -> Performs the key hash on the hash table
2. A prefetch call -> Prefetches the respective slots from the hash
   table
3. The actual lookup call -> Does the key lookup on the slot that was
   previously prefetched

This will be interesting for our vectorized backend, as we can issue
many independent memory loads in short succession.

In the next commits we will:

1. Allow disabling the prefetch calls for JIT-compiled code. They are
   rather pointless there, as the lookup right after will load the data
   into cache anyway.
2. Perform dynamic chunking in the vectorized code. At the end of the
   pipeline when we move into hash table operations, we will dynamically
   reduce the chunk size to ~256 in order to make sure the prefetching
   only fills the L1/L2 caches. If the prefetching range becomes too
   large, then we start evicting parts of the hash table from cache
   again.

After this, we should have a significantly faster vectorized execution
backend.
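The dynamic chunking described in point 2 above can be sketched like this. The constant and function names are illustrative, not the codebase's own; the assumption is that the pipeline otherwise runs on larger chunks (e.g. 1024 tuples) and only re-splits at hash-table boundaries.

```cpp
// Sketch of dynamic chunking before hash-table operations. A large
// pipeline chunk is re-split into sub-chunks of at most 256 tuples so
// the prefetched slots stay in L1/L2 until the matching lookup runs.
#include <algorithm>
#include <cstddef>

constexpr size_t kPrefetchChunk = 256;

template <typename ProbeFn>
void probeInSubChunks(size_t chunk_size, ProbeFn&& probe) {
    for (size_t begin = 0; begin < chunk_size; begin += kPrefetchChunk) {
        size_t end = std::min(begin + kPrefetchChunk, chunk_size);
        // Each sub-chunk runs hash -> prefetch -> lookup to completion
        // before the next starts, bounding the live prefetch window.
        probe(begin, end);
    }
}
```

If the window were the full 1024-tuple chunk instead, later prefetches could evict earlier prefetched slots before their lookups ever touch them, which is exactly the effect the commit message warns about.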

This commit is the next one in the chain towards a faster vectorized
backend. We can now annotate both a `Suboperator` and the
`CompilationContext` with additional optimization hints.

This allows us to mark suboperators that generate prefetching code so
that operator-fusing codegen skips them. The prefetching calls are now
only emitted for functions in the vectorized backend and generate no
code for compiled execution.

In general, prefetching is not important for operator-fusing code, as
we do a lookup on the same tuple right after, which then causes the
respective cache miss anyway. As a result, prefetching only generates
more instructions and function calls.

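The gating logic amounts to a check like the following. All names here are hypothetical stand-ins for the hint mechanism the commit describes, not the repository's actual types.

```cpp
// Sketch of hint-based codegen gating, with illustrative names:
// a suboperator carries an optimization hint, and code generation
// skips prefetch suboperators when compiling fused code.
enum class BackendMode { Vectorized, OperatorFusing };
enum class OptHint { None, Prefetch };

bool shouldGenerateCode(OptHint hint, BackendMode mode) {
    // Prefetch suboperators only produce code in the vectorized
    // backend; in operator-fusing codegen they are dead weight.
    if (hint == OptHint::Prefetch && mode == BackendMode::OperatorFusing)
        return false;
    return true;
}
```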
When building a hash table at runtime, we can apply the same tricks
used to make vectorized hash tables fast.

We split the building into batches of 256 tuples. This allows for higher
insert throughput on large hash tables.
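A batched build along those lines might look as follows, reusing the same illustrative open-addressing layout as the probe sketch earlier (again, names and layout are assumptions, not the repository's API):

```cpp
// Sketch of batched hash-table build: hash and prefetch all target
// slots of a 256-tuple batch, then perform the actual inserts.
// Collisions are resolved by linear probing.
#include <algorithm>
#include <cstdint>
#include <functional>
#include <vector>

struct BuildTable {
    struct Slot { uint64_t key = 0; bool occupied = false; };
    std::vector<Slot> slots;
    explicit BuildTable(size_t n) : slots(n) {}

    uint64_t slotOf(uint64_t key) const {
        return std::hash<uint64_t>{}(key) % slots.size();
    }
    void insertAt(uint64_t idx, uint64_t key) {
        // Linear probing starting from the prefetched slot.
        while (slots[idx].occupied) idx = (idx + 1) % slots.size();
        slots[idx] = {key, true};
    }
    void insertBatch(const std::vector<uint64_t>& keys) {
        constexpr size_t kBatch = 256;
        std::vector<uint64_t> idx(kBatch);
        for (size_t b = 0; b < keys.size(); b += kBatch) {
            size_t end = std::min(b + kBatch, keys.size());
            // Phase 1: hash the whole batch.
            for (size_t i = b; i < end; ++i) idx[i - b] = slotOf(keys[i]);
            // Phase 2: prefetch all target slots for writing.
            for (size_t i = b; i < end; ++i)
                __builtin_prefetch(&slots[idx[i - b]], /*rw=*/1, 1);
            // Phase 3: perform the inserts on warm cache lines.
            for (size_t i = b; i < end; ++i) insertAt(idx[i - b], keys[i]);
        }
    }
};
```

On hash tables larger than cache, the insert itself would otherwise stall on a cache miss per tuple; batching lets those misses overlap.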

The CI suddenly started breaking. Make it more robust by:
- Pinning Ubuntu 22.04 (this alone is not enough)
- Pinning libc++ as the C++ standard library
- Working around llvm/llvm-project#59432

The second seemed to be the actual failure. It seems like we were
calling into the include headers of libstdc++ from a system GCC
installation and that was causing build issues.

This then caused ASAN failures coming from Ubuntu packaging issues which
are fixed by running the tests with disabled ASAN alloc/dealloc mismatch
warnings.
@wagjamin wagjamin merged commit 92549bc into main Oct 29, 2023
2 checks passed