Skip to content

Conversation

@ruochenj123
Copy link

@ruochenj123 ruochenj123 commented Jan 14, 2026

What problem does this PR solve?

Issue Number: #11

Type of Change

  • πŸ› Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • πŸš€ Performance improvement (optimization)
  • ⚠️ Breaking change (fix or feature that would cause existing functionality to change)
  • πŸ”¨ Refactoring (no logic changes)
  • πŸ”§ Build/CI or Infrastructure changes
  • πŸ“ Documentation only

Description

Currently HashJoin and Sort operations store data in row-based RowContainer, which incurs non-trivial layout conversion overhead. This PR introduces a hybrid storage design that keeps payload columns in their original columnar format while only storing keys in RowContainer, reducing this layout conversion overhead.

Main Changes

  1. HybridContainer

    • Separates key storage (in RowContainer) from payload storage (kept as RowVectorPtr).
    • Introduces encoded HybridRowId to reference payload rows.
    • HybridRowId encodes {containerId, rowId} to support multi-driver parallel execution in HashJoin.
  2. Multi-driver support in HashJoin

    • Each driver builds its own HybridContainer.
    • After table merge, the allContainers_ map enables cross-container payload extraction during the probe phase.
  3. Extraction optimizations

    • coalesceBatches() flattens multiple payload batches into a single contiguous batch to reduce TLB misses during extraction.
    • sortByContainerId() reorders rows by containerId before extraction to improve cache locality in multi-container scenarios.
    • Prefetching during extraction to hide memory latency and reduce data loading time.
    • isSingleContainer() provides a fast path that skips sorting overhead in the single-driver scenario.
  4. Configuration options

    • hybrid_join_enabled / hybrid_sort_enabled to opt in to hybrid execution.
    • hybrid_join_reorder_enabled to control row reordering (disabled in tests to preserve deterministic output).

Performance Impact

  • No Impact: This change does not affect the critical path (e.g., build system, doc, error handling).

  • Positive Impact: I have run benchmarks.

    Click to view Benchmark Results
    Paste your google-benchmark or TPC-H results here.
    Before: 10.5s
    After:   8.2s  (+20%)
    
  • Negative Impact: Explained below (e.g., trade-off for correctness).

Tested on bolt_tpch_benchmark with sf=10 iteration=5. The performances are almost the same.

+----------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+
| Type     | q1   | q2   | q3   | q4   | q5   | q6   | q7   | q8   | q9   | q10  | q11  | q12  | q13  | q14  | q15  | q16  | q17  | q18  | q19  | q20  | q21  | q22  |
+----------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+
| baseline | 856  | 287.4| 1096 | 2492 | 1374 | 453.6| 1090 | 1420 | 1990 | 1662 | 151  | 558.8| 1544 | 874  | 717.4| 353.6| 3816 | 2250 | 952  | 639.2| 2234 | 583.2|
| hybrid   | 882.8| 315  | 1100 | 2390 | 1370 | 450.4| 1078 | 1266 | 2072 | 1738 | 149.2| 631  | 1532 | 852.6| 698.6| 343.4| 4052 | 2062 | 918.4| 664.4| 2248 | 545.4|
+----------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+

Release Note

Please describe the changes in this PR

Release Note:

Release Note:
- Fixed a crash in `substr` when input is null.
- optimized `group by` performance by 20%.

Checklist (For Author)

  • I have added/updated unit tests (ctest).
  • I have verified the code with local build (Release/Debug).
  • I have run clang-format / linters.
  • (Optional) I have run Sanitizers (ASAN/TSAN) locally for complex C++ changes.
  • No need to test or manual test.

Breaking Changes

  • No

  • Yes (Description: ...)

    Click to view Breaking Changes
    Breaking Changes:
    - Description of the breaking change.
    - Possible solutions or workarounds.
    - Any other relevant information.
    

@CLAassistant
Copy link

CLAassistant commented Jan 14, 2026

CLA assistant check
All committers have signed the CLA.

@ruochenj123 ruochenj123 changed the title Hybrid layout design for HashJoin/Sort [WIP] Hybrid layout design for HashJoin/Sort Jan 14, 2026
@markjin1990 markjin1990 added the performance performance improvement needed label Jan 14, 2026
@markjin1990 markjin1990 requested a review from fzhedu January 14, 2026 21:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance performance improvement needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants