@KAlbert2333

What problem does this PR solve?

Issue Number: close #54

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 🚀 Performance improvement (optimization)
  • ⚠️ Breaking change (fix or feature that would cause existing functionality to change)
  • 🔨 Refactoring (no logic changes)
  • 🔧 Build/CI or Infrastructure changes
  • 📝 Documentation only

Description

Built an xsimd extension for SVE and implemented SVE-optimized versions of several SIMD functions.

Performance Impact

  • No Impact: This change does not affect the critical path (e.g., build system, doc, error handling).

  • Positive Impact: I have run benchmarks.

    Click to view Benchmark Results
    TPCDS99 1T results.
    Before: 1565s
    After: 1530s (about 2% faster)
    
  • Negative Impact: Explained below (e.g., trade-off for correctness).

Release Note

Please describe the changes in this PR

Release Note:
- Built an xsimd extension for SVE and added vectorized implementations of functions such as BitMask, Gather, MaskGather, Pack32, Permute, and Filter.

Checklist (For Author)

  • I have added/updated unit tests (ctest).
  • I have verified the code with local build (Release/Debug).
  • I have run clang-format / linters.
  • (Optional) I have run Sanitizers (ASAN/TSAN) locally for complex C++ changes.
  • No need to test or manual test.

Breaking Changes

  • No

  • Yes (Description: ...)

    Click to view Breaking Changes
    Breaking Changes:
    - Description of the breaking change.
    - Possible solutions or workarounds.
    - Any other relevant information.
    

Added SVE support for various SIMD operations and updated function signatures to use int64_t instead of int32_t for better compatibility with larger data sizes.
@CLAassistant

CLAassistant commented Dec 25, 2025

CLA assistant check
All committers have signed the CLA.

@fzhedu
Collaborator

fzhedu commented Jan 4, 2026

Can you add some tests for the SIMD code? Also, it would be better to report the performance gain relative to the scalar code.

@fzhedu fzhedu self-requested a review January 4, 2026 06:18
conanfile.py Outdated
- # Support CRC & NEON on ARMv8
- flags = f"{self.BOLT_GLOABL_FLAGS} -march=armv8.3-a"
+ # Support CRC & NEON & SVE on ARMv8
+ flags = f"{self.BOLT_GLOABL_FLAGS} -march=armv8.3-a+sve -msve-vector-bits=256 -DSVE_BITS=256"
Collaborator

Won't this break on older ARM platforms that only support NEON?
Is there a compiler flag to detect this? The hard-coded preprocessor macro SVE_BITS will not be valid on all hardware platforms.

Author

> It won't work on the old ARM platform which only support Neon? Is any compiler flag to detect this? The specified preprocessor macro SVE_BITS does not work on all the hardware platforms

Added an 'lscpu' check to determine whether the current CPU supports the 'sve' instruction set.
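
For illustration only, a check of this kind could look roughly like the sketch below. It is not the code from this PR: the helper names `has_sve`, `host_supports_sve`, and `arm_march_flags` are hypothetical, and it assumes `lscpu` prints a `Flags:` (or `Features:`) line listing the CPU features as space-separated tokens. It only inspects the build host, so it would not help when cross-compiling, and it does not verify that the hardware vector length actually matches the 256 bits fixed by `-msve-vector-bits=256`.

```python
import subprocess


def has_sve(lscpu_output: str) -> bool:
    """Return True if 'sve' is listed as a whole token in a CPU flags line."""
    for line in lscpu_output.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() in ("flags", "features"):
            # Token match, so 'sve2' alone does not count as 'sve'.
            if "sve" in value.split():
                return True
    return False


def host_supports_sve() -> bool:
    """Run lscpu on the build host; treat any failure as 'no SVE'."""
    try:
        result = subprocess.run(
            ["lscpu"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return False
    return has_sve(result.stdout)


def arm_march_flags(base_flags: str, sve: bool) -> str:
    """Pick the -march flags quoted in the diff above (hypothetical helper)."""
    if sve:
        # Assumption: the target really has 256-bit SVE vectors.
        return (
            f"{base_flags} -march=armv8.3-a+sve "
            "-msve-vector-bits=256 -DSVE_BITS=256"
        )
    return f"{base_flags} -march=armv8.3-a"


if __name__ == "__main__":
    print(arm_march_flags("-O2", host_supports_sve()))
```

Falling back to the NEON-only flags when `lscpu` is missing or fails keeps the build working on hosts where the probe cannot run.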

@yangzhg yangzhg added enhancement New feature or request performance performance improvement needed labels Jan 9, 2026
daojiancha

This comment was marked as resolved.

@KAlbert2333
Author

> can you add some tests about the simd code? besides, it is better to report the performance gain by comparing to the scalar code

We will provide a report comparing the scalar and vectorized implementations soon.

Development

Successfully merging this pull request may close these issues.

[Feature] Add SVE (Scalable Vector Extension) support for SimdUtil-inl.h

6 participants