refactor/march based reorganization [WIP] #193

Open
richyreachy wants to merge 24 commits into main from
refactor/march_based_reorganization

@richyreachy
Collaborator

march based reorganization

@greptile-apps
greptile-apps bot commented Mar 3, 2026

Greptile Summary

This PR reorganizes the math library by splitting monolithic architecture-specific implementations into separate files per CPU instruction set (SSE, AVX, AVX2, AVX512, NEON) with runtime dispatching based on CPU features.

Key changes:

  • Split large files like euclidean_distance_matrix_fp32.cc into separate *_sse.cc, *_avx.cc, *_avx512.cc, *_neon.cc, and *_dispatch.cc files
  • Added architecture-specific compiler flags: -march=core-avx2 for AVX/AVX2, -march=sapphirerapids for AVX512
  • Simplified CPU detection in cmake/option.cmake to only check for x86-64
  • Applied this pattern across all metric types: fp16, fp32, int4, int8, inner product, euclidean distance, and MIPS
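
The dispatch pattern described in the list above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the names `DistanceFn`, `CpuTier`, `DetectCpuTier`, `SelectKernel`, and the `SquaredEuclidean*` functions are hypothetical, and the per-ISA kernels are scalar stand-ins so that the sketch compiles in a single file. It assumes GCC or Clang on x86 for the `__builtin_cpu_supports` check and falls back to the baseline kernel elsewhere.

```cpp
#include <cstddef>

// Hypothetical kernel signature: squared Euclidean distance of two vectors.
using DistanceFn = float (*)(const float* a, const float* b, std::size_t n);

// In the reorganized layout each kernel would live in its own file
// (*_sse.cc, *_avx2.cc, *_avx512.cc) built with its own -march flag.
// Here they are scalar stand-ins so the sketch builds with any flags.
static float SquaredEuclideanScalar(const float* a, const float* b,
                                    std::size_t n) {
  float sum = 0.0f;
  for (std::size_t i = 0; i < n; ++i) {
    const float d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

static float SquaredEuclideanAvx2(const float* a, const float* b,
                                  std::size_t n) {
  return SquaredEuclideanScalar(a, b, n);  // stand-in for an AVX2 kernel
}

static float SquaredEuclideanAvx512(const float* a, const float* b,
                                    std::size_t n) {
  return SquaredEuclideanScalar(a, b, n);  // stand-in for an AVX512 kernel
}

enum class CpuTier { kBaseline, kAvx2, kAvx512 };

// Runtime feature detection. This function must itself be compiled with
// baseline flags, otherwise it may not be executable on older CPUs.
static CpuTier DetectCpuTier() {
#if (defined(__x86_64__) || defined(__i386__)) && \
    (defined(__GNUC__) || defined(__clang__))
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx512f")) return CpuTier::kAvx512;
  if (__builtin_cpu_supports("avx2")) return CpuTier::kAvx2;
#endif
  return CpuTier::kBaseline;
}

// Maps a detected tier to the matching kernel.
static DistanceFn SelectKernel(CpuTier tier) {
  switch (tier) {
    case CpuTier::kAvx512: return SquaredEuclideanAvx512;
    case CpuTier::kAvx2:   return SquaredEuclideanAvx2;
    case CpuTier::kBaseline: break;
  }
  return SquaredEuclideanScalar;
}

// Public entry point: resolves the kernel once, then forwards every call.
float SquaredEuclidean(const float* a, const float* b, std::size_t n) {
  static const DistanceFn fn = SelectKernel(DetectCpuTier());
  return fn(a, b, n);
}
```

Callers would only see `SquaredEuclidean(a, b, n)`; which kernel runs is decided on the first call. The design only works if the entry point and the detection code are built for the lowest CPU the binary is meant to support.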

Critical issue:

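  • The dispatch files are globbed into the AVX512 source list and therefore compiled with the AVX512 flags, so the compiler is free to emit AVX512 instructions in them. On a CPU without AVX512 this can crash with an illegal instruction error. These files contain the CPU feature detection logic and must be compiled with baseline flags.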

Confidence Score: 1/5

  • This PR has a critical bug that will cause crashes on CPUs without AVX512 support
  • The dispatch files are compiled with AVX512 flags, which allows the compiler to emit AVX512 instructions anywhere in them. When the code runs on a CPU without AVX512 (e.g., older Intel CPUs, AMD Zen 1/2/3), it can crash with an illegal instruction error, potentially before the runtime CPU detection has executed. This build configuration error makes the binary non-portable.
  • src/ailego/CMakeLists.txt requires immediate attention: the compilation flags applied to the dispatch files must be fixed
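
One way to catch this class of mistake at build time is a compile-time check inside each dispatch file. The sketch below is an assumption about how such a guard could look, not something present in the PR; `kBuiltWithAvx512`, `DispatchUnitIsPortable`, and `DescribeDispatchBuild` are hypothetical names. It relies on the `__AVX512F__` macro, which GCC, Clang, and MSVC predefine when AVX512 Foundation code generation is enabled for the translation unit.

```cpp
// True when this translation unit was compiled with AVX512F enabled,
// for example through -march=sapphirerapids or -mavx512f.
#if defined(__AVX512F__)
constexpr bool kBuiltWithAvx512 = true;
#else
constexpr bool kBuiltWithAvx512 = false;
#endif

// Hypothetical helper: a dispatch file is only safe to execute on every
// supported CPU if the compiler was not allowed to emit AVX512 in it.
constexpr bool DispatchUnitIsPortable() { return !kBuiltWithAvx512; }

// In a real dispatch file the check could fail the build outright:
//   static_assert(DispatchUnitIsPortable(),
//                 "dispatch file must not be built with AVX512 flags");
// It is left as a comment here so the sketch compiles under any flags.

// Human-readable description, e.g. for a startup log line.
const char* DescribeDispatchBuild() {
  return DispatchUnitIsPortable()
             ? "dispatch unit built without AVX512 code generation"
             : "dispatch unit built WITH AVX512 code generation";
}
```

The guard only covers AVX512. If the intended baseline is lower than AVX2, a similar check on `__AVX2__` or `__AVX__` would be needed as well.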

Important Files Changed

| Filename | Overview |
|---|---|
| src/ailego/CMakeLists.txt | Added march-based compilation flags; dispatch files incorrectly compiled with AVX512 flags |
| cmake/option.cmake | Simplified CPU architecture detection to only check x86-64 |
| src/ailego/math/euclidean_distance_matrix_fp32_dispatch.cc | New dispatch file with runtime CPU detection; will crash on non-AVX512 CPUs due to compilation flags |
| src/ailego/math/inner_product_matrix_fp32_dispatch.cc | New dispatch file for inner product; will crash on non-AVX512 CPUs due to compilation flags |
| src/ailego/math/euclidean_distance_matrix_int4_dispatch.cc | New dispatch file for int4; will crash on non-AVX512 CPUs due to compilation flags |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Application Code] --> B[Dispatch Layer<br/>*_dispatch.cc]
    B --> C{Runtime CPU<br/>Feature Detection}
    C -->|AVX512 Available| D[AVX512 Implementation<br/>*_avx512.cc<br/>-march=sapphirerapids]
    C -->|AVX Available| E[AVX/AVX2 Implementation<br/>*_avx.cc / *_avx2.cc<br/>-march=core-avx2]
    C -->|SSE/Baseline| F[SSE Implementation<br/>*_sse.cc<br/>-march=broadwell]
    C -->|ARM NEON| G[NEON Implementation<br/>*_neon.cc<br/>-march=armv8-a]
    
    style B fill:#ff9999
    style C fill:#ff9999
    
    H[CRITICAL BUG] -.-> B
    H[CRITICAL BUG] -.-> C
    I[Dispatch files compiled<br/>with AVX512 flags] -.-> H
    J[Will crash on<br/>non-AVX512 CPUs] -.-> H
    
    style H fill:#ff0000,color:#fff
    style I fill:#ffcccc
    style J fill:#ffcccc

Last reviewed commit: 4370619

@greptile-apps greptile-apps bot left a comment

60 files reviewed, 2 comments

Comment on lines +39 to +46
file(GLOB_RECURSE MATH_FILES_AVX512
${CMAKE_CURRENT_SOURCE_DIR}/math/*_dispatch.cc
${CMAKE_CURRENT_SOURCE_DIR}/math/*_dispatch.c
${CMAKE_CURRENT_SOURCE_DIR}/math/*_avx512.cc
${CMAKE_CURRENT_SOURCE_DIR}/math/*_avx512.c
${CMAKE_CURRENT_SOURCE_DIR}/math_batch/*_avx512.cc
${CMAKE_CURRENT_SOURCE_DIR}/math_batch/*_avx512.c
)

Dispatch files should NOT be compiled with AVX512 flags. They contain runtime CPU detection logic and must be compiled with baseline flags (like the default broadwell) to run safely on all CPUs. Currently, these dispatch files will crash with illegal instruction errors on non-AVX512 CPUs.

Suggested change
-file(GLOB_RECURSE MATH_FILES_AVX512
-${CMAKE_CURRENT_SOURCE_DIR}/math/*_dispatch.cc
-${CMAKE_CURRENT_SOURCE_DIR}/math/*_dispatch.c
-${CMAKE_CURRENT_SOURCE_DIR}/math/*_avx512.cc
-${CMAKE_CURRENT_SOURCE_DIR}/math/*_avx512.c
-${CMAKE_CURRENT_SOURCE_DIR}/math_batch/*_avx512.cc
-${CMAKE_CURRENT_SOURCE_DIR}/math_batch/*_avx512.c
-)
+file(GLOB_RECURSE MATH_FILES_AVX512
+${CMAKE_CURRENT_SOURCE_DIR}/math/*_avx512.cc
+${CMAKE_CURRENT_SOURCE_DIR}/math/*_avx512.c
+${CMAKE_CURRENT_SOURCE_DIR}/math_batch/*_avx512.cc
+${CMAKE_CURRENT_SOURCE_DIR}/math_batch/*_avx512.c
+)
+file(GLOB_RECURSE MATH_FILES_DISPATCH
+${CMAKE_CURRENT_SOURCE_DIR}/math/*_dispatch.cc
+${CMAKE_CURRENT_SOURCE_DIR}/math/*_dispatch.c
+)

Comment on lines +63 to +69
foreach(MATH_FILE ${MATH_FILES_AVX512})
set_source_files_properties(
${MATH_FILE}
PROPERTIES
COMPILE_FLAGS "${MATH_MARCH_FLAG_AVX512}"
)
endforeach()

A separate loop is needed to apply baseline flags to the dispatch files.

Suggested change
 foreach(MATH_FILE ${MATH_FILES_AVX512})
 set_source_files_properties(
 ${MATH_FILE}
 PROPERTIES
 COMPILE_FLAGS "${MATH_MARCH_FLAG_AVX512}"
 )
 endforeach()
+# Dispatch files should use baseline flags for runtime CPU detection
+foreach(MATH_FILE ${MATH_FILES_DISPATCH})
+set_source_files_properties(
+${MATH_FILE}
+PROPERTIES
+COMPILE_FLAGS "-march=broadwell"
+)
+endforeach()
