
Conversation

angt (Collaborator) commented Sep 29, 2025

This is related to PR #16239.

Signed-off-by: Adrien Gallouët <angt@huggingface.co>
kevinzs2048 commented Sep 29, 2025

Thanks for working on this. I can still hit this issue on my Debian 12 VM on a Mac M4 Pro after applying your patch, with GCC 12.2:

[ 16%] Building CXX object ggml/src/CMakeFiles/ggml-cpu.dir/ggml-cpu/arch/arm/repack.cpp.o
In file included from /root/llama.cpp/ggml/src/./ggml-impl.h:24,
                 from /root/llama.cpp/ggml/src/ggml-cpu/vec.h:5,
                 from /root/llama.cpp/ggml/src/ggml-cpu/vec.cpp:1:
/usr/lib/gcc/aarch64-linux-gnu/12/include/arm_neon.h: In function ‘void ggml_vec_dot_f16(int, float*, size_t, ggml_fp16_t*, size_t, ggml_fp16_t*, size_t, int)’:
/usr/lib/gcc/aarch64-linux-gnu/12/include/arm_neon.h:29182:1: error: inlining failed in call to ‘always_inline’ ‘float16x8_t vfmaq_f16(float16x8_t, float16x8_t, float16x8_t)’: target specific option mismatch
29182 | vfmaq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
      | ^~~~~~~~~
In file included from /root/llama.cpp/ggml/src/ggml-cpu/vec.h:6:
/root/llama.cpp/ggml/src/ggml-cpu/simd-mappings.h:365:46: note: called from here
  365 |     #define GGML_F16x8_FMA(a, b, c) vfmaq_f16(a, b, c)
      |                                     ~~~~~~~~~^~~~~~~~~
/root/llama.cpp/ggml/src/ggml-cpu/simd-mappings.h:392:41: note: in expansion of macro ‘GGML_F16x8_FMA’
  392 |     #define GGML_F16_VEC_FMA            GGML_F16x8_FMA
      |                                         ^~~~~~~~~~~~~~
/root/llama.cpp/ggml/src/ggml-cpu/vec.cpp:316:26: note: in expansion of macro ‘GGML_F16_VEC_FMA’
  316 |                 sum[j] = GGML_F16_VEC_FMA(sum[j], ax[j], ay[j]);
      |                          ^~~~~~~~~~~~~~~~
/usr/lib/gcc/aarch64-linux-gnu/12/include/arm_neon.h:29182:1: error: inlining failed in call to ‘always_inline’ ‘float16x8_t vfmaq_f16(float16x8_t, float16x8_t, float16x8_t)’: target specific option mismatch
29182 | vfmaq_f16 (float16x8_t __a, float16x8_t __b, float16x8_t __c)
      | ^~~~~~~~~

The CMake configure output:

root@llm-test:~/llama.cpp# cmake -B build
-- The C compiler identification is GNU 12.2.0
-- The CXX compiler identification is GNU 12.2.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMAKE_BUILD_TYPE=Release
-- Found Git: /usr/bin/git (found version "2.39.5")
-- The ASM compiler identification is GNU
-- Found assembler: /usr/bin/cc
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Warning: ccache not found - consider installing it for faster compilation or disable this warning with GGML_CCACHE=OFF
-- CMAKE_SYSTEM_PROCESSOR: aarch64
-- GGML_SYSTEM_ARCH: ARM
-- Including CPU backend
-- Found OpenMP_C: -fopenmp (found version "4.5")
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5")
-- ARM detected
-- Performing Test GGML_COMPILER_SUPPORTS_FP16_FORMAT_I3E
-- Performing Test GGML_COMPILER_SUPPORTS_FP16_FORMAT_I3E - Failed
-- ARM detected flags: -march=armv8-a+crypto+crc+lse+rcpc+rdma+dotprod+fp16fml+sb+sve2+flagm+pauth
-- Performing Test GGML_MACHINE_SUPPORTS_dotprod
-- Performing Test GGML_MACHINE_SUPPORTS_dotprod - Failed
-- Performing Test GGML_MACHINE_SUPPORTS_nodotprod
-- Performing Test GGML_MACHINE_SUPPORTS_nodotprod - Success
-- Performing Test GGML_MACHINE_SUPPORTS_i8mm
-- Performing Test GGML_MACHINE_SUPPORTS_i8mm - Failed
-- Performing Test GGML_MACHINE_SUPPORTS_noi8mm
-- Performing Test GGML_MACHINE_SUPPORTS_noi8mm - Success
-- Performing Test GGML_MACHINE_SUPPORTS_sve
-- Performing Test GGML_MACHINE_SUPPORTS_sve - Failed
-- Performing Test GGML_MACHINE_SUPPORTS_nosve
-- Performing Test GGML_MACHINE_SUPPORTS_nosve - Success
-- Performing Test GGML_MACHINE_SUPPORTS_sme
-- Performing Test GGML_MACHINE_SUPPORTS_sme - Failed
-- Performing Test GGML_MACHINE_SUPPORTS_nosme
-- Performing Test GGML_MACHINE_SUPPORTS_nosme - Failed
-- ARM feature FMA enabled
-- ARM feature FP16_VECTOR_ARITHMETIC enabled
-- Adding CPU backend variant ggml-cpu: -march=armv8-a+crypto+crc+lse+rcpc+rdma+dotprod+fp16fml+sb+sve2+flagm+pauth+nodotprod+noi8mm+nosve
-- ggml version: 0.9.0-dev
-- ggml commit:  d12f6df1
-- Found CURL: /usr/lib/aarch64-linux-gnu/libcurl.so (found version "7.88.1")
-- Configuring done
-- Generating done
-- Build files have been written to: /root/llama.cpp/build
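For reference, the failure can be reproduced outside llama.cpp with a minimal snippet (my sketch, not from the PR; assuming GCC 12 on aarch64). vfmaq_f16 is declared always_inline in arm_neon.h, so GCC rejects the call unless the translation unit is built with fp16 vector arithmetic enabled:

/* fp16_repro.c -- hypothetical reproducer, not part of this PR.
 *
 *   gcc -march=armv8-a      -c fp16_repro.c   # "target specific option mismatch"
 *   gcc -march=armv8-a+fp16 -c fp16_repro.c   # builds fine
 */
#include <arm_neon.h>

float16x8_t fma_f16(float16x8_t a, float16x8_t b, float16x8_t c) {
    return vfmaq_f16(a, b, c);   /* always_inline; needs +fp16 */
}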

angt (Collaborator, Author) commented Sep 29, 2025

At least the code works as expected 😅

But yes, something is really wrong here too:

-- ARM detected flags: -march=armv8-a+crypto+crc+lse+rcpc+rdma+dotprod+fp16fml+sb+sve2+flagm+pauth
-- Performing Test GGML_MACHINE_SUPPORTS_dotprod
-- Performing Test GGML_MACHINE_SUPPORTS_dotprod - Failed
-- Performing Test GGML_MACHINE_SUPPORTS_nodotprod
-- Performing Test GGML_MACHINE_SUPPORTS_nodotprod - Success
...
-- Adding CPU backend variant ggml-cpu: -march=armv8-a+crypto+crc+lse+rcpc+rdma+dotprod+fp16fml+sb+sve2+flagm+pauth+nodotprod+noi8mm+nosve

I don't think this is related to this PR; it looks like a separate issue to me.
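Note that the final flag string both enables and disables the same features (+dotprod ... +nodotprod, +sve2 ... +nosve), which suggests the machine-support probes are misbehaving on this setup. You can run the equivalent of the GGML_MACHINE_SUPPORTS_dotprod probe by hand (a sketch, assuming GCC inside the VM; dotprod_check.c is a hypothetical name):

/* dotprod_check.c -- manual version of a compile-and-run probe:
 * build with the feature enabled, then run to see if the CPU traps.
 *
 *   gcc -march=armv8-a+dotprod dotprod_check.c -o dotprod_check
 *   ./dotprod_check && echo "dotprod OK"   # SIGILL if unsupported
 */
#include <arm_neon.h>

int main(void) {
    int32x4_t acc = vdupq_n_s32(0);
    int8x16_t a   = vdupq_n_s8(1);
    int8x16_t b   = vdupq_n_s8(2);
    acc = vdotq_s32(acc, a, b);               /* needs +dotprod */
    return vgetq_lane_s32(acc, 0) == 8 ? 0 : 1;
}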

Could you share the output of these commands:

curl -L -o feat https://github.com/angt/target-features/releases/download/v5/aarch64-macos-target-features
chmod +x feat
./feat

and

sysctl -a | grep FEAT

on the M4, and this inside the VM:

curl -L -o feat https://github.com/angt/target-features/releases/download/v5/aarch64-linux-target-features
chmod +x feat
./feat

github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Sep 29, 2025