Add automatic CPU feature detection #21

calebzulawski · 2024-04-12T03:47:05Z

I love this project! I originally joined the std::simd team after getting frustrated writing my fourier crate.
Full disclosure: the multiversion crate is mine, but I think it fits perfectly here, especially given the desire to forbid unsafe code.

I removed all references to -Ctarget-cpu=native and the like. I believe that with this change the flag should make no difference, at least on x86(-64) and aarch64. Plus, in my opinion, it's only useful for research and not for commercial/enterprise software, since binaries built with -Ctarget-cpu=native can't really be redistributed. I think benchmarking without it is more faithful to real use.
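For readers unfamiliar with runtime dispatch, here is a minimal sketch of the kind of check that stands in for -Ctarget-cpu=native. It uses only the standard library's `is_x86_feature_detected!` macro. The function name `detected_level` and its return labels are hypothetical; the feature sets are the ones listed in this PR's target strings. This is not how the multiversion crate is implemented: the crate additionally compiles a copy of the annotated function for each listed target and selects one at runtime.

```rust
/// Hypothetical helper: report the best feature level the running CPU
/// supports, using the feature sets from the PR's target list.
pub fn detected_level() -> &'static str {
    // The detection macro only exists on x86 targets, so gate the block.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if is_x86_feature_detected!("avx512f")
            && is_x86_feature_detected!("avx512bw")
            && is_x86_feature_detected!("avx512cd")
            && is_x86_feature_detected!("avx512dq")
            && is_x86_feature_detected!("avx512vl")
        {
            return "v4";
        }
        if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
            return "v3";
        }
        if is_x86_feature_detected!("sse4.2") {
            return "v2";
        }
    }
    // Non-x86 targets, or x86 CPUs with none of the features above.
    "baseline"
}
```

Because the check happens when the program runs, one binary can be shipped to machines with different CPUs, which a build made with -Ctarget-cpu=native cannot safely do.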

This PR is going to need a follow-up commit updating the benchmark results.

Shnatsel

Thanks a lot for the PR!

I figured we'd add multiversion eventually. I'm very happy to get a PR with it!

Do I understand correctly that ARM does not need multiversioning because AArch64 always has NEON, and portable SIMD does not map well to SVE?

Shnatsel · 2024-04-12T16:09:25Z

src/cobra.rs

+#[multiversion::multiversion(targets("x86_64+avx512f+avx512bw+avx512cd+avx512dq+avx512vl", // x86_64-v4
+                                     "x86_64+avx2+fma", // x86_64-v3
+                                     "x86_64+sse4.2", // x86_64-v2
+                                     "x86+avx512f+avx512bw+avx512cd+avx512dq+avx512vl",


Is 32-bit x86 with AVX-512 common enough to bloat the binary with dedicated code for it? Ditto for AVX2.

calebzulawski · 2024-04-12

Hard to say; I really don't know. This code is only generated when building for 32-bit x86, not x86-64. What do people run 32-bit x86 on these days? Is it more likely to be some Pentium 4, or a modern CPU in 32-bit mode? I'm not really sure.
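The point that the 32-bit entries cost nothing in an x86-64 build can be modeled with a small sketch. Both function names below are hypothetical, and this is only a model of the behavior described in the reply, not the multiversion crate's implementation: a target string counts only when its architecture prefix matches the architecture being compiled for.

```rust
/// Hypothetical helper: the architecture name this build targets, written
/// the way the PR's target strings spell it.
pub fn build_arch() -> &'static str {
    if cfg!(target_arch = "x86_64") {
        "x86_64"
    } else if cfg!(target_arch = "x86") {
        "x86"
    } else {
        "other"
    }
}

/// Hypothetical helper: does a target string such as "x86_64+avx2+fma"
/// name the architecture of the current build?
pub fn applies_to_this_build(target: &str) -> bool {
    // The architecture is everything before the first '+'.
    let arch = target.split('+').next().unwrap_or("");
    arch == build_arch()
}
```

Under this model, an x86-64 build ignores the `x86+...` entries entirely, so any binary-size cost from them is paid only by 32-bit builds.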

smu160 · 2024-04-12T16:28:57Z

Hi @calebzulawski,

Thank you for the kind words and your contribution! I'm going to try to figure out why the coverage check is failing, and then I'll re-run the benchmarks on the same machine.

Best,
Saveliy

calebzulawski · 2024-04-12T16:38:59Z

> Do I understand correctly that ARM does not need multiversioning because AArch64 always has NEON, and portable SIMD does not map well to SVE?

Correct. Standard aarch64 has NEON, so there is no need to detect it. You can have a nonstandard aarch64 target without NEON, but then there is nothing to detect either. SVE actually can map to portable SIMD, but it will need more support from the compiler. For example, you could have separate SVE-128 and SVE-256 target features that each refer to a specific SVE register size, detected at runtime. I digress...
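The NEON point can be checked at compile time with the standard `cfg!` macro. The sketch below is only an illustration; the constant and function names are hypothetical. On standard aarch64 targets the `neon` target feature is enabled for the whole build, so the constant is already `true` before the program runs.

```rust
/// Hypothetical constant: true when NEON is enabled for the whole build,
/// which is the case for standard aarch64 targets.
pub const NEON_IS_BASELINE: bool =
    cfg!(all(target_arch = "aarch64", target_feature = "neon"));

/// Hypothetical helper summarizing the dispatch strategy per architecture.
pub fn simd_strategy() -> &'static str {
    if cfg!(target_arch = "aarch64") {
        if NEON_IS_BASELINE {
            "aarch64: NEON is baseline, no runtime detection needed"
        } else {
            "aarch64 without NEON: nothing to detect"
        }
    } else if cfg!(any(target_arch = "x86", target_arch = "x86_64")) {
        "x86: runtime detection picks a version"
    } else {
        "other: portable fallback"
    }
}
```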

calebzulawski · 2024-04-18T23:34:12Z

I rebased; I'm not sure whether that's expected to fix the code coverage issue.

smu160 · 2024-04-24T17:48:53Z

@calebzulawski

Here are the benchmark results. Since your PR was already merged, I ran this from the main branch.

calebzulawski · 2024-04-24T20:31:10Z

Thank you! I think that looks comparable to the old benchmark results. What do you think?

smu160 · 2024-04-25T15:31:55Z

@calebzulawski

This looks great! Thank you for your contribution. This was the last thing I wanted merged prior to releasing the next version.

smu160 · 2024-04-25T18:58:21Z

@calebzulawski

Just wanted to add that I re-ran the benchmarks for pyphastft (the Python bindings via PyO3) as well. All of the new plots are in the README. Thank you!

Best,
Saveliy

calebzulawski · 2024-04-26T00:03:51Z

Awesome! Excited to see a new release!

smu160 added the enhancement label on Apr 12, 2024

Shnatsel reviewed Apr 12, 2024

Add automatic CPU feature detection

f34eb32

calebzulawski force-pushed the multiversion branch from 4985516 to f34eb32 on April 18, 2024 23:33

smu160 merged commit 382d190 into QuState:main Apr 18, 2024
