-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add automatic CPU feature detection #21
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the PR!
I figured we'd add multiversion
eventually. I'm very happy to get a PR with it!
Do I understand correctly that ARM does not need multiversioning because Aarch64 always has NEON, and portable SIMD does not map well to SVE?
#[multiversion::multiversion(targets("x86_64+avx512f+avx512bw+avx512cd+avx512dq+avx512vl", // x86_64-v4 | ||
"x86_64+avx2+fma", // x86_64-v3 | ||
"x86_64+sse4.2", // x86_64-v2 | ||
"x86+avx512f+avx512bw+avx512cd+avx512dq+avx512vl", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is 32-bit x86 with AVX-512 common enough to bloat the binary with dedicated code for it? Ditto for AVX2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hard to say, I really don't know. This code only gets generated when building for x86, not x86-64. What do people run x86 on these days? Is it more likely someone is running on some Pentium 4 or just a modern CPU in 32-bit mode? I'm not really sure.
Hi @calebzulawski, Thank you for the kind words and your contribution! I'm going to try to figure out why the coverage check is failing, and then I'll re-run the benchmarks on the same machine. Best, |
Correct--standard aarch64 has neon, so no need to detect it. You can have nonstandard aarch64 without it, but then there is no need to detect it. SVE actually can map to portable SIMD, but it will need more support from the compiler. E.g. you can have separate SVE-128 and SVE-256 target features that each refer to a specific SVE register size, detected at runtime. I digress... |
4985516
to
f34eb32
Compare
I rebased--not sure if that's expected to fix the code coverage issue. |
Here are the benchmark results. Since your PR was already merged, I ran this from the main branch. |
Thank you! I think that looks comparable to the old benchmark results, what do you think? |
This looks great! Thank you for your contribution. This was the last thing I wanted merged prior to releasing the next version. |
Just wanted to add that I re-ran the benchmarks for Best, |
Awesome! Excited to see a new release! |
I love this project! I joined the
std::simd
team originally after getting frustrated writing myfourier
crate.Full disclosure--the
multiversion
crate is mine, but I think it works perfectly here, especially considering the desire to forbid unsafe code.I removed all references to
-Ctarget-cpu=native
and the like--I believe with this change, it should make no difference, at least on x86(-64) and aarch64. Plus, in my opinion, it's only useful for research and not commercial/enterprise software, since it's not really possible to redistribute-Ctarget-cpu=native
code. I think this benchmark is more faithful to real use.This PR is going to need a follow-up commit updating the benchmark results.