Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add automatic CPU feature detection #21

Merged
merged 1 commit into from
Apr 18, 2024

Conversation

calebzulawski
Copy link
Contributor

I love this project! I joined the std::simd team originally after getting frustrated writing my fourier crate.
Full disclosure--the multiversion crate is mine, but I think it works perfectly here, especially considering the desire to forbid unsafe code.

I removed all references to -Ctarget-cpu=native and the like--I believe with this change, it should make no difference, at least on x86(-64) and aarch64. Plus, in my opinion, it's only useful for research and not commercial/enterprise software, since it's not really possible to redistribute -Ctarget-cpu=native code. I think this benchmark is more faithful to real use.

This PR is going to need a follow-up commit updating the benchmark results.

@smu160 smu160 added the enhancement New feature or request label Apr 12, 2024
Copy link
Collaborator

@Shnatsel Shnatsel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the PR!

I figured we'd add multiversion eventually. I'm very happy to get a PR with it!

Do I understand correctly that ARM does not need multiversioning because Aarch64 always has NEON, and portable SIMD does not map well to SVE?

#[multiversion::multiversion(targets("x86_64+avx512f+avx512bw+avx512cd+avx512dq+avx512vl", // x86_64-v4
"x86_64+avx2+fma", // x86_64-v3
"x86_64+sse4.2", // x86_64-v2
"x86+avx512f+avx512bw+avx512cd+avx512dq+avx512vl",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is 32-bit x86 with AVX-512 common enough to bloat the binary with dedicated code for it? Ditto for AVX2.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard to say, I really don't know. This code only gets generated when building for x86, not x86-64. What do people run x86 on these days? Is it more likely someone is running on some Pentium 4 or just a modern CPU in 32-bit mode? I'm not really sure.

@smu160
Copy link
Member

smu160 commented Apr 12, 2024

Hi @calebzulawski,

Thank you for the kind words and your contribution! I'm going to try to figure out why the coverage check is failing, and then I'll re-run the benchmarks on the same machine.

Best,
Saveliy

@calebzulawski
Copy link
Contributor Author

Do I understand correctly that ARM does not need multiversioning because Aarch64 always has NEON, and portable SIMD does not map well to SVE?

Correct--standard aarch64 has neon, so no need to detect it. You can have nonstandard aarch64 without it, but then there is no need to detect it. SVE actually can map to portable SIMD, but it will need more support from the compiler. E.g. you can have separate SVE-128 and SVE-256 target features that each refer to a specific SVE register size, detected at runtime. I digress...

@calebzulawski
Copy link
Contributor Author

I rebased--not sure if that's expected to fix the code coverage issue.

@smu160 smu160 merged commit 382d190 into QuState:main Apr 18, 2024
@smu160
Copy link
Member

smu160 commented Apr 24, 2024

@calebzulawski

Here are the benchmark results. Since your PR was already merged, I ran this from the main branch.

benchmarks_bar_plot_4_12
benchmarks_bar_plot_13_29

@calebzulawski
Copy link
Contributor Author

Thank you! I think that looks comparable to the old benchmark results, what do you think?

@smu160
Copy link
Member

smu160 commented Apr 25, 2024

@calebzulawski

This looks great! Thank you for your contribution. This was the last thing I wanted merged prior to releasing the next version.

@smu160
Copy link
Member

smu160 commented Apr 25, 2024

@calebzulawski

Just wanted to add that I re-ran the benchmarks for pyphastft (the python bindings via PyO3) as well. All of the new plots are in the readme. Thank you!

Best,
Saveliy

@calebzulawski
Copy link
Contributor Author

Awesome! Excited to see a new release!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants