Port to fearless_simd #58
base: main
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff            @@
##             main      #58     +/-   ##
==========================================
+ Coverage   99.82%   99.85%   +0.02%
==========================================
  Files          13       13
  Lines        2258     2706    +448
==========================================
+ Hits         2254     2702    +448
  Misses          4        4
On Zen 4 this gives up to a 7% penalty due to not utilizing AVX-512, but otherwise looks normal. It seems we don't need an explicit mul_neg_add on x86: it is lowered into the correct instruction automatically.

On Apple M4 this is a large regression. The hottest instructions are loads/stores to/from the stack for f32x16, so it might be due to register pressure or some such (LLVM isn't great at dealing with that). I'll need to investigate how
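For readers following along, here is a scalar sketch of the kind of operation being discussed. It assumes that mul_neg_add(a, b, c) denotes c - a*b; the thread does not spell out the semantics, so treat both the meaning and the helper names below as hypothetical. Whether the compiler actually emits a single fused instruction depends on the target and enabled features.

```rust
/// Hypothetical scalar stand-in for the operation discussed above.
/// ASSUMPTION: the result is `c - a * b`, computed with a single rounding
/// via the standard library's fused multiply-add.
fn mul_neg_add(a: f32, b: f32, c: f32) -> f32 {
    // (-a) * b + c == c - a * b
    (-a).mul_add(b, c)
}

/// The same expression written without an explicit fused call.
/// The multiply and the subtract are rounded separately here.
fn mul_neg_add_unfused(a: f32, b: f32, c: f32) -> f32 {
    c - a * b
}
```

For inputs where no rounding occurs, such as small integers, the two functions agree exactly; in general they can differ in the last bit because the fused form rounds only once.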
This is a wild guess (I don't have Apple Silicon hardware, so I can't benchmark any of this), but the way you're loading from a slice looks a bit convoluted. Instead of e.g.

    let in0_re = f32x4::simd_from(simd, <[f32; 4]>::try_from(&reals_s0[0..4]).unwrap());

have you tried simply:

    let in0_re = f32x4::from_slice(simd, &reals_s0[0..4]);

Also, just to confirm: you ran this with the latest fearless_simd from Git, correct? linebender/fearless_simd#159 aimed to improve codegen around SIMD loads, and linebender/fearless_simd#181 landed just a couple of days ago and adds (potentially) faster methods for SIMD stores.
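To make the slice-to-array step in the first snippet concrete, here is a standard-library-only sketch of what the `<[f32; 4]>::try_from(...).unwrap()` conversion does, next to an equivalent copy. It deliberately avoids fearless_simd so it compiles on its own; the helper names `load4` and `load4_copy` are hypothetical and do not come from the PR.

```rust
/// Copy the first four elements of `src` into a fixed-size array.
/// `&src[0..4]` panics if `src` has fewer than four elements; after that,
/// `try_from` cannot fail because the sub-slice length is exactly 4.
fn load4(src: &[f32]) -> [f32; 4] {
    <[f32; 4]>::try_from(&src[0..4]).unwrap()
}

/// The same load expressed as a plain copy into a zeroed array.
fn load4_copy(src: &[f32]) -> [f32; 4] {
    let mut out = [0.0f32; 4];
    out.copy_from_slice(&src[0..4]);
    out
}
```

Both helpers return the same array for any input of at least four elements, so the choice between them is about readability and generated code rather than behavior.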
Yep, this is on the latest fearless_simd from Git. I'll see if I've also tried swapping vector repr from arrays to structs to mimic
CI is broken in a really interesting way: it complains about mul_neg_add, which doesn't appear anywhere in the code on the latest commit. It's either running on an old commit or on a different branch; either way, that could be exploitable if it can be reproduced.
Nope, no difference in performance from changing loads/stores. Looks like a readability win to me though.
No description provided.