Conversation

@Shnatsel
Collaborator

No description provided.

@codecov-commenter

codecov-commenter commented Dec 12, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.85%. Comparing base (7b15bc2) to head (c955c97).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #58      +/-   ##
==========================================
+ Coverage   99.82%   99.85%   +0.02%     
==========================================
  Files          13       13              
  Lines        2258     2706     +448     
==========================================
+ Hits         2254     2702     +448     
  Misses          4        4              


@Shnatsel Shnatsel mentioned this pull request Jan 21, 2026
@Shnatsel
Collaborator Author

On Zen 4 this gives up to a 7% penalty due to not utilizing AVX-512, but otherwise looks normal. It seems we don't need an explicit mul_neg_add on x86; the pattern is lowered to the correct instruction automatically.
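For reference, the pattern in question can be shown with a scalar sketch (plain Rust; fearless_simd's vector types are omitted so the snippet is self-contained, and the function name just mirrors the operation discussed above). With FMA enabled, LLVM recognizes `(-a).mul_add(b, c)` and emits a single instruction from the VFNMADD family on x86:

```rust
// Scalar sketch of the mul_neg_add pattern: -(a * b) + c.
// On x86 targets with FMA, LLVM lowers this to one VFNMADD-family
// instruction, which is why no explicit intrinsic is needed there.
fn mul_neg_add(a: f32, b: f32, c: f32) -> f32 {
    // Single rounding step, matching the hardware fused operation.
    (-a).mul_add(b, c)
}

fn main() {
    // -(2 * 3) + 10 = 4
    assert_eq!(mul_neg_add(2.0, 3.0, 10.0), 4.0);
    println!("ok");
}
```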

On Apple M4 this is a large regression. The hottest instructions are loads/stores to/from the stack for f32x16, so it might be due to register pressure or something similar (LLVM isn't great at dealing with that). I'll need to investigate how wide lowers this kind of thing to NEON; its approach is apparently better than fearless_simd's. Or we could rewrite the function to operate on native vectors, but then we might give up some instruction-level parallelism.
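As a rough illustration of the native-vectors alternative mentioned above (a plain-Rust sketch, not fearless_simd's actual API; the operation and loop shape are placeholders), the idea is to process four 4-lane chunks in a loop rather than one virtual 16-lane vector, since NEON registers are 128-bit:

```rust
// Sketch: operate on native-width (4-lane) chunks instead of a virtual
// 16-lane vector. Each chunk can stay in one NEON Q register for the loop
// body, reducing spills to the stack, at the cost of less explicit ILP
// compared to expressing the whole 16-lane computation at once.
fn scale_native_width(data: &mut [f32; 16], factor: f32) {
    for chunk in data.chunks_exact_mut(4) {
        for lane in chunk.iter_mut() {
            *lane *= factor;
        }
    }
}

fn main() {
    let mut data = [1.0f32; 16];
    scale_native_width(&mut data, 2.0);
    assert!(data.iter().all(|&x| x == 2.0));
    println!("ok");
}
```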

@valadaptive

On Apple M4 this is a large regression. The hottest instructions are loads/stores to/from the stack for f32x16, so it might be due to register pressure or some such

This is a wild guess (I don't have Apple Silicon hardware, so I can't benchmark any of this), but the way you're loading from a slice looks a bit convoluted. Instead of e.g.

let in0_re = f32x4::simd_from(simd, <[f32; 4]>::try_from(&reals_s0[0..4]).unwrap());

have you tried simply:

let in0_re = f32x4::from_slice(simd, &reals_s0[0..4]);

Also, just to confirm: you ran this with the latest fearless_simd from Git, correct? linebender/fearless_simd#159 aimed to improve codegen around SIMD loads, and linebender/fearless_simd#181 just landed a couple of days ago and adds (potentially) faster methods for SIMD stores.
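The two slice-loading shapes contrasted above can be compared in isolation (plain-Rust sketch; fearless_simd's f32x4 wrapper is left out so the snippet is self-contained, and the function names are made up for illustration):

```rust
// The more convoluted form: explicit TryFrom on a subslice, as in the
// original code being discussed.
fn load4_explicit(reals: &[f32]) -> [f32; 4] {
    <[f32; 4]>::try_from(&reals[0..4]).unwrap()
}

// Equivalent result via try_into; both panic if the slice holds fewer
// than 4 elements.
fn load4_concise(reals: &[f32]) -> [f32; 4] {
    reals[..4].try_into().unwrap()
}

fn main() {
    let reals = [1.0f32, 2.0, 3.0, 4.0, 5.0];
    assert_eq!(load4_explicit(&reals), load4_concise(&reals));
    println!("ok");
}
```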

@Shnatsel
Collaborator Author

Shnatsel commented Jan 21, 2026

Yep, this is on the latest fearless_simd from Git. I'll see if from_slice does anything; it's certainly more readable.

I've also tried swapping the vector repr from arrays to structs to mimic wide's internal representation, but it didn't make a difference.

@Shnatsel
Collaborator Author

CI is broken in a really interesting way: it complains about mul_neg_add, which doesn't appear anywhere in the code on the latest commit. It's either running on an old commit or on a different branch; either way, that could be exploitable if it can be reproduced.

@Shnatsel
Collaborator Author

Nope, no difference in performance from changing the loads/stores. It looks like a readability win to me, though.

@Shnatsel Shnatsel mentioned this pull request Jan 22, 2026