@Shnatsel
Collaborator

Use SIMD operations for bit reversal as per #61

I gave Claude Code the original paper and the tests it has to pass, iterated a bit, and got this. Not ready for merging: it needs cleanup, porting to stable Rust, and ideally adaptation to the native SIMD width, with different SIMD widths for f32 and f64.

On Zen 4 this blows everything else out of the water; benchmarks are green across the board: https://gist.github.com/Shnatsel/fd48d7ca13a3e5e5c01e9620c249e8e2
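For reference, the operation being vectorized here is the bit-reversal permutation. Below is a minimal scalar sketch of it, assuming a power-of-two input length; `bit_reverse_permute` is a hypothetical name used for illustration and is not the code in this PR.

```rust
/// Scalar reference for the bit-reversal permutation (illustrative only,
/// not the SIMD code from this PR). The element at index `i` is swapped with
/// the element at the index formed by reversing the low log2(n) bits of `i`.
pub fn bit_reverse_permute<T>(data: &mut [T]) {
    let n = data.len();
    // Slices of length 0 or 1 are already in bit-reversed order.
    if n < 2 {
        return;
    }
    assert!(n.is_power_of_two(), "length must be a power of two");
    let bits = n.trailing_zeros(); // log2(n)
    for i in 0..n {
        // Reverse all bits of `i`, then shift so only the low `bits` bits remain.
        let j = i.reverse_bits() >> (usize::BITS - bits);
        // Swap each pair once; indices that map to themselves are skipped.
        if i < j {
            data.swap(i, j);
        }
    }
}
```

For a length of 8, indices 1 and 4 trade places, as do 3 and 6; the rest stay put.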

@Shnatsel
Collaborator Author

Apple M4 also benefits at chunk size 4, if not quite as much: https://pastebin.com/tcSHgsqf

With this PR as-is (chunk size 8), performance collapses there, so we'll need some sort of selection mechanism: maybe #60, maybe something simpler.
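One possible shape for the "something simpler" option is a compile-time switch on target architecture. This is only a sketch: `bit_reversal_chunk_size` is a hypothetical name, and the values just mirror the two data points in this thread (4 on Apple M4, 8 on Zen 4). Defaulting every non-aarch64 target to 8 is an assumption that would need benchmarking.

```rust
/// Hypothetical helper: picks the SIMD chunk size for the bit-reversal kernel
/// at compile time. Only the aarch64 -> 4 choice and the Zen 4 -> 8 choice
/// are backed by measurements in this thread; the default for all other
/// targets is an untested assumption.
pub const fn bit_reversal_chunk_size() -> usize {
    if cfg!(target_arch = "aarch64") {
        4
    } else {
        8
    }
}
```

A per-architecture constant would not tell apart different CPUs within one architecture, which is where something like #60 could still be needed.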

@Shnatsel
Collaborator Author

wide doesn't have interleave/deinterleave operations, but fearless_simd does. Porting to fearless_simd is ongoing in #58.
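To make clear what is meant by interleave/deinterleave, here is a scalar sketch of their semantics on 4-lane vectors. The function names and array-based signatures are illustrative assumptions, not the API of fearless_simd or wide.

```rust
/// Interleave two 4-lane vectors: lanes alternate a0, b0, a1, b1, ...
/// Returns the low and high halves of the 8-lane result.
/// Illustrative scalar model only; not a real SIMD crate API.
pub fn interleave(a: [f32; 4], b: [f32; 4]) -> ([f32; 4], [f32; 4]) {
    ([a[0], b[0], a[1], b[1]], [a[2], b[2], a[3], b[3]])
}

/// Inverse of `interleave`: gathers even-position lanes into the first
/// output and odd-position lanes into the second.
pub fn deinterleave(lo: [f32; 4], hi: [f32; 4]) -> ([f32; 4], [f32; 4]) {
    ([lo[0], lo[2], hi[0], hi[2]], [lo[1], lo[3], hi[1], hi[3]])
}
```

Applying `deinterleave` to the output of `interleave` gives back the original two vectors.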

@Shnatsel Shnatsel mentioned this pull request Jan 22, 2026