@valadaptive
Contributor

I meant to use this method to implement the from_slice method, but apparently used load_array instead.

@LaurenzV
Collaborator

Perhaps this will fix our issues with #171!

@Shnatsel
Contributor

This fixed many of my issues with QuState/PhastFT#58!

There's still a performance gap vs. wide, but this closes much of it!

@LaurenzV
Collaborator

How much is left?

@Shnatsel
Contributor

Shnatsel commented Jan 22, 2026

fearless_simd is 7% to 13% slower than wide on Apple M4, depending on the benchmark (based on a quick run, not a full run; I can do a full run with more tests later).

On x86 (Zen4) it ranges from on par to 6% worse, but that's not a perfectly apples-to-apples comparison: Zen4 has AVX-512 (emulated, double-pumped), which wide can use but fearless_simd cannot, so I'm not too worried about that.
