perf(simd): avx2 fallback to swar instead of sse4.2 #181
This has massive implications on the default runtime perf, improving how the code is lowered/inlined. (Falling back to SSE4.2 for a handful of bytes was wasteful). Should supersede seanmonstar#175, seanmonstar#156
@seanmonstar @lucab I'll bench this against #175 for completeness' sake, but this is substantially simpler and more focused, and should not regress.
#175 vs #181 (this PR)
TLDR: this demonstrates that #181 provides the bulk of #175's benefits, with trivial/minimal focused changes on the core sse42/avx2 interplay issue.
@lucab @seanmonstar I think we should land this: no regressions on aarch64, and a focused change on the core problem. Other improvements #175 provided can be explored in focused follow-ups.
For good measure, compared
TLDR: 2-line change => 2x faster req/req (doesn't raise the ceiling substantially but fixes perf issue of generic x64 build, using runtime dispatch)
This has massive implications on the default simd::runtime::* (x64 generic build) perf, improving how the code is lowered/inlined. (Falling back to SSE4.2 for a handful of bytes was wasteful.) Should supersede #175, #156
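For readers unfamiliar with SWAR ("SIMD within a register"), here is a minimal, portable sketch of the kind of routine that can handle a short tail of bytes without any vector instructions. It is not the code from this PR: the names `swar_bad_mask` and `match_len_swar` are hypothetical, and the accepted byte range `0x21..=0x7E` is an assumption chosen for illustration rather than the crate's actual lookup tables.

```rust
// Illustrative only: hypothetical helpers, not the crate's real implementation.
const HI: u64 = 0x8080_8080_8080_8080; // high bit of every byte lane
const LO7: u64 = 0x7F7F_7F7F_7F7F_7F7F; // low 7 bits of every byte lane

/// Returns a mask with the high bit set in each byte lane whose byte is
/// outside the assumed range 0x21..=0x7E.
fn swar_bad_mask(x: u64) -> u64 {
    // Clearing the high bits first means the per-lane adds below can
    // never carry into a neighbouring lane.
    let low7 = x & LO7;
    // High bit set in a lane iff its low 7 bits are >= 0x21.
    let ge_21 = low7 + 0x5F5F_5F5F_5F5F_5F5F;
    // High bit set in a lane iff its low 7 bits are == 0x7F.
    let ge_7f = low7 + 0x0101_0101_0101_0101;
    // Bad if below 0x21, equal to 0x7F, or >= 0x80 (original high bit set).
    (!ge_21 | ge_7f | x) & HI
}

/// Length of the leading run of bytes in 0x21..=0x7E,
/// scanning 8 bytes per step and finishing with a scalar loop.
pub fn match_len_swar(bytes: &[u8]) -> usize {
    let mut i = 0;
    while i + 8 <= bytes.len() {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(&bytes[i..i + 8]);
        // Little-endian load: the first byte in memory is the lowest lane.
        let bad = swar_bad_mask(u64::from_le_bytes(buf));
        if bad != 0 {
            return i + (bad.trailing_zeros() / 8) as usize;
        }
        i += 8;
    }
    // Fewer than 8 bytes remain: check them one at a time.
    while i < bytes.len() && (0x21..=0x7E).contains(&bytes[i]) {
        i += 1;
    }
    i
}
```

In a runtime-dispatched build, a wide loop would typically consume full 32-byte blocks and then hand the remainder to something like `match_len_swar`, which needs no CPU feature check of its own.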
Benchmarks on GH CodeSpace (4-core / 16GB)
(4 cores of a 64-core AMD EPYC 7763 host CPU)