simd: split cursor advancing from value matching #156
This refactors all SIMD modules in order to make the value-matching logic self-contained. Thus, all bytes-cursor manipulations are now grouped and performed once at the end, outside of SIMD logic.
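The split described above can be sketched in portable Rust. This is a hypothetical illustration, not httparse's actual code. `Cursor`, `match_chunk`, `match_uri`, `parse_uri`, the 16-byte `LANES` width and the simplified `is_uri_byte` predicate are all assumptions, and the per-lane loop stands in for a real SIMD compare plus movemask. The matcher only counts how many leading bytes match. The caller then advances the cursor once with that count.

```rust
use std::convert::TryInto;

/// Hypothetical byte cursor (assumption; not httparse's real type).
struct Cursor<'a> {
    slice: &'a [u8],
    pos: usize,
}

impl<'a> Cursor<'a> {
    fn new(slice: &'a [u8]) -> Self {
        Cursor { slice, pos: 0 }
    }

    /// Bytes not yet consumed.
    fn remaining(&self) -> &'a [u8] {
        &self.slice[self.pos..]
    }

    /// Move forward by `n` bytes; panics if `n` overruns the input.
    fn advance(&mut self, n: usize) {
        assert!(n <= self.slice.len() - self.pos);
        self.pos += n;
    }
}

/// Width of one simulated SIMD register (16 bytes, as with SSE).
const LANES: usize = 16;

/// Simplified predicate (assumption): visible ASCII, no space or DEL.
fn is_uri_byte(b: u8) -> bool {
    (0x21..=0x7e).contains(&b)
}

/// Scalar stand-in for "compare + movemask": counts the leading matching
/// bytes of one block without touching any cursor.
fn match_chunk(chunk: &[u8; LANES]) -> usize {
    // Bit i set = byte i does NOT match. Bit LANES is a sentinel, so an
    // all-matching block yields exactly LANES.
    let mut mask: u32 = 1 << LANES;
    for (i, &b) in chunk.iter().enumerate() {
        if !is_uri_byte(b) {
            mask |= 1 << i;
        }
    }
    mask.trailing_zeros() as usize
}

/// Pure matching: length of the matching prefix of `bytes`.
fn match_uri(bytes: &[u8]) -> usize {
    let mut matched = 0;
    let mut chunks = bytes.chunks_exact(LANES);
    for chunk in &mut chunks {
        let block: &[u8; LANES] = chunk.try_into().unwrap();
        let n = match_chunk(block);
        matched += n;
        if n < LANES {
            return matched; // mismatch found inside this block
        }
    }
    // Scalar tail for the final (< LANES) bytes.
    matched + chunks.remainder().iter().take_while(|&&b| is_uri_byte(b)).count()
}

/// Caller: match first, then advance the cursor exactly once.
fn parse_uri<'a>(cur: &mut Cursor<'a>) -> &'a [u8] {
    let rest = cur.remaining();
    let n = match_uri(rest);
    cur.advance(n);
    &rest[..n]
}
```

Because `match_uri` has no side effects, each SIMD backend in this sketch only has to report a count, and all cursor bookkeeping stays in a single place outside the vectorized code.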
@seanmonstar this is ready for a review pass whenever you have time. There is a minor cleanup bundled in this PR (marking several functions as …).

I'll be honest: I started this rework as part of hyperium/hyper#3574 before actually going for hyperium/hyper#3575, which is focused on memory usage and allocation patterns.
Beautiful PR, and the speed boosts seem out of this world!
Thanks for merging this. Even though I recorded those perf numbers myself, I'm still somewhat puzzled and a bit skeptical about them. Overall, I think the new code is a useful refactor, but I personally can't guarantee that the pictured performance changes hold in all environments.
This reverts commit b2625f3.
This has massive implications for the default runtime performance, improving how the code is lowered and inlined (falling back to SSE4.2 for a handful of bytes was wasteful). Should supersede seanmonstar#175 and seanmonstar#156.
Performance impact on my Intel AVX2-capable workstation seems positive (benchmark noise filtered out with an arbitrary >20% threshold):