perf: optimize sample_floyd by unsafe APIs #1622
Unparalleled-Calvin wants to merge 2 commits into rust-random:master from
Force-pushed from 2bdea23 to 07d4e92.
Thanks for the PR. My main concern here is simply: should we be adding more
CC @RalfJung
Thanks for considering!
dhardy left a comment:
There are two unsafe operations here; I'd like to see the perf impact of each.
Not sure what exactly you want my input on here. :) Happy to consult on whether some use of unsafe is sound or not, but that doesn't seem to be the question here? As to whether you think the bit of unsafe is worth the perf gain, that's a maintainer decision. There are absolutely cases where the perf gain is important enough to justify a bit of unsafe, and there are other cases where it's not worth it. I don't have to maintain this code going forward, so I can't make this decision for you. :)
Of course, testing != verification, so there could still be UB in edge cases not covered by the tests.
Thank you for your review! Here are the benchmark results of using the unsafe functions. Only use Additionally use From my perspective, the elimination of bounds checking in
Sorry for the delay; I finally got around to running benches on my 5800X desktop. This is 07d4e92 vs d468501. Full results
On average, that's +1% (range -11% to +71%). Yes, there are caveats to this type of benchmarking: variance (I repeated one test a few times and had less than 1% change, so probably okay), relevance (and weighting), but on the available evidence I don't see any significant benefit to this change.
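For comparison, the replace step in Floyd's sampling can also be written without any index at all, so there is no bounds check to remove and no `unsafe` to audit. The sketch below is hypothetical and is not rand's code: `sample_floyd_safe` and the `gen_inclusive` closure (assumed to return a uniform value in `0..=j`, standing in for the RNG call) are illustrative names. Whether this compiles to the same machine code as an unchecked version is not claimed here; that would need to be checked with benchmarks like the ones above.

```rust
/// Hypothetical sketch of Floyd's combination sampling (not rand's actual code).
/// Picks `amount` distinct values from `0..length`.
/// `gen_inclusive(j)` is assumed to return a value in `0..=j`.
fn sample_floyd_safe(
    mut gen_inclusive: impl FnMut(u32) -> u32,
    length: u32,
    amount: u32,
) -> Vec<u32> {
    assert!(amount <= length);
    let mut indices: Vec<u32> = Vec::with_capacity(amount as usize);
    for j in length - amount..length {
        let t = gen_inclusive(j);
        // `iter_mut().find` hands back a reference to the matching slot
        // directly, so no index (and thus no bounds check) is involved.
        if let Some(slot) = indices.iter_mut().find(|x| **x == t) {
            *slot = j; // `t` was already taken: the set gains `j` instead
        }
        // `t` is pushed either way; `j` was never in the set before this step.
        indices.push(t);
    }
    indices
}
```

With a closure that always returns 0, every step after the first hits the replace branch, so `sample_floyd_safe(|_| 0, 5, 5)` yields `[1, 2, 3, 4, 0]`.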
CHANGELOG.md entry
Summary
This PR uses unsafe APIs to improve the performance of sample_floyd. The optimization is safe because each index is always bounded by the length of the vec.
Motivation
Rust's bounds checks are sometimes unnecessary. Removing them with unsafe APIs can improve performance. This optimization makes the related functions faster while keeping them safe.
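As an illustration of the kind of bounds check in question, here is a minimal sketch of Floyd's combination sampling in which the replace step uses `get_unchecked_mut`. It is hypothetical: the PR's actual diff is not reproduced here, so `sample_floyd_unchecked` and the `gen_inclusive` closure (assumed to return a uniform value in `0..=j`, standing in for the RNG call) are illustrative names, and this is only one plausible place for an unchecked access. The safety argument matches the one above: the index comes from `position` on the same vec, so it is always in bounds.

```rust
/// Hypothetical sketch of Floyd's combination sampling (not rand's actual code).
/// Picks `amount` distinct values from `0..length`.
/// `gen_inclusive(j)` is assumed to return a value in `0..=j`.
fn sample_floyd_unchecked(
    mut gen_inclusive: impl FnMut(u32) -> u32,
    length: u32,
    amount: u32,
) -> Vec<u32> {
    assert!(amount <= length);
    let mut indices: Vec<u32> = Vec::with_capacity(amount as usize);
    for j in length - amount..length {
        let t = gen_inclusive(j);
        if let Some(pos) = indices.iter().position(|&x| x == t) {
            // SAFETY: `position` only returns the index of an element it
            // visited in `indices`, so `pos < indices.len()`.
            unsafe {
                *indices.get_unchecked_mut(pos) = j;
            }
        }
        // `t` is pushed either way; `j` was never in the set before this step.
        indices.push(t);
    }
    indices
}
```

The safe equivalent is `indices[pos] = j`. Whether the optimizer can already prove `pos < indices.len()` and drop that check on its own is exactly what benchmarks or an assembly inspection would have to show.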
Details
The benchmark results from my environment are listed below.