ylpoonlg
Contributor

This PR improves several existing routines in the SVE microbenchmarks:

  • Improve Scalar performance by using fixed pointers for array access. This makes the comparison against the vector counterparts fairer.
  • Rework the StrIndexOf SVE implementations to improve performance, and make the Neon implementation more similar to GetIndexOfFirstNonAsciiChar in the C# SDK.
  • Add a SveTail version of Partition to avoid unnecessary conversions between predicates and vectors in the codegen.
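
For readers unfamiliar with the first bullet, the following is a minimal sketch of the difference between indexed and fixed-pointer array access in a scalar search loop. It is not the code from this PR: the class and method names (`ScalarIndexOfSketch`, `IndexOfIndexed`, `IndexOfFixed`) are hypothetical, and the search semantics (first occurrence of a character) are an assumption made for illustration. Compiling it requires `<AllowUnsafeBlocks>true</AllowUnsafeBlocks>` in the project file.

```csharp
using System;

// Hypothetical illustration only; names and semantics are not taken from the PR.
public static class ScalarIndexOfSketch
{
    // Indexed access: every data[i] is a bounds-checked array access
    // unless the JIT can prove the index is in range.
    public static int IndexOfIndexed(char[] data, char target)
    {
        for (int i = 0; i < data.Length; i++)
        {
            if (data[i] == target)
            {
                return i;
            }
        }
        return -1;
    }

    // Fixed-pointer access: the array is pinned for the duration of the
    // block and read through a raw pointer, which performs no bounds checks.
    public static unsafe int IndexOfFixed(char[] data, char target)
    {
        int length = data.Length;
        fixed (char* p = data)
        {
            // For an empty array p is null, but the loop body never runs.
            for (int i = 0; i < length; i++)
            {
                if (p[i] == target)
                {
                    return i;
                }
            }
        }
        return -1;
    }
}
```

Both methods return the same result for any input; for example, searching `"hello".ToCharArray()` for `'l'` yields index 2 with either variant. Pointer access of this kind is also what the vector routines use for their loads, which is presumably why it makes the scalar baseline a fairer comparison.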

Benchmark results

(Run on Nvidia Grace)

StrIndexOf:

| Method | Size | Mean (Before) | Mean (After) | Ratio (After / Before) |
|---|---|---|---|---|
| Scalar | 19 | 4.7269 ns | 2.5919 ns | 0.548x |
| Vector128IndexOf | 19 | 3.7376 ns | 0.3231 ns | 0.086x |
| SveIndexOf | 19 | 3.7094 ns | 1.2280 ns | 0.331x |
| SveTail | 19 | 3.6244 ns | 0.6806 ns | 0.188x |
| Scalar | 127 | 33.7784 ns | 19.5391 ns | 0.578x |
| Vector128IndexOf | 127 | 24.9000 ns | 2.0279 ns | 0.081x |
| SveIndexOf | 127 | 24.9709 ns | 9.8849 ns | 0.396x |
| SveTail | 127 | 23.3407 ns | 5.0614 ns | 0.217x |
| Scalar | 527 | 136.6931 ns | 83.2942 ns | 0.609x |
| Vector128IndexOf | 527 | 31.3376 ns | 12.3589 ns | 0.394x |
| SveIndexOf | 527 | 63.5482 ns | 44.2022 ns | 0.696x |
| SveTail | 527 | 48.8708 ns | 25.6798 ns | 0.525x |
| Scalar | 10015 | 2,656.9679 ns | 1,559.0106 ns | 0.587x |
| Vector128IndexOf | 10015 | 390.8402 ns | 288.6259 ns | 0.738x |
| SveIndexOf | 10015 | 964.7415 ns | 903.0951 ns | 0.936x |
| SveTail | 10015 | 700.3612 ns | 544.0103 ns | 0.777x |

Partition:

| Method | Size | Mean | Error | StdDev | Median | Min | Max | Allocated |
|---|---|---|---|---|---|---|---|---|
| Scalar | 15 | 6.197 ns | 0.1209 ns | 0.0944 ns | 6.170 ns | 6.160 ns | 6.496 ns | - |
| SvePartition | 15 | 9.900 ns | 0.0062 ns | 0.0055 ns | 9.901 ns | 9.887 ns | 9.907 ns | - |
| SveTail | 15 | 4.456 ns | 0.0056 ns | 0.0052 ns | 4.455 ns | 4.448 ns | 4.466 ns | - |
| Scalar | 127 | 44.955 ns | 4.5136 ns | 5.0168 ns | 41.902 ns | 41.839 ns | 55.372 ns | - |
| SvePartition | 127 | 87.033 ns | 0.0430 ns | 0.0381 ns | 87.027 ns | 86.974 ns | 87.101 ns | - |
| SveTail | 127 | 56.420 ns | 0.4204 ns | 0.3932 ns | 56.192 ns | 56.121 ns | 57.086 ns | - |
| Scalar | 527 | 229.889 ns | 39.2392 ns | 45.1879 ns | 243.624 ns | 179.879 ns | 302.881 ns | - |
| SvePartition | 527 | 367.581 ns | 0.2514 ns | 0.2099 ns | 367.634 ns | 367.056 ns | 367.881 ns | - |
| SveTail | 527 | 249.846 ns | 0.5340 ns | 0.4995 ns | 249.587 ns | 249.369 ns | 250.973 ns | - |
| Scalar | 10015 | 3,559.428 ns | 1.3667 ns | 1.2115 ns | 3,558.985 ns | 3,558.258 ns | 3,561.871 ns | - |
| SvePartition | 10015 | 6,525.739 ns | 10.5519 ns | 9.8703 ns | 6,526.970 ns | 6,506.622 ns | 6,541.882 ns | - |
| SveTail | 10015 | 5,270.654 ns | 3.6844 ns | 3.0766 ns | 5,270.898 ns | 5,264.652 ns | 5,277.305 ns | - |

* Rework SVE implementations to improve performance.
* Make the Neon implementation more similar to GetIndexOfFirstNonAsciiChar in the C# SDK.
* Use fixed array pointers for Scalar.
* Optimise Partition routine. Added a SveTail version to avoid
  unnecessary predicate conversions in codegen.

* Improve Scalar performance by using fixed pointers for array access.
  Makes the comparison fairer against vector counterparts.
@ylpoonlg
Contributor Author

@dotnet/arm64-contrib @a74nh

@ylpoonlg ylpoonlg marked this pull request as ready for review September 24, 2025 15:07
@SwapnilGaikwad

Hi @LoopedBard3, could you take a look at this PR, please?
