[MLAS] Fix rotary interleaved NEON kernel #26390
Draft
+119
−18
Code review shows that the logic of the interleaved NEON kernel is incorrect.

Test Code Logic:
The test code in `test_rope.h` allocates the `sin` and `cos` tables based on the `interleaved` flag. For the `interleaved = true` case, the test creates `sin` and `cos` tables of length `rotary_emb_dim / 2`.

AVX2 (fp32) Kernel Logic (`interleaved = true`):
This kernel loads the `sin`/`cos` data using an index of `i / 2`. This logic expects a `sin`/`cos` table of length `rotary_emb_dim / 2`.
Conclusion: The AVX2 (fp32) kernel is consistent with the test code.

NEON (fp16) Kernel Logic (`interleaved = true`):
This kernel loads the `sin`/`cos` data using an index of `i`. This logic expects a `sin`/`cos` table of length `rotary_emb_dim`.
Conclusion: The NEON (fp16) kernel is NOT consistent with the test code.
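For reference, the `i / 2` indexing can be illustrated with a scalar version of the interleaved rotation. This is only a minimal sketch under the assumption that each adjacent pair `(x[i], x[i + 1])` shares one `sin`/`cos` entry; the function name `RopeInterleavedRef` is hypothetical and this is not the MLAS kernel or its fallback code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar reference for interleaved rotary embedding.
// Assumption: pair (input[i], input[i + 1]) is rotated by the angle whose
// sine and cosine are stored at table index i / 2, so each table needs
// exactly input.size() / 2 entries.
std::vector<float> RopeInterleavedRef(const std::vector<float>& input,
                                      const std::vector<float>& sin_table,
                                      const std::vector<float>& cos_table) {
    const std::size_t dim = input.size();
    assert(dim % 2 == 0);
    assert(sin_table.size() == dim / 2);
    assert(cos_table.size() == dim / 2);

    std::vector<float> output(dim);
    for (std::size_t i = 0; i < dim; i += 2) {
        const std::size_t k = i / 2;  // one table entry per pair
        const float s = sin_table[k];
        const float c = cos_table[k];
        output[i] = input[i] * c - input[i + 1] * s;
        output[i + 1] = input[i] * s + input[i + 1] * c;
    }
    return output;
}
```

With this layout, indexing the tables by `i` instead of `i / 2` would require tables twice as long as the ones the test allocates.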
Summary
The `RopeKernel_Avx2_fp32_Impl<true>` kernel correctly aligns with the test code (and the fallback implementation) by expecting a `sin`/`cos` table of length `rotary_emb_dim / 2`.
The `RopeKernel_Fp16_Impl<true>` (NEON) kernel incorrectly expects a table of length `rotary_emb_dim`. When run against the provided test, the NEON kernel will read past the end of the `sin_data` and `cos_data` vectors.
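The size of the overrun follows from simple arithmetic, sketched below. The helper names are hypothetical and only model the two indexing schemes described in this review; they are not taken from MLAS, and the real kernels load several elements per iteration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helpers modelling the two indexing schemes; not part of MLAS.

// Table entries needed when element i (0 <= i < dim) reads table[i / 2].
std::size_t EntriesNeededHalfIndex(std::size_t dim) {
    assert(dim % 2 == 0);
    return dim / 2;
}

// Table entries needed when element i (0 <= i < dim) reads table[i].
std::size_t EntriesNeededFullIndex(std::size_t dim) {
    return dim;
}

// Number of entries read beyond a table that holds table_len entries.
std::size_t EntriesPastEnd(std::size_t needed, std::size_t table_len) {
    return needed > table_len ? needed - table_len : 0;
}
```

For example, with `rotary_emb_dim = 8` the test allocates 4 entries per table: `EntriesPastEnd(EntriesNeededHalfIndex(8), 4)` is 0, while `EntriesPastEnd(EntriesNeededFullIndex(8), 4)` is 4, i.e. half of the reads fall outside the table.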