Skip to content

Conversation

@tianleiwu
Copy link
Contributor

@tianleiwu tianleiwu commented Oct 22, 2025

This fixes an issue that _mm256_maskload_ps intrinsic used in remainder-handling logic introduced in #23694.

The core of the problem is that _mm256_maskload_ps (and its store equivalent) can read beyond the masked elements.
Even if mask correctly specifies that you only want to load, for example, 3 floats, the intrinsic may still read the full 32 bytes (8 floats) from the provided memory address.

The invalid access occurs when one of buffers (input, sin_data, or cos_data) ends near the boundary of a memory page, and the part of the 32-byte read that you don't care about (i.e., the masked-off part) falls onto an unmapped page. This will cause a segmentation fault (invalid access).

The Solution: Use a Scalar Remainder Loop

The simplest, safest, and most robust solution is to replace the masked AVX remainder logic with a simple scalar loop. This is the exact strategy already used by your RopeKernel_Avx2_fp16_Impl functions, which are safe from this bug.

The performance impact of this change will be negligible, as this loop only processes the final 1-15 elements.

@tianleiwu tianleiwu marked this pull request as draft October 22, 2025 22:26
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a critical memory access violation in the AVX2 rotary embedding kernel. The issue stems from the use of _mm256_maskload_ps and _mm256_maskstore_ps intrinsics, which can read/write beyond the masked elements, potentially causing segmentation faults when buffers are near page boundaries.

Key Changes:

  • Removed masked AVX2 remainder handling logic that could cause invalid memory access
  • Replaced with safe scalar loops for processing trailing elements (1-15 elements)

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

tianleiwu and others added 2 commits October 22, 2025 16:46
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@tianleiwu tianleiwu marked this pull request as ready for review October 23, 2025 20:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants