
Conversation

@liubo-intel
Contributor

Details:

  • Use the actual attention mask input precision instead of the compute precision (bf16/f16) to fix LFM2-350M output corruption when running with low precision on Xeon platforms.

Tickets:

@liubo-intel requested review from a team as code owners (December 5, 2025, 07:22)
@github-actions bot added the "category: CPU" (OpenVINO CPU plugin) label (December 5, 2025)
@liubo-intel requested a review from Copilot (December 5, 2025, 07:31)

Copilot AI left a comment


Pull request overview

This PR fixes precision handling for attention masks in the Scaled Dot Product Attention (SDPA) implementation when running with BF16/F16 inference precision on CPU. The issue caused output corruption in the LFM2-350M model on Xeon platforms.

Key Changes:

  • Uses the actual attention mask input precision instead of assuming the compute precision (bf16/f16)
  • Fixes pointer arithmetic to use byte-based strides for attention masks (a sketch of the idea follows this list)
  • Adds test coverage for stateful SDPA with boolean masks in BF16 precision
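
As a rough illustration of the first two points, the sketch below shows why the mask pointer should be advanced by strides in bytes derived from the mask's own element size rather than the bf16/f16 compute element size. This is not the PR's actual code; the `Precision` enum, `AttnMask` struct, and function names are hypothetical stand-ins for the plugin's real element-type and memory-descriptor machinery.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical precision tag for illustration only.
enum class Precision { BOOL, F16, BF16, F32 };

inline size_t element_size(Precision p) {
    switch (p) {
        case Precision::BOOL: return 1;  // boolean masks are 1 byte per element
        case Precision::F16:
        case Precision::BF16: return 2;
        case Precision::F32:  return 4;
    }
    return 0;
}

// Hypothetical view of the attention mask input.
struct AttnMask {
    const uint8_t* data;  // raw mask bytes as provided by the input
    Precision prec;       // the mask input's own precision (may differ from compute precision)
    size_t row_elems;     // elements per row
};

// Advance to row `row` using a byte-based stride computed from the mask's own
// element size. Assuming the compute precision here (e.g. 2 bytes for bf16)
// would land on the wrong bytes whenever the mask is bool (1B) or f32 (4B).
inline const uint8_t* mask_row_ptr(const AttnMask& mask, size_t row) {
    const size_t byte_stride = mask.row_elems * element_size(mask.prec);
    return mask.data + row * byte_stride;
}
```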

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File descriptions:

  • stateful_sdpa_bool_mask.cpp — adds a new test case validating stateful SDPA with boolean masks in BF16 inference mode (a minimal standalone sketch follows this list)
  • scaled_attn.cpp — fixes attention mask precision detection and pointer arithmetic for both the oneDNN and ACL kernel implementations
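
For context, a minimal standalone check in that spirit might look like the following. This is a sketch only, not the PR's test code: the shapes are made up, the stateful (KV-cache ReadValue/Assign) part of the real test is omitted, and it assumes the OpenVINO 2.x C++ API with the opset13 ScaledDotProductAttention op.

```cpp
#include <memory>
#include <openvino/openvino.hpp>
#include <openvino/op/scaled_dot_product_attention.hpp>

int main() {
    using namespace ov;

    // Query/key/value in f32, attention mask as a boolean tensor (hypothetical shapes).
    auto q    = std::make_shared<op::v0::Parameter>(element::f32, Shape{1, 8, 16, 64});
    auto k    = std::make_shared<op::v0::Parameter>(element::f32, Shape{1, 8, 16, 64});
    auto v    = std::make_shared<op::v0::Parameter>(element::f32, Shape{1, 8, 16, 64});
    auto mask = std::make_shared<op::v0::Parameter>(element::boolean, Shape{1, 8, 16, 16});

    auto sdpa  = std::make_shared<op::v13::ScaledDotProductAttention>(q, k, v, mask, /*causal=*/false);
    auto model = std::make_shared<Model>(OutputVector{sdpa}, ParameterVector{q, k, v, mask});

    // Compile for CPU with bf16 inference precision, the configuration in which
    // the mask precision mismatch was reported.
    Core core;
    auto compiled = core.compile_model(model, "CPU",
                                       ov::hint::inference_precision(element::bf16));

    auto request = compiled.create_infer_request();
    // ... fill q/k/v/mask tensors, call request.infer(), and compare the output
    // against an f32 reference run.
    return 0;
}
```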



Labels

category: CPU (OpenVINO CPU plugin)


1 participant