
Conversation

@liubo-intel
Contributor

Details:

  • Use the actual attention mask input precision instead of the compute precision (bf16/f16) to fix LFM2-350M output corruption when running with low precision on Xeon platforms.

Tickets:

@liubo-intel requested review from a team as code owners (December 5, 2025, 07:22)
@github-actions bot added the "category: CPU" (OpenVINO CPU plugin) label (December 5, 2025)
@liubo-intel requested a review from Copilot (December 5, 2025, 07:31)

Copilot AI left a comment


Pull request overview

This PR fixes precision handling for attention masks in the Scaled Dot Product Attention (SDPA) implementation when running with BF16/F16 inference precision on CPU. The issue caused output corruption in the LFM2-350M model on Xeon platforms.

Key Changes:

  • Uses the actual attention mask input precision instead of assuming the compute precision (bf16/f16)
  • Fixes pointer arithmetic to use byte-based strides for attention masks (a sketch of the idea follows this list)
  • Adds test coverage for stateful SDPA with boolean masks in BF16 precision
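
As a rough illustration of the first two points, the sketch below shows why the mask pointer should be advanced by strides in bytes derived from the mask's own element size rather than the bf16/f16 compute element size. This is not the PR's actual code; the `Precision` enum, `AttnMask` struct, and function names are hypothetical stand-ins for the plugin's real element-type and memory-descriptor machinery.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical precision tag for illustration only.
enum class Precision { BOOL, F16, BF16, F32 };

inline size_t element_size(Precision p) {
    switch (p) {
        case Precision::BOOL: return 1;  // boolean masks are 1 byte per element
        case Precision::F16:
        case Precision::BF16: return 2;
        case Precision::F32:  return 4;
    }
    return 0;
}

// Hypothetical view of the attention mask input.
struct AttnMask {
    const uint8_t* data;  // raw mask bytes as provided by the input
    Precision prec;       // the mask input's own precision (may differ from compute precision)
    size_t row_elems;     // elements per row
};

// Advance to row `row` using a byte-based stride computed from the mask's own
// element size. Assuming the compute precision here (e.g. 2 bytes for bf16)
// would land on the wrong bytes whenever the mask is bool (1B) or f32 (4B).
inline const uint8_t* mask_row_ptr(const AttnMask& mask, size_t row) {
    const size_t byte_stride = mask.row_elems * element_size(mask.prec);
    return mask.data + row * byte_stride;
}
```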

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File descriptions:

  • stateful_sdpa_bool_mask.cpp — adds a new test case validating stateful SDPA with boolean masks in BF16 inference mode (a minimal standalone sketch follows this list)
  • scaled_attn.cpp — fixes attention mask precision detection and pointer arithmetic for both the oneDNN and ACL kernel implementations
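
For context, a minimal standalone check in that spirit might look like the following. This is a sketch only, not the PR's test code: the shapes are made up, the stateful (KV-cache ReadValue/Assign) part of the real test is omitted, and it assumes the OpenVINO 2.x C++ API with the opset13 ScaledDotProductAttention op.

```cpp
#include <memory>
#include <openvino/openvino.hpp>
#include <openvino/op/scaled_dot_product_attention.hpp>

int main() {
    using namespace ov;

    // Query/key/value in f32, attention mask as a boolean tensor (hypothetical shapes).
    auto q    = std::make_shared<op::v0::Parameter>(element::f32, Shape{1, 8, 16, 64});
    auto k    = std::make_shared<op::v0::Parameter>(element::f32, Shape{1, 8, 16, 64});
    auto v    = std::make_shared<op::v0::Parameter>(element::f32, Shape{1, 8, 16, 64});
    auto mask = std::make_shared<op::v0::Parameter>(element::boolean, Shape{1, 8, 16, 16});

    auto sdpa  = std::make_shared<op::v13::ScaledDotProductAttention>(q, k, v, mask, /*causal=*/false);
    auto model = std::make_shared<Model>(OutputVector{sdpa}, ParameterVector{q, k, v, mask});

    // Compile for CPU with bf16 inference precision, the configuration in which
    // the mask precision mismatch was reported.
    Core core;
    auto compiled = core.compile_model(model, "CPU",
                                       ov::hint::inference_precision(element::bf16));

    auto request = compiled.create_infer_request();
    // ... fill q/k/v/mask tensors, call request.infer(), and compare the output
    // against an f32 reference run.
    return 0;
}
```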



Labels

category: CPU (OpenVINO CPU plugin)


1 participant