[CPU] Add PagedAttention Support for Sink Input and decode phases Sliding Window #33035
base: master
Conversation
e25dc47 to f42fb29
Pull request overview
This PR adds comprehensive support for sink input parameters and sliding window attention mechanisms in the PagedAttention implementation. The changes enable the system to handle attention patterns with constrained context windows and special sink tokens that remain accessible throughout the attention computation.
Key changes:
- Extended PagedAttentionExtension to accept sink input as the 21st parameter, making it a required input
- Implemented sliding window attention for both prefill and decode phases with proper masking logic
- Added test coverage comparing PagedAttention vs SDPA across multiple configurations (f32/bf16, with/without sink, normal/sliding window)
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/plugins/intel_cpu/tests/functional/custom/subgraph_tests/src/x64/paged_attn.cpp | Added comprehensive test infrastructure for sink input and sliding window attention, including SDPA reference model updates and mask generation logic |
| src/plugins/intel_cpu/src/nodes/paged_attn.cpp | Removed validation blocking sink input support and corrected precision type for sink tensor descriptor |
| src/plugins/intel_cpu/src/nodes/kernels/scaled_attn/executor_pa.cpp | Integrated sink input handling and sliding window logic into attention kernel execution paths |
kCachePrecision.to_string() + " value cache prec " + vCachePrecision.to_string();
        return false;
    }
}
If the ARM platform doesn't support sink input, maybe we still need a check here.
const ScoreAggregationInfo* score_info_ptr) {
const ScoreAggregationInfo* score_info_ptr,
size_t batch_in_seq,
const PlainTensor& sinks) {
No need to add the batch_in_seq parameter; in the init phase, the code ensures that sink has shape {1, H, 1, 1}.
sinks.assert_dims({1, H, 1, 1});
Hi @zhangYiIntel: considering the sinks batch dimension, I agree that shape {1, H, 1, 1} (and the following address, sinks.ptr<float>(0, h, 0, 0)) should work fine functionally. But from the spec, it seems only the last dimension is fixed to 1. How about we keep the current dynamic dimension support, in case per-batch separate sinks are needed in the future?
@maxnick Hi Maksim, what's your opinion? Why do we check that sink has shape {1, H, 1, 1}?
The sink input in GPT-OSS is a constant of shape {1, H, 1, 1}. This is not a coincidence: B and H are in fact dynamic dimensions, and we can't have them in a constant tensor, so we broadcast the 0th and 2nd axes of this sink constant. Therefore, as long as the sink input is expected to be constant, it's logical to assert the {1, H, 1, 1} sinks tensor shape. The real question is: are we going to support a variable sink input? If not, then we leave this shape assert as it is, but it also makes sense to add a sanity check verifying that this input is indeed constant.
const PlainTensor& alibi_slopes,
float* score_output) {
float* score_output,
size_t batch_in_seq,
Same as above.
? _d_scale * reinterpret_cast<float*>(_quantized_q.ptr<int8_t>(ithr, m - q_start, 0))[0]
: _d_scale;

// sink processing is independent of sliding_window size
Since sink has shape {1, H, 1, 1}, we could simplify the code to:
sink = sinks.ptr<float>(0, h, 0, 0);
nullptr,
alibi_slope);
float* sink = nullptr;
if (sinks) {
Since sink has shape {1, H, 1, 1}, we could simplify the code to:
sink = sinks.ptr<float>(0, h, 0, 0);
nullptr,
alibi_slope);
float* sink = nullptr;
if (sinks) {
Since sink has shape {1, H, 1, 1}, we could simplify the code to:
sink = sinks.ptr<float>(0, h, 0, 0);
alibi_slopes,
score_output);
score_output,
batch_in_seq,
Remove batch_in_seq
q_start_idx_score,
score_info_ptr);
score_info_ptr,
0,
Remove batch_in_seq
q_start_idx_score,
score_info_ptr);
score_info_ptr,
batch_in_seq,
Same as above.
::testing::ValuesIn(additional_configs)),
PagedAttnTestBase::getTestCaseName);

// Sliding window test with same shapes as normal test
The only difference between smoke_PagedAttnVSSDPATest and smoke_PagedAttnVSSDPATest_WithSlidingWindow is the sliding_window, so why not combine them, like:
INSTANTIATE_TEST_SUITE_P(smoke_PagedAttnVSSDPATest,
PagedAttnVSSDPATest,
::testing::Combine(::testing::Values(ElementType::f32, ElementType::bf16),
::testing::ValuesIn(inputShapeAndReorders),
::testing::Values(true, false),
::testing::Values(true, false),
::testing::Values(0, 8), // sliding_window = 0, 8
::testing::ValuesIn(additional_configs)),
PagedAttnTestBase::getTestCaseName);
Currently we use a separate smoke_PagedAttnVSSDPATest_WithSlidingWindow test case because, if we combined it into PagedAttnVSSDPATest, we would need to generate the mask separately for fp32, bf16, (fp16). What's more, the number of tests in this suite would increase from 25 to 40.
Since the PA sliding-window logic is somewhat separate from the input data type and the other test parameters, how about we use a single separate test case to cover this logic and keep the test suite light?
Generally looks good, please fix the comments
7407cb0 to 0a391b2
Details:
Key Changes
Sink Input Support: Extended PagedAttentionExtension to accept optional sink input as the 21st parameter, updated executor to handle sink tokens in attention computation
Sliding Window Attention: Implemented sliding window mechanism for both prefill and decode phases
Test Coverage: Added comprehensive test suite comparing PA vs SDPA across multiple configurations (f32/bf16, with/without sink, normal/sliding window)
Tickets: