[CPU] Fuse SDPA and Concat as early as possible #28189
Conversation
LGTM!
```cpp
CPU_REGISTER_PASS_COMMON(manager, ov::pass::transpose_sinking::TSShapeOfForward);
CPU_REGISTER_PASS_COMMON(manager, StatefulSDPAFusion);
// TODO: SDPAFuseTransposeReshape may cause regressions in icx.
// CPU_REGISTER_PASS_X64(manager, ov::intel_cpu::SDPAFuseTransposeReshape);
```
Any details on that?
I recall that pass was added for the Whisper model, so how do we guarantee we won't introduce a regression for Whisper?
Currently no LLM models hit SDPAFuseTransposeReshape, including Whisper, so there is no regression; this was checked against the WW52 model set. The pattern only applies to a customized model that has not been upstreamed yet. Per @maxnick's comment, there is a plan to cover the pattern with Snippets. So should we leave this case to Snippets, or cover it with this transformation?
This customized pattern is expected to become the default: huggingface/optimum-intel#1078. The export side has actually been waiting on runtime optimizations, and SDPAFuseTransposeReshape is part of them: https://jira.devtools.intel.com/browse/CVS-153616.
Snippets are responsible for SDPA patterns without states (e.g. regular transformers or diffusers); SDPA with states (LLMs, Whisper) should be processed via the custom SDPA node as of now.
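For context, a minimal sketch of the "SDPA with states" subgraph shape that StatefulSDPAFusion targets (ReadValue → Concat → SDPA → Assign), assuming the public opset13 API; the shapes, variable names, and layout are illustrative, not the actual Whisper graph:

```cpp
#include <openvino/openvino.hpp>
#include <openvino/opsets/opset13.hpp>

std::shared_ptr<ov::Model> make_stateful_sdpa_subgraph() {
    using namespace ov;
    using namespace ov::opset13;

    // Query and the current step's key/value projections: [batch, heads, seq, head_size].
    auto q     = std::make_shared<Parameter>(element::f32, PartialShape{-1, 8, -1, 64});
    auto k_cur = std::make_shared<Parameter>(element::f32, PartialShape{-1, 8, -1, 64});
    auto v_cur = std::make_shared<Parameter>(element::f32, PartialShape{-1, 8, -1, 64});

    // KV cache kept across infer requests via Variable + ReadValue/Assign pairs.
    auto var_k = std::make_shared<op::util::Variable>(
        op::util::VariableInfo{PartialShape{-1, 8, -1, 64}, element::f32, "past_k"});
    auto var_v = std::make_shared<op::util::Variable>(
        op::util::VariableInfo{PartialShape{-1, 8, -1, 64}, element::f32, "past_v"});
    auto past_k = std::make_shared<ReadValue>(var_k);
    auto past_v = std::make_shared<ReadValue>(var_v);

    // New tokens are appended to the cache along the sequence axis (2); this
    // Concat is exactly what the fusion folds into the SDPA node.
    auto k = std::make_shared<Concat>(OutputVector{past_k, k_cur}, 2);
    auto v = std::make_shared<Concat>(OutputVector{past_v, v_cur}, 2);

    auto sdpa = std::make_shared<ScaledDotProductAttention>(q, k, v, /*causal=*/true);

    auto assign_k = std::make_shared<Assign>(k, var_k);
    auto assign_v = std::make_shared<Assign>(v, var_v);

    return std::make_shared<Model>(ResultVector{std::make_shared<Result>(sdpa)},
                                   SinkVector{assign_k, assign_v},
                                   ParameterVector{q, k_cur, v_cur});
}
```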
SDPAFuseTransposeReshape only matches a plain SDPA, i.e. when fusing to SDPA+Concat has failed (on success the node type becomes ScaledDotProductAttentionWithKVCache):

Lines 54 to 55 in 548786a:

```cpp
auto sdpa_node =
    wrap_type<op::v13::ScaledDotProductAttention>({q_transpose_node, k_transpose_node, v_transpose_node});
```
I suppose huggingface/optimum-intel#1078 will change the current Whisper model into a stateful model, so we may need to change SDPAFuseTransposeReshape to support ScaledDotProductAttentionWithKVCache, or add that functionality into StatefulSDPAFusion. If my understanding is correct, I will create a ticket to track this.
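A hypothetical sketch of that widening, assuming wrap_type's multi-type overload; note that ScaledDotProductAttentionWithKVCache takes extra state inputs, so a real pattern would need its own input list rather than the three transposes shown here:

```cpp
// Hypothetical: match both the stock SDPA op and the CPU plugin's
// ScaledDotProductAttentionWithKVCache with one pattern node.
auto sdpa_node = ov::pass::pattern::wrap_type<ov::op::v13::ScaledDotProductAttention,
                                              ov::intel_cpu::ScaledDotProductAttentionWithKVCache>(
    {q_transpose_node, k_transpose_node, v_transpose_node});
```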
@xipingyan Could you please clarify how SDPAFuseTransposeReshape worked for the Whisper model? Has StatefulSDPAFusion failed for this model before?
Hi @dmitry-gorokhov, @luo-cheng2021,
Actually, the Whisper model has two kinds of SDPA ops: the first is mapped to ScaledDotProductAttentionWithKVCache, the second to ScaledDotProductAttention. In my case, SDPAFuseTransposeReshape only works for the second one, so @luo-cheng2021's suggestion to change SDPAFuseTransposeReshape to support ScaledDotProductAttentionWithKVCache will not work.
@dmitry-gorokhov, I just aligned with @luo-cheng2021: we need to double-check whether current master already supports Snippets with dynamic shapes.
If yes, the SDPAFuseTransposeReshape pattern can be removed.
If no, we could merge the Reshape+Transpose before SDPA into the init subgraph of ReadValue as a temporary solution; Snippets with dynamic shapes remain the final target.
Discussed offline.
Agreed to update the applicability conditions for SDPAFuseTransposeReshape: it should check that the Reshape op comes after ReadValue. That will limit the pass's impact to the Whisper model only and avoid a negative perf impact on SD and others.
Once Snippets support such patterns with dynamic shapes, SDPAFuseTransposeReshape can be removed entirely.
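A minimal sketch of the agreed applicability check, assuming it lives in the matcher callback of SDPAFuseTransposeReshape; the helper name is illustrative:

```cpp
// Hypothetical helper: only apply the fusion when the reshaped input comes
// straight from a ReadValue (the KV-cache state), i.e. the Whisper-like case.
auto reshape_follows_read_value = [](const std::shared_ptr<ov::Node>& reshape) -> bool {
    if (!ov::is_type<ov::op::v1::Reshape>(reshape))
        return false;
    const auto producer = reshape->get_input_node_shared_ptr(0);
    return ov::is_type<ov::op::util::ReadValueBase>(producer);
};
```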
Done. No models in WW52 hit SDPAFuseTransposeReshape; only the customized Whisper model hits it.
### Details:
- Move StatefulSDPAFusion before CommonOptimizations
- ...

### Tickets:
- [158738](https://jira.devtools.intel.com/browse/CVS-158738)
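A rough sketch of the reordering described above, assuming the usual ov::pass::Manager flow in the CPU plugin's transformation_pipeline.cpp; the surrounding passes are illustrative:

```cpp
ov::pass::Manager manager("Plugin:CPU");

// Run the stateful SDPA+Concat fusion before CommonOptimizations so the
// KV-cache Concat pattern is fused before other common passes can rewrite it.
CPU_REGISTER_PASS_COMMON(manager, ov::pass::transpose_sinking::TSShapeOfForward);
CPU_REGISTER_PASS_COMMON(manager, StatefulSDPAFusion);
CPU_REGISTER_PASS_COMMON(manager, ov::pass::CommonOptimizations);
```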