[CPU] Conv-DWConv-PRelu fusing fix #33917
nshchego wants to merge 1 commit into openvinotoolkit:master
Conversation
@EgorDuplensky, could you please review?
Pull request overview
This PR fixes a bug in the Conv-DWConv-PRelu fusing optimization for AVX2 platforms and improves the associated test coverage. The fix addresses a mismatch where the oneDNN kernel skipped DW Convolution post-ops during initialization but attempted to add them during code generation. Additionally, the test suite was not properly verifying the fusing behavior due to an unsatisfied cache size condition.
Changes:
- Updated oneDNN submodule to include the kernel initialization fix
- Replaced `makeNgraphFunction` with `create_ov_model` across all test files for naming consistency
- Moved the Conv-DWConv test from the common to the x64-specific directory, with proper fusing verification
- Added platform guards to disable Conv-DWConv fusing on non-x86 platforms
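For context, a minimal schematic sketch of the init/codegen mismatch described above. This is not the actual oneDNN source; the kernel structure and all names are illustrative only:

```cpp
// Schematic illustration only -- the kernel structure and names here are
// hypothetical, simplified from the bug pattern described above.
#include <cassert>
#include <vector>

enum class PostOpKind { eltwise, dw_conv };

// Initialization step: validates the post-op chain for the kernel.
// Bug pattern: dw_conv is silently skipped instead of rejected, so the
// descriptor is accepted even though the kernel cannot emit code for it.
bool post_ops_ok(const std::vector<PostOpKind>& ops) {
    for (auto kind : ops) {
        if (kind == PostOpKind::eltwise) continue;  // supported
        if (kind == PostOpKind::dw_conv) continue;  // BUG: skipped == accepted
        return false;                               // anything else: reject
    }
    return true;
}

// Code generation step: emits instructions per post-op and *does* try to
// handle dw_conv, so the two steps disagree. The fix aligns them.
void generate(const std::vector<PostOpKind>& ops) {
    for (auto kind : ops) {
        switch (kind) {
        case PostOpKind::eltwise:
            /* emit eltwise injector */
            break;
        case PostOpKind::dw_conv:
            assert(false && "accepted at init, but no dw_conv codegen path");
            break;
        }
    }
}
```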
Reviewed changes
Copilot reviewed 92 out of 92 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| src/plugins/intel_cpu/thirdparty/onednn | Updated submodule commit to include the DW convolution post-op fix |
| src/plugins/intel_cpu/tests/functional/utils/cpu_test_utils.hpp | Renamed function from makeNgraphFunction to create_ov_model |
| src/plugins/intel_cpu/tests/functional/utils/cpu_test_utils.cpp | Renamed function implementation to match header |
| src/plugins/intel_cpu/tests/functional/custom/subgraph_tests/src/x64/conv_dw_conv.cpp | New x64-specific test with proper fusing verification using CpuTestWithFusing |
| src/plugins/intel_cpu/tests/functional/custom/subgraph_tests/src/common/conv_dw_conv.cpp | Removed generic test that didn't verify fusing correctly |
| src/plugins/intel_cpu/src/graph_optimizer.cpp | Added platform guards and reorganized AVX2/AVX512 checks for Conv-DWConv fusing |
| (multiple test files) | Updated function calls from makeNgraphFunction to create_ov_model |
    ov::CoordinateDiff{1, 1},
    std::vector<size_t>{1, 1},
    ov::op::PadType::EXPLICIT);
auto bias_const = utils::make_constant(precision, {1, out_channels , 1, 1});
Extra whitespace before comma in shape specification. Should be {1, out_channels, 1, 1} for consistent formatting.
- auto bias_const = utils::make_constant(precision, {1, out_channels , 1, 1});
+ auto bias_const = utils::make_constant(precision, {1, out_channels, 1, 1});
}
void GraphOptimizer::FuseConvolutionAndDWConvolution(Graph& graph) {
#if defined(OPENVINO_ARCH_X86) || defined(OPENVINO_ARCH_X86_64)
It seems we'd better refactor the graph optimizer a bit to avoid ifdefs in such cases. There is also an ARM PR which changes the order of some optimizations based on the architecture. We could 'register' optimizations the way it is done in the core transformation pipeline.
Not asking to implement it in the scope of this PR, of course.
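A rough sketch of what such a registration scheme could look like; all names here (`OptimizerPass`, `PassRegistry`, etc.) are hypothetical, not the core transformation pipeline's actual API:

```cpp
// Hypothetical sketch of a registration-based pass list for the graph
// optimizer; names and structure are illustrative only.
#include <functional>
#include <string>
#include <utility>
#include <vector>

struct Graph {};  // stand-in for the CPU plugin's Graph

struct OptimizerPass {
    std::string name;
    std::function<bool()> is_applicable;  // arch/ISA predicate
    std::function<void(Graph&)> run;
};

class PassRegistry {
public:
    void add(OptimizerPass pass) { passes_.push_back(std::move(pass)); }
    void run_all(Graph& graph) {
        for (auto& p : passes_)
            if (p.is_applicable()) p.run(graph);
    }
private:
    std::vector<OptimizerPass> passes_;
};

// Registration keeps platform logic in one predicate per pass instead of
// #ifdef blocks scattered through optimizer bodies, and per-arch pass
// ordering becomes a data question rather than a code-layout one.
void register_passes(PassRegistry& registry) {
    registry.add({"FuseConvolutionAndDWConvolution",
                  [] {
#if defined(OPENVINO_ARCH_X86) || defined(OPENVINO_ARCH_X86_64)
                      return true;
#else
                      return false;
#endif
                  },
                  [](Graph& /*g*/) { /* FuseConvolutionAndDWConvolution(g); */ }});
}
```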
#if defined(OPENVINO_ARCH_X86) || defined(OPENVINO_ARCH_X86_64)
// There is no optimized implementation for avx512, so two avx512 convolutions
// are expected to be faster than single fused avx2 convolution
if (!impl::cpu::x64::mayiuse(impl::cpu::x64::avx2) || impl::cpu::x64::mayiuse(impl::cpu::x64::avx512_core)) {
Can we use the `implication(avx2, !avx512)` utility?
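For reference, a minimal standalone sketch of that helper (oneDNN provides an equivalent `implication(a, b) == !a || b` utility in its common headers; reproduced here so the snippet is self-contained). One way the diff's condition maps onto it:

```cpp
// Standalone sketch; the helper mirrors oneDNN's implication() utility.
inline bool implication(bool cause, bool effect) {
    return !cause || effect;
}

// Hypothetical stand-ins for the impl::cpu::x64::mayiuse(...) queries.
bool has_avx2();
bool has_avx512_core();

// The diff's early-return condition
//   !has_avx2() || has_avx512_core()
// is equivalent to implication(has_avx2(), has_avx512_core()):
// "skip fusing unless avx2 is available and avx512_core is not".
bool should_skip_dw_fusing() {
    return implication(has_avx2(), has_avx512_core());
}
```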
  auto eltwise = utils::make_eltwise(parameters[0], secondaryInput, eltwiseType);
- function = makeNgraphFunction(netType, parameters, eltwise, "Eltwise");
+ function = create_ov_model(netType, parameters, eltwise, "Eltwise");
Don't change it here, but it would be better to do this refactor as a separate PR.
Details:
- `jit_avx2_1x1_conv_kernel_f32_old` skips the DW Convolution post-op on the initialization step, but tries to add it on the code generation step. This fix aligns the behavior.
- `FuseConvolutionAndDWConvolution`: the condition `(dw_conv_input_size + dw_conv_output_size > L3_cache_size / 2)` is not satisfied in the test, yet the test was green. The fix adds a fusing check via the `CpuTestWithFusing` class.
- `FuseConvolutionAndDWConvolution` is applicable for AVX2 only, thus this code was disabled for non-X86 platforms to reduce binary size.

Related oneDNN PR: openvinotoolkit/oneDNN#296
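To illustrate the second bullet, a schematic sketch of why a size-gated fusing heuristic can leave a plain accuracy test green without ever exercising the fused path. Only the quoted condition comes from the plugin; the surrounding names are hypothetical:

```cpp
// Schematic only: the condition mirrors the one quoted above, but the
// function and variable names are illustrative, not the plugin's code.
#include <cstddef>

bool is_suitable_for_dw_fusing(std::size_t dw_conv_input_size,
                               std::size_t dw_conv_output_size,
                               std::size_t l3_cache_size) {
    // Fuse only when the DW conv tensors would overflow half of L3.
    // Small functional-test shapes fail this check, so the optimization
    // silently never fires -- yet the unfused graph is still numerically
    // correct, and a plain accuracy test stays green.
    return dw_conv_input_size + dw_conv_output_size > l3_cache_size / 2;
}
```

Checking the executed node's fused ops via `CpuTestWithFusing`, as this PR does, is what turns that silent miss into a test failure.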
Tickets: