Change ov::parallel_for to CpuParallel::parallel_for #33244
sunxiaoxia2022 wants to merge 24 commits into openvinotoolkit:master
Pull request overview
This PR refactors the Intel CPU plugin to use CpuParallel::parallel_for instead of ov::parallel_for throughout the codebase. This change enables more flexible control over TBB partitioning strategies (AUTO or STATIC) for parallel operations.
Key Changes:
- Replaced ov::parallel_for calls with CpuParallel::parallel_for across numerous node implementations
- Added cpuParallel member variables to various executor and node classes
- Updated constructors and method signatures to accept and store CpuParallel instances
- Extended the CpuParallel class with parallel_sum2d and parallel_sum3d methods
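
The pattern described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's implementation: `CpuParallelSketch`, `Partitioner`, and `sum_of_products` are hypothetical names, plain `std::thread` stands in for TBB, and the AUTO mode is only approximated by dynamic index claiming (TBB's auto partitioner splits ranges adaptively). The `parallel_sum2d` signature here is an assumption as well.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Hypothetical partitioning modes, mirroring the AUTO/STATIC choice in the overview.
enum class Partitioner { AUTO, STATIC };

// Hypothetical stand-in for CpuParallel: the partitioning strategy lives in an
// object that callers receive, instead of being fixed inside a free function.
class CpuParallelSketch {
public:
    explicit CpuParallelSketch(Partitioner partitioner, std::size_t threads = 4)
        : m_partitioner(partitioner), m_threads(std::max<std::size_t>(threads, 1)) {}

    // Calls func(i) exactly once for every i in [0, n).
    template <typename F>
    void parallel_for(std::size_t n, const F& func) const {
        const std::size_t workers = std::min(n, m_threads);
        std::vector<std::thread> pool;
        pool.reserve(workers);
        std::atomic<std::size_t> next{0};
        for (std::size_t t = 0; t < workers; ++t) {
            if (m_partitioner == Partitioner::STATIC) {
                // STATIC: each worker gets one fixed contiguous chunk.
                const std::size_t begin = n * t / workers;
                const std::size_t end = n * (t + 1) / workers;
                pool.emplace_back([begin, end, &func] {
                    for (std::size_t i = begin; i < end; ++i) func(i);
                });
            } else {
                // AUTO (approximation): workers claim the next free index dynamically.
                pool.emplace_back([n, &next, &func] {
                    for (std::size_t i = next.fetch_add(1); i < n; i = next.fetch_add(1)) func(i);
                });
            }
        }
        for (auto& worker : pool) worker.join();
    }

    // Returns init plus the sum of func(i, j) over i in [0, d0), j in [0, d1).
    template <typename F>
    float parallel_sum2d(std::size_t d0, std::size_t d1, float init, const F& func) const {
        std::vector<float> rows(d0, 0.0f);  // one partial sum per row, so no data race
        parallel_for(d0, [&](std::size_t i) {
            float acc = 0.0f;
            for (std::size_t j = 0; j < d1; ++j) acc += func(i, j);
            rows[i] = acc;
        });
        float total = init;
        for (float r : rows) total += r;
        return total;
    }

private:
    Partitioner m_partitioner;
    std::size_t m_threads;
};

using CpuParallelSketchPtr = std::shared_ptr<CpuParallelSketch>;

// Example caller: receives the parallel object instead of calling a free function.
inline float sum_of_products(const CpuParallelSketchPtr& parallel) {
    return parallel->parallel_sum2d(3, 4, 0.0f, [](std::size_t i, std::size_t j) {
        return static_cast<float>(i * j);
    });
}
```

Because the caller only sees the object, switching between the two strategies is a construction-time decision and does not touch the call sites.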
Reviewed changes
Copilot reviewed 79 out of 79 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| cpu_parallel.hpp | Added parallel_sum2d and parallel_sum3d methods to CpuParallel class |
| cpu_memory.cpp/h | Updated split_vertical to accept cpuParallel parameter |
| tile.h/cpp | Added cpuParallel member and updated optimizedExecute call |
| tensoriterator.h/cpp | Added cpuParallel to DynamicBuffer constructor and copy method |
| stft.cpp | Updated transpose_out4d and RDFTExecutor::build calls with cpuParallel |
| split.h/cpp | Added cpuParallel to SplitOptimizedExecutor |
| space_to_depth.h/cpp | Added cpuParallel to executor attributes and PermuteKernel |
| shuffle_channels.h/cpp | Added cpuParallel to executor attributes and PermuteKernel |
| scaled_attn.cpp | Updated parallel operations with cpuParallel throughout attention kernels |
| rope.cpp | Added cpuParallel to all RoPE executor variants |
| roll.h/cpp | Added cpuParallel to RollExecutor |
| roi_pooling.h/cpp | Added cpuParallel to executor constructors and key |
| rms_norm.cpp | Added cpuParallel to RMSNormExecutor |
| region_yolo.cpp | Updated SoftmaxGeneric constructor with cpuParallel |
| rdft.h/cpp | Added cpuParallel to RDFTExecutor and build method |
| proposal_imp.hpp/cpp | Added cpuParallel parameter to proposal_exec and helper functions |
| proposal.cpp | Updated proposal_exec call with cpuParallel |
| paged_attn.cpp | Updated make_pa_executor call with cpuParallel |
| pad.h/cpp | Added cpuParallel to attrs and used in padConstantCommon |
| normalize.h/cpp | Added cpuParallel to executor constructors and key |
| mvn.cpp | Added cpuParallel to MVNAttrs and updated parallel operations |
| mha_single_token.hpp/cpp | Added cpuParallel parameter to mha_single_token function |
| executor_pa.hpp/cpp | Added cpuParallel to AttentionExecutor constructors |
| attn_quant.hpp/cpp | Added cpuParallel to quantization functions |
| attn_memcpy.hpp/cpp | Added cpuParallel to memory copy functions |
| istft.cpp | Added cpuParallel to istft_impl and transpose_out4d |
| interpolate.h/cpp | Added cpuParallel to executor base class and implementations |
| generate_proposals.cpp | Added cpuParallel to refine_anchors and helper functions |
| gathermatmul.cpp | Updated parallel_for calls with cpuParallel |
| gather_tree.h/cpp | Added cpuParallel to GatherTreeExecutor |
| fullyconnected.cpp | Updated split_vertical calls with cpuParallel |
| extract_image_patches.h/cpp | Added cpuParallel to executor constructors |
| experimental_detectron_roifeatureextractor.cpp | Added cpuParallel to ROIAlignForward_cpu_kernel |
| experimental_detectron_generate_proposals_single_image.cpp | Added cpuParallel to refine_anchors and helper functions |
| jit_transpose.cpp | Updated PermuteKernel constructor with cpuParallel |
| eltwise.hpp/cpp | Added cpuParallel to EltwiseRefExecutor and BitwiseRefExecutor |
| mvn.hpp | Added cpuParallel to MVNAttrs structure |
| kleidiai_mm.hpp/cpp | Added cpuParallel member and updated parallel operations |
| ref_opt_transpose.cpp | Added cpuParallel to transpose functions |
| executor.hpp | Added getCpuParallel method to ExecutorContext |
| dft.h/cpp | Changed generateTwiddlesDFT to non-static and added cpuParallel usage |
| depth_to_space.h/cpp | Added cpuParallel to executor attributes and PermuteKernel |
| def_conv.h/cpp | Added cpuParallel to DefConvExecutor constructors |
| tile_broadcast_utils.h/cpp | Added cpuParallel parameter to optimizedExecute |
| softmax.h/cpp | Added cpuParallel to SoftmaxGeneric constructor |
| permute_kernel.h/cpp | Added cpuParallel to PermuteKernel constructor |
| color_convert.h/cpp | Added cpuParallel to Converter constructors |
| causal_mask_preprocess.cpp | Added cpuParallel to ExecutorCausalMaskPreprocess |
| broadcast.h/cpp | Added cpuParallel member and updated optimizedExecute call |
Comments suppressed due to low confidence (2)
src/plugins/intel_cpu/src/nodes/color_convert.cpp:1
- Corrected spelling of 'cpu_arallel' to 'cpu_parallel'.
// Copyright (C) 2018-2025 Intel Corporation
src/plugins/intel_cpu/src/nodes/executors/kleidiai/kleidiai_mm.cpp:1
- Corrected spelling of 'cpu_arallel' to 'cpu_parallel'.
// Copyright (C) 2023 Intel Corporation
src/plugins/intel_cpu/src/nodes/executors/kleidiai/kleidiai_mm.cpp

    }

    explicit MHAHelper(const ov::Extensions::Cpu::PagedAttnQuantParams& params) : _params(params) {
    explicit MHAHelper(const std::shared_ptr<CpuParallel>& cpu_arallel) : _cpu_parallel(cpu_arallel) {

Suggested change:

    - explicit MHAHelper(const std::shared_ptr<CpuParallel>& cpu_arallel) : _cpu_parallel(cpu_arallel) {
    + explicit MHAHelper(const std::shared_ptr<CpuParallel>& cpu_parallel) : _cpu_parallel(cpu_parallel) {

Typo: the parameter name cpu_arallel should be cpu_parallel.

    private:
        std::unique_ptr<jit_uni_extract_image_patches_kernel> pKernel;
        CpuParallelPtr cpuParallel = nullptr;

Do we still need to store the pointer, when the parallel object is passed into the exec call?
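
The trade-off behind this question can be sketched as two alternative designs. The types below (`CpuParallelStub`, `StoringExecutor`, `StatelessExecutor`) are hypothetical and deliberately sequential; they only illustrate where the parallel object lives, not how the plugin's executors are written.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, sequential stand-in for CpuParallel (illustration only).
struct CpuParallelStub {
    template <typename F>
    void parallel_for(std::size_t n, const F& func) const {
        for (std::size_t i = 0; i < n; ++i) func(i);
    }
};
using CpuParallelStubPtr = std::shared_ptr<CpuParallelStub>;

// Option A: the executor keeps its own copy of the pointer.
class StoringExecutor {
public:
    explicit StoringExecutor(CpuParallelStubPtr parallel) : m_parallel(std::move(parallel)) {}

    void exec(std::vector<int>& data) const {
        m_parallel->parallel_for(data.size(), [&](std::size_t i) { data[i] *= 2; });
    }

private:
    CpuParallelStubPtr m_parallel;  // extra state that must stay consistent with the caller
};

// Option B: the executor is stateless with respect to parallelism and
// receives the parallel object on every exec call.
class StatelessExecutor {
public:
    void exec(std::vector<int>& data, const CpuParallelStubPtr& parallel) const {
        parallel->parallel_for(data.size(), [&](std::size_t i) { data[i] *= 2; });
    }
};
```

If the parallel object is already an argument of the exec call, as in Option B, a stored member like the one in Option A is redundant, which appears to be the point of the comment.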

    private:
        jit_extract_image_patches_params jpp;
        CpuParallelPtr cpuParallel = nullptr;

Do we still need to store the pointer, when the parallel object is passed into the exec call?

    using CpuParallelPtr = std::shared_ptr<ov::intel_cpu::CpuParallel>;

Likely it should be inside the ov::intel_cpu namespace.
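
A minimal sketch of what the comment above proposes, assuming the alias is simply moved next to the class it refers to. The empty `CpuParallel` class here is a placeholder so the snippet compiles on its own; it is not the plugin's definition.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

namespace ov {
namespace intel_cpu {

class CpuParallel {};  // placeholder for the real class (assumption)

// Declared inside the namespace, so the alias does not leak into the
// enclosing scope and the class name needs no qualification.
using CpuParallelPtr = std::shared_ptr<CpuParallel>;

}  // namespace intel_cpu
}  // namespace ov

// The alias still names the same type when referenced from outside.
static_assert(std::is_same<ov::intel_cpu::CpuParallelPtr,
                           std::shared_ptr<ov::intel_cpu::CpuParallel>>::value,
              "alias must resolve to shared_ptr<CpuParallel>");
```

Code inside `ov::intel_cpu` can keep writing `CpuParallelPtr` unqualified, while code outside the namespace has to spell `ov::intel_cpu::CpuParallelPtr` explicitly.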
Details:
Tickets: