Change ov::parallel_for to CpuParallel::parallel_for #33244

Open
sunxiaoxia2022 wants to merge 24 commits into openvinotoolkit:master from sunxiaoxia2022:xiaoxia/parallel_for_auto_2.0

sunxiaoxia2022 commented Dec 15, 2025

Details:

  • Changed some ov::parallel_for calls to CpuParallel::parallel_for, which allows the TBB partitioner to be set to AUTO or STATIC
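
To make the AUTO versus STATIC distinction concrete, here is a minimal sketch using only the C++ standard library. It is not the OpenVINO CpuParallel class and does not call TBB: the names CpuParallelSketch and Partitioner are hypothetical, and the AUTO branch only approximates dynamic load balancing with a shared counter.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical stand-in for the idea behind CpuParallel; not the plugin's class, not TBB.
enum class Partitioner { STATIC, AUTO };

class CpuParallelSketch {
public:
    CpuParallelSketch(Partitioner partitioner, std::size_t num_threads)
        : m_partitioner(partitioner), m_num_threads(num_threads) {
        assert(num_threads > 0);
    }

    // Calls func(i) exactly once for every i in [0, total).
    template <typename F>
    void parallel_for(std::size_t total, const F& func) const {
        std::vector<std::thread> workers;
        workers.reserve(m_num_threads);
        std::atomic<std::size_t> next{0};  // only used by the AUTO branch

        if (m_partitioner == Partitioner::STATIC) {
            // STATIC: every worker gets one fixed, contiguous slice decided up front.
            const std::size_t chunk = (total + m_num_threads - 1) / m_num_threads;
            for (std::size_t t = 0; t < m_num_threads; ++t) {
                const std::size_t begin = std::min(total, t * chunk);
                const std::size_t end = std::min(total, begin + chunk);
                workers.emplace_back([begin, end, &func] {
                    for (std::size_t i = begin; i < end; ++i) func(i);
                });
            }
        } else {
            // AUTO (approximated): workers pull the next index from a shared counter,
            // so a worker that finishes early simply takes more iterations.
            for (std::size_t t = 0; t < m_num_threads; ++t) {
                workers.emplace_back([&next, total, &func] {
                    for (std::size_t i = next.fetch_add(1); i < total; i = next.fetch_add(1)) {
                        func(i);
                    }
                });
            }
        }
        for (auto& w : workers) w.join();
    }

private:
    Partitioner m_partitioner;
    std::size_t m_num_threads;
};
```

In this sketch, STATIC gives a reproducible mapping of iterations to workers, while AUTO trades that for better balance when iterations have uneven cost. Whether the real class makes the same trade-off is an assumption here.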

Tickets:

sunxiaoxia2022 requested review from a team as code owners December 15, 2025 01:35
github-actions bot added the category: CPU OpenVINO CPU plugin label Dec 15, 2025
maxnick requested a review from Copilot January 8, 2026 09:20
maxnick added this to the 2026.0 milestone Jan 8, 2026
maxnick self-assigned this Jan 8, 2026
Copilot AI left a comment

Pull request overview

This PR refactors the Intel CPU plugin to use CpuParallel::parallel_for instead of ov::parallel_for throughout the codebase. This change enables more flexible control over TBB partitioning strategies (AUTO or STATIC) for parallel operations.

Key Changes:

  • Replaced ov::parallel_for calls with CpuParallel::parallel_for across numerous node implementations
  • Added cpuParallel member variables to various executor and node classes
  • Updated constructors and method signatures to accept and store CpuParallel instances
  • Extended CpuParallel class with parallel_sum2d and parallel_sum3d methods
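
As a rough illustration of what a 2-D sum helper such as parallel_sum2d computes, the sketch below accumulates a function over a d0 × d1 index grid. The name sum2d_sketch and its signature are assumptions, not the PR's API, and the row loop runs sequentially here. It only marks where a parallel version could split the work.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical free function; the PR's CpuParallel::parallel_sum2d may differ in signature.
// Returns init + the sum of func(i, j) over all i in [0, d0) and j in [0, d1).
template <typename R, typename F>
R sum2d_sketch(std::size_t d0, std::size_t d1, R init, const F& func) {
    // One partial sum per row. Rows are independent, so a parallel version could
    // compute each entry on a different worker without synchronisation.
    std::vector<R> partial(d0, R{});
    for (std::size_t i = 0; i < d0; ++i) {
        for (std::size_t j = 0; j < d1; ++j) {
            partial[i] += func(i, j);
        }
    }
    // Combine step: runs once, after all partial sums are ready.
    return std::accumulate(partial.begin(), partial.end(), init);
}
```

Keeping one partial result per independent unit of work and combining at the end is a common way to avoid contention on a single shared accumulator.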

Reviewed changes

Copilot reviewed 79 out of 79 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| cpu_parallel.hpp | Added parallel_sum2d and parallel_sum3d methods to CpuParallel class |
| cpu_memory.cpp/h | Updated split_vertical to accept cpuParallel parameter |
| tile.h/cpp | Added cpuParallel member and updated optimizedExecute call |
| tensoriterator.h/cpp | Added cpuParallel to DynamicBuffer constructor and copy method |
| stft.cpp | Updated transpose_out4d and RDFTExecutor::build calls with cpuParallel |
| split.h/cpp | Added cpuParallel to SplitOptimizedExecutor |
| space_to_depth.h/cpp | Added cpuParallel to executor attributes and PermuteKernel |
| shuffle_channels.h/cpp | Added cpuParallel to executor attributes and PermuteKernel |
| scaled_attn.cpp | Updated parallel operations with cpuParallel throughout attention kernels |
| rope.cpp | Added cpuParallel to all RoPE executor variants |
| roll.h/cpp | Added cpuParallel to RollExecutor |
| roi_pooling.h/cpp | Added cpuParallel to executor constructors and key |
| rms_norm.cpp | Added cpuParallel to RMSNormExecutor |
| region_yolo.cpp | Updated SoftmaxGeneric constructor with cpuParallel |
| rdft.h/cpp | Added cpuParallel to RDFTExecutor and build method |
| proposal_imp.hpp/cpp | Added cpuParallel parameter to proposal_exec and helper functions |
| proposal.cpp | Updated proposal_exec call with cpuParallel |
| paged_attn.cpp | Updated make_pa_executor call with cpuParallel |
| pad.h/cpp | Added cpuParallel to attrs and used in padConstantCommon |
| normalize.h/cpp | Added cpuParallel to executor constructors and key |
| mvn.cpp | Added cpuParallel to MVNAttrs and updated parallel operations |
| mha_single_token.hpp/cpp | Added cpuParallel parameter to mha_single_token function |
| executor_pa.hpp/cpp | Added cpuParallel to AttentionExecutor constructors |
| attn_quant.hpp/cpp | Added cpuParallel to quantization functions |
| attn_memcpy.hpp/cpp | Added cpuParallel to memory copy functions |
| istft.cpp | Added cpuParallel to istft_impl and transpose_out4d |
| interpolate.h/cpp | Added cpuParallel to executor base class and implementations |
| generate_proposals.cpp | Added cpuParallel to refine_anchors and helper functions |
| gathermatmul.cpp | Updated parallel_for calls with cpuParallel |
| gather_tree.h/cpp | Added cpuParallel to GatherTreeExecutor |
| fullyconnected.cpp | Updated split_vertical calls with cpuParallel |
| extract_image_patches.h/cpp | Added cpuParallel to executor constructors |
| experimental_detectron_roifeatureextractor.cpp | Added cpuParallel to ROIAlignForward_cpu_kernel |
| experimental_detectron_generate_proposals_single_image.cpp | Added cpuParallel to refine_anchors and helper functions |
| jit_transpose.cpp | Updated PermuteKernel constructor with cpuParallel |
| eltwise.hpp/cpp | Added cpuParallel to EltwiseRefExecutor and BitwiseRefExecutor |
| mvn.hpp | Added cpuParallel to MVNAttrs structure |
| kleidiai_mm.hpp/cpp | Added cpuParallel member and updated parallel operations |
| ref_opt_transpose.cpp | Added cpuParallel to transpose functions |
| executor.hpp | Added getCpuParallel method to ExecutorContext |
| dft.h/cpp | Changed generateTwiddlesDFT to non-static and added cpuParallel usage |
| depth_to_space.h/cpp | Added cpuParallel to executor attributes and PermuteKernel |
| def_conv.h/cpp | Added cpuParallel to DefConvExecutor constructors |
| tile_broadcast_utils.h/cpp | Added cpuParallel parameter to optimizedExecute |
| softmax.h/cpp | Added cpuParallel to SoftmaxGeneric constructor |
| permute_kernel.h/cpp | Added cpuParallel to PermuteKernel constructor |
| color_convert.h/cpp | Added cpuParallel to Converter constructors |
| causal_mask_preprocess.cpp | Added cpuParallel to ExecutorCausalMaskPreprocess |
| broadcast.h/cpp | Added cpuParallel member and updated optimizedExecute call |

Comments suppressed due to low confidence (2)

src/plugins/intel_cpu/src/nodes/color_convert.cpp:1

  • Corrected spelling of 'cpu_arallel' to 'cpu_parallel'.
// Copyright (C) 2018-2025 Intel Corporation

src/plugins/intel_cpu/src/nodes/executors/kleidiai/kleidiai_mm.cpp:1

  • Corrected spelling of 'cpu_arallel' to 'cpu_parallel'.
// Copyright (C) 2023 Intel Corporation

@maxnick maxnick modified the milestones: 2026.0, 2026.1 Jan 23, 2026
}

explicit MHAHelper(const ov::Extensions::Cpu::PagedAttnQuantParams& params) : _params(params) {
explicit MHAHelper(const std::shared_ptr<CpuParallel>& cpu_arallel) : _cpu_parallel(cpu_arallel) {
Suggested change
explicit MHAHelper(const std::shared_ptr<CpuParallel>& cpu_arallel) : _cpu_parallel(cpu_arallel) {
explicit MHAHelper(const std::shared_ptr<CpuParallel>& cpu_parallel) : _cpu_parallel(cpu_parallel) {

Typo: the parameter name cpu_arallel should be cpu_parallel.


private:
std::unique_ptr<jit_uni_extract_image_patches_kernel> pKernel;
CpuParallelPtr cpuParallel = nullptr;
Do we still need to store the pointer, when the parallel object is passed into the exec call?


private:
jit_extract_image_patches_params jpp;
CpuParallelPtr cpuParallel = nullptr;
Do we still need to store the pointer, when the parallel object is passed into the exec call?
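
The two designs the reviewer is contrasting can be sketched as follows. All names here (CpuParallelStub, StoringExecutor, StatelessExecutor) are hypothetical placeholders rather than classes from the PR, and the stub runs its loop sequentially so the example stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Minimal placeholder; the plugin's real CpuParallel is not reproduced here.
struct CpuParallelStub {
    template <typename F>
    void parallel_for(std::size_t total, const F& func) const {
        for (std::size_t i = 0; i < total; ++i) func(i);  // sequential stand-in
    }
};
using CpuParallelStubPtr = std::shared_ptr<CpuParallelStub>;

// Variant A: the executor stores the pointer at construction time.
class StoringExecutor {
public:
    explicit StoringExecutor(CpuParallelStubPtr cpu_parallel) : m_cpu_parallel(std::move(cpu_parallel)) {
        assert(m_cpu_parallel != nullptr);
    }
    void exec(std::vector<int>& data) const {
        m_cpu_parallel->parallel_for(data.size(), [&](std::size_t i) { data[i] *= 2; });
    }

private:
    CpuParallelStubPtr m_cpu_parallel;
};

// Variant B: the executor holds no threading state; the caller supplies it per call.
class StatelessExecutor {
public:
    void exec(std::vector<int>& data, const CpuParallelStubPtr& cpu_parallel) const {
        assert(cpu_parallel != nullptr);
        cpu_parallel->parallel_for(data.size(), [&](std::size_t i) { data[i] *= 2; });
    }
};
```

If exec already receives the parallel object, as in Variant B, a stored member adds a second source of truth that can drift from the one passed in, which seems to be the concern behind the question.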

Comment on lines +461 to +462

using CpuParallelPtr = std::shared_ptr<ov::intel_cpu::CpuParallel>;
Likely it should be inside the ov::intel_cpu namespace.
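
A minimal sketch of the placement the reviewer suggests, assuming the alias is declared alongside the class it refers to. The empty CpuParallel class below is only a placeholder so the snippet compiles on its own.

```cpp
#include <memory>

namespace ov {
namespace intel_cpu {

class CpuParallel {};  // empty placeholder for the plugin's real class

// Declared inside the namespace: code in ov::intel_cpu can use CpuParallelPtr
// unqualified, and the name does not leak into the enclosing scope.
using CpuParallelPtr = std::shared_ptr<CpuParallel>;

}  // namespace intel_cpu
}  // namespace ov
```

Code outside the namespace would then refer to the alias as ov::intel_cpu::CpuParallelPtr.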


3 participants