@damdoo01-arm
Contributor

@damdoo01-arm damdoo01-arm commented Nov 25, 2025

Description

This PR introduces a dedicated kernel to perform Depthwise Convolution for specific cases (3x3 kernel, stride of 1).
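For readers unfamiliar with the operation being specialised, here is a minimal scalar sketch of a depthwise 3x3, stride-1 convolution. It is not the KleidiAI micro-kernel from this PR and not MLAS code: the function name `DepthwiseConv3x3S1Reference`, the NHWC layout, the batch size of 1, and the absence of padding are all assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference routine (illustration only, not part of MLAS or KleidiAI).
// Depthwise 3x3 convolution, stride 1, no padding, NHWC layout, batch of 1.
//   input:   H x W x C
//   weights: 3 x 3 x C   (one independent 3x3 filter per channel)
//   output:  (H-2) x (W-2) x C
std::vector<float> DepthwiseConv3x3S1Reference(
    const std::vector<float>& input,
    const std::vector<float>& weights,
    std::size_t H, std::size_t W, std::size_t C)
{
    assert(H >= 3 && W >= 3);
    assert(input.size() == H * W * C);
    assert(weights.size() == 9 * C);

    const std::size_t OH = H - 2;  // stride 1, no padding
    const std::size_t OW = W - 2;
    std::vector<float> output(OH * OW * C, 0.0f);

    for (std::size_t oh = 0; oh < OH; ++oh) {
        for (std::size_t ow = 0; ow < OW; ++ow) {
            for (std::size_t c = 0; c < C; ++c) {
                // Each channel is convolved only with its own filter;
                // there is no accumulation across channels.
                float acc = 0.0f;
                for (std::size_t kh = 0; kh < 3; ++kh) {
                    for (std::size_t kw = 0; kw < 3; ++kw) {
                        const float x = input[((oh + kh) * W + (ow + kw)) * C + c];
                        const float w = weights[(kh * 3 + kw) * C + c];
                        acc += x * w;
                    }
                }
                output[(oh * OW + ow) * C + c] = acc;
            }
        }
    }
    return output;
}
```

Because every output element depends on only nine inputs from a single channel, the operation has little arithmetic per byte loaded, which is one reason a dedicated kernel can do better than a generic expand-then-GEMM path.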

Motivation and Context

We have identified opportunities for performance uplift on this layer type, and several customer models contain layers that would benefit from this kernel.

image

@damdoo01-arm damdoo01-arm marked this pull request as draft November 25, 2025 15:49
@damdoo01-arm damdoo01-arm changed the title from "Depthwise conv 3x3 s1 updated" to "ARM KleidiAI Micro-Kernel: Depthwise conv 3x3 s1" Nov 25, 2025
@damdoo01-arm damdoo01-arm changed the title from "ARM KleidiAI Micro-Kernel: Depthwise conv 3x3 s1" to "ARM KleidiAI Micro-Kernel: Depthwise Convolution (3x3 kernel, Stride of 1)" Nov 25, 2025
@damdoo01-arm damdoo01-arm marked this pull request as ready for review November 26, 2025 17:50
@hariharans29
Member

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

@azure-pipelines

Azure Pipelines successfully started running 4 pipeline(s).

@hariharans29
Member

Just FYI - I am trying to add the same for NEON in #26688

const MLAS_CONV_ALGORITHM Algorithm = Parameters->Algorithm;

#if defined(USE_KLEIDIAI) && !defined(_MSC_VER)
if (Algorithm == MlasConvAlgorithmExpandThenGemmSegmented &&
Member

Maybe we can just use MlasConvAlgorithmDepthwise? Please see #26688.
