ARM KleidiAI Micro-Kernel: Depthwise Convolution (3x3 kernel, Stride of 1) #26654

damdoo01-arm · 2025-11-25T15:49:20Z

Description

This PR introduces a dedicated kernel for depthwise convolution in one specific configuration: a 3x3 kernel with a stride of 1.
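The following is a plain scalar reference for the case this PR targets. It is a sketch for orientation only, not the KleidiAI micro-kernel and not MLAS code. It assumes a single image in NCHW layout, one 3x3 filter per channel (channel multiplier 1), stride 1, and the same zero padding `pad` on all four sides (the commit history mentions padding of 0 and 1). The function name `DepthwiseConv3x3S1` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar reference for a depthwise 3x3, stride-1 convolution.
// Layout: NCHW with batch 1. Each channel is convolved with its own 3x3 filter,
// so channels never mix. Zero padding `pad` is applied on all four sides.
std::vector<float> DepthwiseConv3x3S1(const std::vector<float>& input,    // channels*height*width
                                      const std::vector<float>& weights,  // channels*3*3
                                      const std::vector<float>& bias,     // channels, or empty
                                      std::size_t channels, std::size_t height,
                                      std::size_t width, std::size_t pad) {
    assert(height + 2 * pad >= 3 && width + 2 * pad >= 3);
    assert(input.size() == channels * height * width && weights.size() == channels * 9);
    assert(bias.empty() || bias.size() == channels);

    // 3x3 kernel with stride 1: out = in + 2*pad - 3 + 1.
    const std::size_t out_h = height + 2 * pad - 2;
    const std::size_t out_w = width + 2 * pad - 2;
    std::vector<float> output(channels * out_h * out_w);

    const auto h = static_cast<std::ptrdiff_t>(height);
    const auto w = static_cast<std::ptrdiff_t>(width);
    const auto p = static_cast<std::ptrdiff_t>(pad);

    for (std::size_t c = 0; c < channels; ++c) {
        const float* in = input.data() + c * height * width;
        const float* k = weights.data() + c * 9;
        float* out = output.data() + c * out_h * out_w;
        for (std::size_t oy = 0; oy < out_h; ++oy) {
            for (std::size_t ox = 0; ox < out_w; ++ox) {
                float acc = bias.empty() ? 0.0f : bias[c];
                for (std::ptrdiff_t ky = 0; ky < 3; ++ky) {
                    const std::ptrdiff_t iy = static_cast<std::ptrdiff_t>(oy) + ky - p;
                    if (iy < 0 || iy >= h) continue;  // tap lies in the zero padding
                    for (std::ptrdiff_t kx = 0; kx < 3; ++kx) {
                        const std::ptrdiff_t ix = static_cast<std::ptrdiff_t>(ox) + kx - p;
                        if (ix < 0 || ix >= w) continue;  // tap lies in the zero padding
                        acc += in[iy * w + ix] * k[ky * 3 + kx];
                    }
                }
                out[oy * out_w + ox] = acc;
            }
        }
    }
    return output;
}
```

With stride 1, each output pixel shares six of its nine input taps with its horizontal neighbour. That reuse is the kind of structure a dedicated kernel can exploit more directly than a generic expand-then-GEMM path.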

Motivation and Context

We have identified opportunities for performance uplift for this layer type, and several customer models contain layers that would benefit from this kernel.


hariharans29 · 2025-12-01T04:24:07Z

/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline

azure-pipelines · 2025-12-01T04:24:30Z

Azure Pipelines successfully started running 4 pipeline(s).

hariharans29 · 2025-12-04T15:10:08Z

Just FYI - I am trying to add the same for NEON in #26688

hariharans29 · 2025-12-04T15:13:40Z

onnxruntime/core/mlas/lib/convolve.cpp

    const MLAS_CONV_ALGORITHM Algorithm = Parameters->Algorithm;

+#if defined(USE_KLEIDIAI) && !defined(_MSC_VER)
+    if (Algorithm == MlasConvAlgorithmExpandThenGemmSegmented &&


Maybe we can just use MlasConvAlgorithmDepthwise? Please see #26688.
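One reading of this suggestion is that the depthwise path should be chosen from the shape of the convolution itself rather than by hooking into the `MlasConvAlgorithmExpandThenGemmSegmented` branch. A minimal sketch of such a shape check follows. `Conv2DShape` and `IsDepthwise3x3S1Candidate` are hypothetical names (not MLAS types), and limiting padding to a uniform 0 or 1 is an assumption taken from the commit history, not from the PR's code.

```cpp
#include <cstddef>

// Hypothetical, simplified convolution description for illustration only;
// it is not the layout of MLAS's own parameter struct.
struct Conv2DShape {
    std::size_t group_count;
    std::size_t input_channels;   // total across all groups
    std::size_t output_channels;  // total across all groups
    std::size_t kernel_h, kernel_w;
    std::size_t stride_h, stride_w;
    std::size_t dilation_h, dilation_w;
    std::size_t pad_top, pad_left, pad_bottom, pad_right;
};

// True when the shape is a depthwise convolution with channel multiplier 1
// (one input and one output channel per group), a 3x3 kernel, stride 1,
// no dilation, and the same padding of 0 or 1 on every side.
bool IsDepthwise3x3S1Candidate(const Conv2DShape& s) {
    const bool depthwise = s.input_channels > 0 &&
                           s.group_count == s.input_channels &&
                           s.output_channels == s.input_channels;
    const bool kernel3x3 = s.kernel_h == 3 && s.kernel_w == 3;
    const bool stride1 = s.stride_h == 1 && s.stride_w == 1;
    const bool undilated = s.dilation_h == 1 && s.dilation_w == 1;
    const bool pad_ok = s.pad_top <= 1 && s.pad_top == s.pad_left &&
                        s.pad_top == s.pad_bottom && s.pad_top == s.pad_right;
    return depthwise && kernel3x3 && stride1 && undilated && pad_ok;
}
```

A dispatcher could consult a predicate like this once, before choosing an algorithm, so the depthwise kernel is reached regardless of which generic algorithm would otherwise have been selected.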

damdoo01-arm added 4 commits November 25, 2025 15:44

Initial Commit on the new branch (a9e69bd)

Added nchw/nhwc conversion to pass mlas_test (f3731fb)

Expanded to cover cases where padding=1 (00b8ce2)

Added test and other compatibility fixes (30e73a1)
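The commit "Added nchw/nhwc conversion to pass mlas_test" implies a layout reorder around the kernel. The commit does not state the direction; a plausible reading, which is an assumption, is that the micro-kernel consumes channels-last (NHWC) data while the MLAS test path supplies NCHW. The sketch below shows both reorders; `NchwToNhwc` and `NhwcToNchw` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reorders a dense N x C x H x W tensor into N x H x W x C (channels innermost).
std::vector<float> NchwToNhwc(const std::vector<float>& src, std::size_t n,
                              std::size_t c, std::size_t h, std::size_t w) {
    assert(src.size() == n * c * h * w);
    std::vector<float> dst(src.size());
    for (std::size_t b = 0; b < n; ++b)
        for (std::size_t ch = 0; ch < c; ++ch)
            for (std::size_t y = 0; y < h; ++y)
                for (std::size_t x = 0; x < w; ++x)
                    dst[((b * h + y) * w + x) * c + ch] = src[((b * c + ch) * h + y) * w + x];
    return dst;
}

// Inverse reorder: N x H x W x C back into N x C x H x W.
std::vector<float> NhwcToNchw(const std::vector<float>& src, std::size_t n,
                              std::size_t c, std::size_t h, std::size_t w) {
    assert(src.size() == n * c * h * w);
    std::vector<float> dst(src.size());
    for (std::size_t b = 0; b < n; ++b)
        for (std::size_t ch = 0; ch < c; ++ch)
            for (std::size_t y = 0; y < h; ++y)
                for (std::size_t x = 0; x < w; ++x)
                    dst[((b * c + ch) * h + y) * w + x] = src[((b * h + y) * w + x) * c + ch];
    return dst;
}
```

Each reorder is a full copy of the tensor, so performing it on every call adds memory traffic on top of the convolution itself.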

damdoo01-arm marked this pull request as draft November 25, 2025 15:49

damdoo01-arm changed the title ~~Depthwise conv 3x3 s1 updated~~ ARM KleidiAI Micro-Kernel: Depthwise conv 3x3 s1 Nov 25, 2025

damdoo01-arm changed the title ~~ARM KleidiAI Micro-Kernel: Depthwise conv 3x3 s1~~ ARM KleidiAI Micro-Kernel: Depthwise Convolution (3x3 kernel, Stride of 1) Nov 25, 2025

damdoo01-arm added 2 commits November 26, 2025 17:40

Updated to include 2d conv gate to prevent invoking IGEMM for small channels (e0b1903)

Removed erroneously added files (50eb586)

damdoo01-arm marked this pull request as ready for review November 26, 2025 17:50

hariharans29 mentioned this pull request Dec 4, 2025

[MLAS/NEON] Add dedicated kernel for depthwise convolution for ARM64 using NEON intrinsics #26688

Open

hariharans29 reviewed Dec 4, 2025


2 participants
