[CK_TILE] Conv bwd splitN support #3047

johannes-graner · 2025-10-17T10:42:58Z

Proposed changes

Implements SplitN support for backwards convolution (data), similar to what #2776 did for forward convolution.

Test results (with element-size-limit reduced to 16MB make CPU reference comparison feasible):

Dimensions	Group size	C	K	N	grid	passed verification
1	2	32	32	2048	{8192, 2, 1}	Y
1	2	32	32	4096	{8192, 2, 2}	Y
1	4	32	32	4096	{8192, 4, 2}	Y
2	2	32	32	32	{8192, 2, 1}	Y
2	2	32	32	64	{8192, 2, 2}	Y
2	2	64	32	32	{4096, 2, 2}	Y
3	2	8	8	2	{32768, 2, 1}	Y
3	1	8	8	4	{32768, 1, 2}	Y

The above results show that the implementation is correct for

Cases when SplitN is not needed
C != K
Varying group sizes

Checklist

Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.

I have added tests relevant to the introduced functionality, and the unit tests are passing locally
I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
I have added inline documentation which enables the maintainers with understanding the motivation
I have removed the stale documentation which is no longer relevant after this pull request
(If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
I have run clang-format on all changed files
Any dependent changes have been merged

Discussion

If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered

include/ck_tile/ops/grouped_convolution/kernel/grouped_convolution_backward_data_kernel.hpp

include/ck_tile/ops/grouped_convolution/utils/transform_conv_bwd_data_to_gemm.hpp

Copilot

Pull Request Overview

This PR implements Split-N support for the backward convolution (data) kernel, enabling the batch dimension to be split across multiple GPU blocks when tensor sizes exceed 2GB. This parallels the Split-N implementation previously added for forward convolution in PR #2776.

Key changes:

Enables Split-N computation by distributing batch dimension across blockIdx.z
Adds logic to calculate optimal N splits based on tensor memory footprint
Adds safeguards to prevent simultaneous use of Split-K and Split-N (both use blockIdx.z)

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File	Description
`include/ck_tile/ops/grouped_convolution/utils/transform_conv_bwd_data_to_gemm.hpp`	Implements `GetSplitedNSize()` to calculate optimal batch splits and adds tracking of original vs. split batch sizes
`include/ck_tile/ops/grouped_convolution/kernel/grouped_convolution_backward_data_kernel.hpp`	Updates kernel to use Split-N grid dimensions, applies batch offsets to input/output pointers, and adds Split-K/Split-N conflict detection

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

include/ck_tile/ops/grouped_convolution/utils/transform_conv_bwd_data_to_gemm.hpp

Copilot

Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

Copilot · 2025-10-17T13:11:27Z

include/ck_tile/ops/grouped_convolution/utils/transform_conv_bwd_data_to_gemm.hpp

+        for(index_t i = num_dims - 2; i >= 1; i--)
+        {
+            a_g_n_c_wis_strides[i] = a_g_n_c_wis_strides[i + 1] * a_g_n_c_wis_lengths[i + 1];
+        }


Loop condition 'i >= 1' with signed index type can cause undefined behavior when 'i' decrements from 0. Since 'index_t' is typically a signed type, when 'i' reaches 0 and decrements, it becomes -1, which still satisfies 'i >= 1' is false, but the comparison may cause issues. Consider using 'i > 0' or switching to an unsigned type with explicit bounds checking.

Copilot · 2025-10-17T13:11:28Z

include/ck_tile/ops/grouped_convolution/utils/transform_conv_bwd_data_to_gemm.hpp

+        for(index_t i = num_dims - 2; i >= 1; i--)
+        {
+            c_g_n_k_wos_strides[i] = c_g_n_k_wos_strides[i + 1] * c_g_n_k_wos_lengths[i + 1];
+        }


Same loop condition issue as with the input stride calculation. The condition 'i >= 1' with a signed index can lead to undefined behavior. Use 'i > 0' or ensure proper type handling.

Copilot · 2025-10-17T13:11:28Z

include/ck_tile/ops/grouped_convolution/kernel/grouped_convolution_backward_data_kernel.hpp

+                                                static_cast<long_index_t>(kargs.input_batch_stride);
+
+        // SplitK
+        // TODO: Implement SplitK support


The TODO comment mentions SplitK support but doesn't explain its relationship to the current SplitN implementation or why it's commented out. Consider adding context about whether SplitK conflicts with SplitN, or if it's a future enhancement that works independently.

Suggested change

// TODO: Implement SplitK support

// TODO: Implement SplitK support.

// Note: SplitK and SplitN are mutually exclusive in the current implementation.

// Only SplitN is supported at this time; SplitK is commented out because

// enabling both simultaneously would require additional logic to handle

// their interaction. SplitK support is planned as a future enhancement.

* Conv bwd splitN support * Adjust splitting calculations to lengths format * Prepare indexing for future splitK support

Conv bwd splitN support

17543ba

johannes-graner requested review from ThomasNing, afagaj, andriy-ca, aosewski, aska-0096, asleepzzz, bartekxk, carlushuang, cgmillette, coderfeli, geyyer, illsilin, poyenc, qianfengz, shumway, tenpercent and vidyasagar-amd as code owners October 17, 2025 10:43

bartekxk reviewed Oct 17, 2025

View reviewed changes

bartekxk requested a review from Copilot October 17, 2025 11:07

Copilot AI reviewed Oct 17, 2025

View reviewed changes

include/ck_tile/ops/grouped_convolution/utils/transform_conv_bwd_data_to_gemm.hpp Outdated Show resolved Hide resolved

include/ck_tile/ops/grouped_convolution/utils/transform_conv_bwd_data_to_gemm.hpp Outdated Show resolved Hide resolved

johannes-graner added 2 commits October 17, 2025 12:36

Adjust splitting calculations to lengths format

e94ac64

Prepare indexing for future splitK support

58168c2

johannes-graner requested review from bartekxk and Copilot October 17, 2025 13:10

Copilot AI reviewed Oct 17, 2025

View reviewed changes

Merge branch 'develop' into jograner/ck_tile/conv_bwd_data_splitN

9f6bd80

bartekxk approved these changes Oct 22, 2025

View reviewed changes

bartekxk merged commit cbd1279 into develop Oct 22, 2025
48 checks passed

bartekxk deleted the jograner/ck_tile/conv_bwd_data_splitN branch October 22, 2025 11:34

ecamartins pushed a commit that referenced this pull request Oct 24, 2025

[CK_TILE] Conv bwd splitN support (#3047)

4c9f591

* Conv bwd splitN support * Adjust splitting calculations to lengths format * Prepare indexing for future splitK support

Jeff-Huang pushed a commit that referenced this pull request Oct 25, 2025

[CK_TILE] Conv bwd splitN support (#3047)

6538b32

* Conv bwd splitN support * Adjust splitting calculations to lengths format * Prepare indexing for future splitK support

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[CK_TILE] Conv bwd splitN support #3047

[CK_TILE] Conv bwd splitN support #3047

johannes-graner commented Oct 17, 2025 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Oct 17, 2025

Uh oh!

Copilot AI Oct 17, 2025

Uh oh!

Copilot AI Oct 17, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

-        // TODO: Implement SplitK support
+        // TODO: Implement SplitK support.
+        // Note: SplitK and SplitN are mutually exclusive in the current implementation.
+        //       Only SplitN is supported at this time; SplitK is commented out because
+        //       enabling both simultaneously would require additional logic to handle
+        //       their interaction. SplitK support is planned as a future enhancement.

[CK_TILE] Conv bwd splitN support #3047

[CK_TILE] Conv bwd splitN support #3047

Conversation

johannes-graner commented Oct 17, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Proposed changes

Checklist

Discussion

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Uh oh!

Copilot AI Oct 17, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Oct 17, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Oct 17, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

johannes-graner commented Oct 17, 2025 •

edited

Loading