


@johannes-graner johannes-graner commented Oct 17, 2025

Proposed changes

Implements SplitN support for backward data convolution, similar to what #2776 did for forward convolution.

Test results (with the element-size limit reduced to 16MB to make CPU reference comparison feasible):

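| Spatial dims | Group size (G) | C | K | N | Grid (x, y, z) | Passed verification |
|---|---|---|---|---|---|---|
| 1 | 2 | 32 | 32 | 2048 | {8192, 2, 1} | Y |
| 1 | 2 | 32 | 32 | 4096 | {8192, 2, 2} | Y |
| 1 | 4 | 32 | 32 | 4096 | {8192, 4, 2} | Y |
| 2 | 2 | 32 | 32 | 32 | {8192, 2, 1} | Y |
| 2 | 2 | 32 | 32 | 64 | {8192, 2, 2} | Y |
| 2 | 2 | 64 | 32 | 32 | {4096, 2, 2} | Y |
| 3 | 2 | 8 | 8 | 2 | {32768, 2, 1} | Y |
| 3 | 1 | 8 | 8 | 4 | {32768, 1, 2} | Y |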
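Reading the grid column in the table, y equals the group size and z grows with N (1 when the batch is processed in one piece, 2 when it is split). The host-side sketch below models that layout, assuming z = N / n_per_split with equal-sized chunks. The x dimension is passed through unchanged because the PR does not show how it is derived; split_n_grid is a hypothetical name, and the n_per_split values used with it are illustrative rather than taken from the PR.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using index_t = std::int32_t; // assumption: signed 32-bit, like ck_tile::index_t

// Hypothetical model of the launch grid shown in the table:
//   x = number of GEMM tiles (taken as given; its derivation is not shown here),
//   y = number of groups G,
//   z = number of Split-N chunks, N / n_per_split.
std::array<index_t, 3> split_n_grid(index_t gemm_tiles, index_t groups, index_t n, index_t n_per_split)
{
    // Chunks are assumed to be equal-sized, so n_per_split must divide n.
    assert(n_per_split > 0 && n % n_per_split == 0);
    return {gemm_tiles, groups, n / n_per_split};
}
```

For example, split_n_grid(8192, 2, 4096, 2048) yields {8192, 2, 2}, the same grid as the second row, and passing n_per_split equal to N always gives z = 1, the "SplitN not needed" case.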

The results above show that the implementation is correct for:

  1. Cases where SplitN is not needed (grid z = 1)
  2. C != K
  3. Varying group sizes

Checklist


  • I have added tests relevant to the introduced functionality, and the unit tests are passing locally
  • I have added the test to REGRESSION_TESTS list defined at the top of CMakeLists.txt in tests/CMakeLists.txt, IF the test takes more than 30 seconds to run.
  • I have added inline documentation which enables the maintainers with understanding the motivation
  • I have removed the stale documentation which is no longer relevant after this pull request
  • (If this change is user-facing) I have added release notes which provide the end users with a brief summary of the improvement from this pull request
  • I have run clang-format on all changed files
  • Any dependent changes have been merged


@bartekxk bartekxk requested a review from Copilot October 17, 2025 11:07

Copilot AI left a comment


Pull Request Overview

This PR implements Split-N support for the backward convolution (data) kernel, enabling the batch dimension to be split across multiple GPU blocks when tensor sizes exceed 2GB. This parallels the Split-N implementation previously added for forward convolution in PR #2776.

Key changes:

  • Enables Split-N computation by distributing the batch dimension across blockIdx.z
  • Adds logic to calculate the optimal number of N splits based on the tensor memory footprint
  • Adds safeguards to prevent simultaneous use of Split-K and Split-N (both use blockIdx.z)
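The overview says the number of splits is derived from the tensor memory footprint. A minimal sketch of one way to do that, assuming (not confirmed by the PR) that the split chooses the fewest equal-sized chunks whose size fits under a byte limit, with the chunk count restricted to divisors of N so every blockIdx.z slice gets the same number of samples. split_n_size is a hypothetical name; the real GetSplitedNSize() may differ in detail, and treating "2GB" as 2^31 bytes is also an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Assumed value of the 2GB threshold from the PR overview (2^31 bytes).
constexpr std::int64_t kTwoGB = std::int64_t{1} << 31;

// Hypothetical helper (not the CK Tile API): number of batch samples each Split-N
// chunk should process. bytes_per_sample is assumed to be the larger of the
// per-sample sizes of the input and output tensors.
std::int64_t split_n_size(std::int64_t n,
                          std::int64_t bytes_per_sample,
                          std::int64_t limit_bytes = kTwoGB)
{
    assert(n > 0 && bytes_per_sample > 0 && limit_bytes > 0);

    const std::int64_t total_bytes = n * bytes_per_sample;
    if(total_bytes <= limit_bytes)
        return n; // whole batch fits: one chunk, grid z = 1

    // Fewest chunks that could bring each chunk under the limit (ceiling division).
    const std::int64_t min_splits = (total_bytes + limit_bytes - 1) / limit_bytes;

    // Smallest divisor of n that is >= min_splits, so all chunks are equal-sized.
    for(std::int64_t d = min_splits; d <= n; ++d)
    {
        if(n % d == 0)
            return n / d;
    }

    // min_splits > n: a single sample already exceeds the limit, which Split-N
    // cannot fix; fall back to one sample per chunk.
    return 1;
}
```

For example, with a 16 MiB limit and 8192 bytes per sample, N = 4096 totals 32 MiB, so the sketch returns 2048 samples per chunk (two chunks along blockIdx.z). Restricting the count to divisors of N keeps the per-block offset arithmetic uniform, at the cost of sometimes using more chunks than strictly necessary when N has few divisors.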

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| include/ck_tile/ops/grouped_convolution/utils/transform_conv_bwd_data_to_gemm.hpp | Implements GetSplitedNSize() to calculate optimal batch splits and adds tracking of original vs. split batch sizes |
| include/ck_tile/ops/grouped_convolution/kernel/grouped_convolution_backward_data_kernel.hpp | Updates kernel to use Split-N grid dimensions, applies batch offsets to input/output pointers, and adds Split-K/Split-N conflict detection |
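The kernel change applies a per-chunk batch offset to the input and output pointers (the diff fragment in the review casts kargs.input_batch_stride to long_index_t) and refuses to run Split-K and Split-N together because both would use blockIdx.z. The host-side sketch below illustrates both ideas, assuming the offset is block index times samples per chunk times a per-sample stride; in the actual kernel the stored batch stride may already include the chunk size. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

using index_t = std::int32_t;      // assumption: mirrors ck_tile::index_t
using long_index_t = std::int64_t; // assumption: mirrors ck_tile::long_index_t

// Hypothetical helper: element offset of the first sample handled by grid block z.
// The product is formed in 64 bits because offsets of later chunks can exceed the
// 32-bit range even though indexing inside one chunk still fits in index_t.
long_index_t split_n_batch_offset(index_t block_z, index_t n_per_split, index_t batch_stride)
{
    assert(block_z >= 0 && n_per_split > 0 && batch_stride > 0);
    return static_cast<long_index_t>(block_z) * n_per_split *
           static_cast<long_index_t>(batch_stride);
}

// Hypothetical check: Split-K (k_batch > 1) and Split-N (more than one N chunk)
// would both need blockIdx.z, so at most one of them may be active.
bool split_k_and_split_n_compatible(index_t k_batch, index_t n_splits)
{
    return k_batch <= 1 || n_splits <= 1;
}
```

With 2048 samples per chunk and a per-sample stride of 2^20 elements, the second chunk starts at element 2^31, which already overflows a signed 32-bit integer; this is why the cast to long_index_t happens before the multiplication rather than after it.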



Copilot AI left a comment


Pull Request Overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.



Comment on lines +58 to +61
for(index_t i = num_dims - 2; i >= 1; i--)
{
a_g_n_c_wis_strides[i] = a_g_n_c_wis_strides[i + 1] * a_g_n_c_wis_lengths[i + 1];
}

Copilot AI Oct 17, 2025


The loop condition 'i >= 1' with a signed index type can cause undefined behavior when 'i' decrements from 0. Since 'index_t' is typically a signed type, when 'i' reaches 0 and decrements, it becomes -1; 'i >= 1' is then false, but the comparison may cause issues. Consider using 'i > 0' or switching to an unsigned type with explicit bounds checking.

Copilot uses AI. Check for mistakes.
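The quoted loop fills strides from the innermost dimension outward and stops before index 0. With a signed index type the loop exits as soon as i reaches 0, so i is never negative inside the body, and 'i >= 1' and 'i > 0' are the same test for integers. The host-side sketch below reproduces the loop shape to make this concrete. packed_strides_from_dim1 is a hypothetical name, the [G, N, C, spatial...] ordering is assumed from the name a_g_n_c_wis_lengths, and setting the innermost stride to 1 and leaving index 0 to the caller are assumptions of the sketch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using index_t = std::int32_t; // assumption: signed 32-bit, like ck_tile::index_t

// Hypothetical sketch: packed (row-major) strides for dimensions 1 .. NumDims-1 of a
// tensor with lengths ordered [G, N, C, spatial...]. Index 0 (G) is left as 0 for the
// caller to fill, mirroring the quoted loop's 'i >= 1' bound.
template <std::size_t NumDims>
std::array<index_t, NumDims> packed_strides_from_dim1(const std::array<index_t, NumDims>& lengths)
{
    static_assert(NumDims >= 2, "need at least two dimensions");
    std::array<index_t, NumDims> strides{};
    strides[NumDims - 1] = 1; // innermost dimension is contiguous

    // Same loop shape as the reviewed code: i is signed, so once i reaches 0 the
    // condition i >= 1 is false and the loop exits; the body never sees i < 1.
    for(index_t i = static_cast<index_t>(NumDims) - 2; i >= 1; i--)
    {
        strides[i] = strides[i + 1] * lengths[i + 1];
    }
    return strides;
}
```

For lengths {2, 4, 3, 5} the sketch produces strides {0, 15, 5, 1}: the loop runs for i = 2 and i = 1 only, and index 0 is untouched.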
Comment on lines +66 to +69
for(index_t i = num_dims - 2; i >= 1; i--)
{
c_g_n_k_wos_strides[i] = c_g_n_k_wos_strides[i + 1] * c_g_n_k_wos_lengths[i + 1];
}

Copilot AI Oct 17, 2025


Same loop condition issue as with the input stride calculation. The condition 'i >= 1' with a signed index can lead to undefined behavior. Use 'i > 0' or ensure proper type handling.

static_cast<long_index_t>(kargs.input_batch_stride);

// SplitK
// TODO: Implement SplitK support

Copilot AI Oct 17, 2025


The TODO comment mentions SplitK support but doesn't explain its relationship to the current SplitN implementation or why it's commented out. Consider adding context about whether SplitK conflicts with SplitN, or if it's a future enhancement that works independently.

Suggested change
- // TODO: Implement SplitK support
+ // TODO: Implement SplitK support.
+ // Note: SplitK and SplitN are mutually exclusive in the current implementation.
+ // Only SplitN is supported at this time; SplitK is commented out because
+ // enabling both simultaneously would require additional logic to handle
+ // their interaction. SplitK support is planned as a future enhancement.

@bartekxk bartekxk merged commit cbd1279 into develop Oct 22, 2025
48 checks passed
@bartekxk bartekxk deleted the jograner/ck_tile/conv_bwd_data_splitN branch October 22, 2025 11:34
ecamartins pushed a commit that referenced this pull request Oct 24, 2025
* Conv bwd splitN support

* Adjust splitting calculations to lengths format

* Prepare indexing for future splitK support
Jeff-Huang pushed a commit that referenced this pull request Oct 25, 2025
* Conv bwd splitN support

* Adjust splitting calculations to lengths format

* Prepare indexing for future splitK support