Skip to content

With warp spec, allow more than 2 math wgs but disable reg sharing #4182

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 7, 2025

Conversation

jacobhinkle
Copy link
Collaborator

Previously if we enable warp specialization with 3 math groups, we will hit the removed exception. Now, we simply turn off register sharing unless there are 2 math groups. We should probably just adjust the number of registers manually or automatically based on number of math groups instead.

@jacobhinkle jacobhinkle requested a review from rdspring1 April 3, 2025 17:36
Copy link

github-actions bot commented Apr 3, 2025

Review updated until commit 40f8db3

Description

  • Allow more than 2 math warp groups with warp specialization

  • Disable register sharing unless there are exactly 2 math warp groups


Changes walkthrough 📝

Relevant files
Enhancement
hopper_multi_matmul.cpp
Modify register sharing logic for multiple math warp groups

csrc/scheduler/hopper_multi_matmul.cpp

  • Removed check for maximum 2 math warp groups
  • Modified condition to disable register sharing if not exactly 2 math
    warp groups
  • Added comment explaining the reason for disabling register sharing
    with more than 2 math warp groups
  • +5/-5     

    PR Reviewer Guide 🔍

    Here are some key observations to aid the review process:

    🧪 No relevant tests
    ⚡ Recommended focus areas for review

    Possible Issue

    The change disables register sharing for any number of math warp groups other than 2. This might lead to suboptimal performance if register sharing could be beneficial for more than 2 math warp groups.

    if (num_math_warp_groups != 2) {
      // Disable register sharing when there is only one math warp group.
      // In such case we will have 128 math threads and 128 dma threads,
      // for a total of 256 threads per CTA. The register file size on
      // Hopper is 64K registers, which is filled when a 256-thread CTA
      // has 256 registers per thread. Since 256 is already the maximum
      // number of registers per thread even with register sharing, there
      // is no point in doing register sharing to try and increase it.
      //
      // When there is more than one math warp group, we also disable
      // register sharing, since we don't currently compute the number of
      // register properly in that case.

    @jacobhinkle
    Copy link
    Collaborator Author

    !build

    @jacobhinkle
    Copy link
    Collaborator Author

    !build

    @jacobhinkle jacobhinkle merged commit 7a2aa39 into main Apr 7, 2025
    16 checks passed
    @jacobhinkle jacobhinkle deleted the jh/reg_sharing_multiple_warps branch April 7, 2025 13:44
    Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
    Labels
    None yet
    Projects
    None yet
    Development

    Successfully merging this pull request may close these issues.

    2 participants