fix: prevent int32 overflow in k-grouped GEMM size calculations #226

SolenoidWGT · 2025-11-04T06:25:09Z

Fix Int32 Overflow: Prevent overflow in k-grouped GEMM size validation

Description

Fix integer overflow in k-grouped GEMM when validating tensor sizes with large dimensions.

While training the DeepSeek-V3 model with DeepGEMM on an H100 machine, we hit an error in the k-grouped GEMM kernel when using long sequences ($>256\text{k}$). The cause turned out to be an overflow when computing the product of sum_k and hidden_size.

Root cause: When m or n is large, the product m * sum_k or n * sum_k exceeds the int32 maximum (2,147,483,647), so the size assertion compares against an overflowed value. This leads to incorrect validation or undefined behavior.

Solution: Cast m and n to uint64_t before multiplication to safely handle large matrix dimensions.
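To make the threshold concrete, here is a minimal sketch of the widened multiplication. The helper names (`product_u64`, `fits_in_int32`, `check_size`) and the dimension 7168 are illustrative assumptions, not code or values from this PR, and plain `assert` stands in for `DG_HOST_ASSERT`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical helper: widen before multiplying so the product is computed
// in 64 bits. Assumes both operands are non-negative, as tensor sizes are.
constexpr uint64_t product_u64(int dim, int sum_k) {
    return static_cast<uint64_t>(dim) * static_cast<uint64_t>(sum_k);
}

// Hypothetical helper: true when dim * sum_k is still representable
// as a signed 32-bit integer.
constexpr bool fits_in_int32(int dim, int sum_k) {
    return product_u64(dim, sum_k) <=
           static_cast<uint64_t>(std::numeric_limits<int32_t>::max());
}

// Illustrative dimension only (an assumption, not taken from the PR).
constexpr int kDim = 7168;

// 7168 * 262144 = 1,879,048,192: still below the int32 maximum.
static_assert(fits_in_int32(kDim, 262144), "256k should fit in int32");
// 7168 * 300000 = 2,150,400,000: above 2,147,483,647.
static_assert(!fits_in_int32(kDim, 300000), "300k should not fit in int32");
static_assert(product_u64(kDim, 300000) == 2150400000ULL, "64-bit product");

// Stand-in for the patched check, using assert instead of DG_HOST_ASSERT.
inline void check_size(uint64_t sum_mk, int m, int sum_k) {
    assert(sum_mk == product_u64(m, sum_k));
}
```

With these assumed values, a total K of exactly 256k stays under the limit, while anything somewhat larger does not, which is consistent with the failure only appearing for sequences above 256k.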

Changes

csrc/apis/gemm.hpp:289-290: Cast to uint64_t in size assertions

// Before
DG_HOST_ASSERT(sum_mk == m * sum_k);
DG_HOST_ASSERT(sum_nk == n * sum_k);

// After
DG_HOST_ASSERT(sum_mk == static_cast<uint64_t>(m) * sum_k);
DG_HOST_ASSERT(sum_nk == static_cast<uint64_t>(n) * sum_k);

fix: prevent int32 overflow in k-grouped GEMM size calculations

62a3122
