@SolenoidWGT SolenoidWGT commented Nov 4, 2025

Fix Int32 Overflow: Prevent overflow in k-grouped GEMM size validation

Description

Fix integer overflow in k-grouped GEMM when validating tensor sizes with large dimensions.

While training the DeepSeek-V3 model with DeepGEMM on an H100 machine, we hit an error in the grouped GEMM kernel on long sequences ($>256\text{k}$). The cause turned out to be an overflow when computing the product of sum_k and hidden_size.

Root cause: When m or n is large, its product with sum_k exceeds the int32 maximum value (2,147,483,647), causing incorrect validation or undefined behavior.

Solution: Cast m and n to uint64_t before the multiplication so that the product is computed in 64-bit arithmetic and large matrix dimensions are handled safely.

Changes

csrc/apis/gemm.hpp:289-290: Cast to uint64_t in size assertions

// Before
DG_HOST_ASSERT(sum_mk == m * sum_k);
DG_HOST_ASSERT(sum_nk == n * sum_k);

// After
DG_HOST_ASSERT(sum_mk == static_cast<uint64_t>(m) * sum_k);
DG_HOST_ASSERT(sum_nk == static_cast<uint64_t>(n) * sum_k);
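
To illustrate the arithmetic, here is a minimal standalone sketch. It is not code from DeepGEMM: the helpers `widened_product` and `overflows_int32` are hypothetical, the values 7168 and 300000 are assumed purely for illustration, and both operands are assumed to be non-negative.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical helper (not part of DeepGEMM): widen both operands first so the
// multiplication is carried out in 64 bits. Assumes a and b are non-negative.
constexpr uint64_t widened_product(int a, int b) {
    return static_cast<uint64_t>(a) * static_cast<uint64_t>(b);
}

// Hypothetical helper: true when a * b would not fit in a signed 32-bit integer.
constexpr bool overflows_int32(int a, int b) {
    return widened_product(a, b) >
           static_cast<uint64_t>(std::numeric_limits<int32_t>::max());
}

// Assumed example values: a dimension of 7168 and a total K of 300000.
// 7168 * 300000 = 2,150,400,000, which is above 2,147,483,647.
static_assert(widened_product(7168, 300000) == 2150400000ULL,
              "64-bit product holds the exact value");
static_assert(overflows_int32(7168, 300000),
              "the same product does not fit in int32");

// A slightly smaller total K still fits: 7168 * 299000 = 2,143,232,000.
static_assert(!overflows_int32(7168, 299000),
              "smaller product fits in int32");
```

Casting only one operand, as in the change above, is enough as long as the other operand is non-negative: the usual arithmetic conversions then promote it to 64 bits before the multiply.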
