
Fix GPU Softmax NaN propagation for mixed infinite inputs #33498

Open
AyusKumarPathak wants to merge 8 commits into openvinotoolkit:master from AyusKumarPathak:fix-softmax-gpu-nan

@AyusKumarPathak

🐞 Bug Fix: Correct NaN Propagation in GPU Softmax with Mixed Infinite Inputs

This PR fixes a numerical stability bug in the GPU implementation of the Softmax operation where inputs containing a mix of inf and finite values produce incorrect results.

🔬 Problem Summary

For an input such as:

[inf, 1.0, 2.0]

The mathematically correct Softmax (IEEE-754 compliant) behavior is:

[nan, 0, 0]

However, the current GPU kernel produces:

[nan, nan, nan]

while the CPU implementation already behaves correctly.

The root cause is that when max_value == inf, the kernel evaluates:

exp(inf - inf) → NaN

which contaminates the denominator and propagates NaNs to all output positions.


🛠️ Solution

The GPU Softmax kernel now explicitly detects the max_value == inf case and applies IEEE-754 semantics:

  • Positions equal to max_value → NaN
  • All other positions → 0.0

This avoids unstable exponential evaluation and ensures GPU output exactly matches CPU behavior.

The fix is minimal, local to the kernel, and does not affect performance for normal inputs.


📈 Impact

| Case | CPU | GPU (Before) | GPU (After) |
|---|---|---|---|
| [inf, 1, 2] | [nan, 0, 0] | [nan, nan, nan] | [nan, 0, 0] |
| [inf, -inf, 1] | [nan, 0, 0] | [nan, nan, nan] | [nan, 0, 0] |
| [-inf, 1, 2] | [0, .2689, .7311] | [0, .2689, .7311] | unchanged |
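
The finite values in the [-inf, 1, 2] row can be checked by hand. With a maximum of 2, the -inf entry contributes exp(-inf) = 0, so the row reduces to a two-element softmax over 1 and 2:

$$
\frac{e^{1-2}}{e^{-\infty} + e^{1-2} + e^{2-2}} = \frac{e^{-1}}{1 + e^{-1}} \approx 0.2689,
\qquad
\frac{e^{2-2}}{e^{-\infty} + e^{1-2} + e^{2-2}} = \frac{1}{1 + e^{-1}} \approx 0.7311
$$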

🧩 Files Modified

src/plugins/intel_gpu/src/kernel_selector/cl_kernels/softmax_gpu_ref.cl


🧪 Testing

Verified against the reproducer from issue #33456.
GPU output now matches CPU output for all reported edge cases.


🔗 Related Issue

Fixes #33456

@AyusKumarPathak AyusKumarPathak requested review from a team as code owners January 7, 2026 19:12
@github-actions github-actions bot added the category: GPU OpenVINO GPU plugin label Jan 7, 2026
@sys-openvino-ci sys-openvino-ci added the ExternalPR External contributor label Jan 7, 2026
@rkazants rkazants requested a review from Lyamin-Roman January 8, 2026 05:25
Comment on lines +109 to +133
// Handle IEEE-754 case when max_value is INF
if (isinf((float)max_value)) {
    for (cls = 0; cls < class_num; ++cls) {
        ACCUMULATOR_TYPE v = data[cls * TMP_CLASS_PITCH];
        if (v == max_value)
            data[cls * TMP_CLASS_PITCH] = (ACCUMULATOR_TYPE)NAN;
        else
            data[cls * TMP_CLASS_PITCH] = (ACCUMULATOR_TYPE)0.0f;
    }

    // Write results and exit
    for (cls = 0; cls < class_num; ++cls) {
#if INPUT0_SIMPLE == 1
        const uint output_idx = out_depth_offset + cls*OUTPUT_CLASS_PITCH;
#else
#if INPUT0_DIMS == 5
        const uint output_idx = OUTPUT_GET_INDEX(b + *b_offset, f + *f_offset, z + *z_offset, y + *y_offset, x + *x_offset);
#else
        const uint output_idx = OUTPUT_GET_INDEX(b + *b_offset, f + *f_offset, y + *y_offset, x + *x_offset);
#endif
#endif
        output[output_idx] = data[cls * TMP_CLASS_PITCH];
    }
    return;
}
Contributor

I think we need to apply fused ops in this case too.
Please see my suggestion below.

Suggested change
- // Handle IEEE-754 case when max_value is INF
- if (isinf((float)max_value)) {
-     for (cls = 0; cls < class_num; ++cls) {
-         ACCUMULATOR_TYPE v = data[cls * TMP_CLASS_PITCH];
-         if (v == max_value)
-             data[cls * TMP_CLASS_PITCH] = (ACCUMULATOR_TYPE)NAN;
-         else
-             data[cls * TMP_CLASS_PITCH] = (ACCUMULATOR_TYPE)0.0f;
-     }
-     // Write results and exit
-     for (cls = 0; cls < class_num; ++cls) {
- #if INPUT0_SIMPLE == 1
-         const uint output_idx = out_depth_offset + cls*OUTPUT_CLASS_PITCH;
- #else
- #if INPUT0_DIMS == 5
-         const uint output_idx = OUTPUT_GET_INDEX(b + *b_offset, f + *f_offset, z + *z_offset, y + *y_offset, x + *x_offset);
- #else
-         const uint output_idx = OUTPUT_GET_INDEX(b + *b_offset, f + *f_offset, y + *y_offset, x + *x_offset);
- #endif
- #endif
-         output[output_idx] = data[cls * TMP_CLASS_PITCH];
-     }
-     return;
- }
+ for (cls = 0; cls < class_num; ++cls) {
+     // Handle IEEE-754 case when max_value is INF
+     if (isinf((float)max_value)) {
+         if (data[cls*TMP_CLASS_PITCH] == max_value)
+             data[cls*TMP_CLASS_PITCH] = TO_ACCUMULATOR_TYPE(NAN);
+         else
+             data[cls*TMP_CLASS_PITCH] = TO_ACCUMULATOR_TYPE(0.0f);
+     } else {
+         ACCUMULATOR_TYPE t = native_exp(data[cls*TMP_CLASS_PITCH] - max_value);
+         denominator += t;
+         data[cls*TMP_CLASS_PITCH] = t;
+     }
+ }
+ ....
+ for (cls = 0; cls < class_num; ++cls) {
+     ACCUMULATOR_TYPE res = data[cls*TMP_CLASS_PITCH];
+     if (!isinf((float)max_value)) {
+         res = res / denominator;
+     }

@AyusKumarPathak
Author

AyusKumarPathak commented Jan 8, 2026

Thanks for the suggestion. I’ve already reworked the INF handling so that it is processed inside the main computation loops, which ensures fused ops and activation are now applied consistently on both the INF and non-INF paths.

I also verified the original issue cases and the mixed-INF edge cases against the CPU implementation.
Please let me know if you’d like any additional scenarios validated.

Contributor

@e-ddykim e-ddykim left a comment

Additionally, could you please add unit tests for the issue case?

@AyusKumarPathak AyusKumarPathak requested review from a team as code owners January 8, 2026 16:06
@github-actions github-actions bot added the category: IE Tests OpenVINO Test: plugins and common label Jan 8, 2026
@AyusKumarPathak
Author

I’ve added comprehensive GPU unit tests covering all reported IEEE-754 edge cases (mixed INF, multiple INF, negative INF, and NaN propagation).
All previously reported failures are now fully covered.
Thanks for the review; I look forward to your approval.

@p-durandin
Contributor

build_jenkins

@p-durandin p-durandin added this to the 2026.0 milestone Jan 9, 2026
@praasz praasz self-assigned this Jan 16, 2026
AyusKumarPathak and others added 2 commits January 16, 2026 19:47
…max.cpp

Co-authored-by: Pawel Raasz <pawel.raasz@intel.com>
@AyusKumarPathak
Author

Applied the suggestion; kindly review.

@AyusKumarPathak
Author

Waiting for PR approval; kindly review.

Contributor

@praasz praasz left a comment

OK for the generic test part.
@e-ddykim, @Lyamin-Roman, could you review?

@praasz praasz modified the milestones: 2026.0, 2026.1 Feb 4, 2026