Fix GPU Softmax NaN propagation for mixed infinite inputs #33498
AyusKumarPathak wants to merge 8 commits into openvinotoolkit:master from
Conversation
```c
// Handle IEEE-754 case when max_value is INF
if (isinf((float)max_value)) {
    for (cls = 0; cls < class_num; ++cls) {
        ACCUMULATOR_TYPE v = data[cls * TMP_CLASS_PITCH];
        if (v == max_value)
            data[cls * TMP_CLASS_PITCH] = (ACCUMULATOR_TYPE)NAN;
        else
            data[cls * TMP_CLASS_PITCH] = (ACCUMULATOR_TYPE)0.0f;
    }

    // Write results and exit
    for (cls = 0; cls < class_num; ++cls) {
#if INPUT0_SIMPLE == 1
        const uint output_idx = out_depth_offset + cls*OUTPUT_CLASS_PITCH;
#else
#if INPUT0_DIMS == 5
        const uint output_idx = OUTPUT_GET_INDEX(b + *b_offset, f + *f_offset, z + *z_offset, y + *y_offset, x + *x_offset);
#else
        const uint output_idx = OUTPUT_GET_INDEX(b + *b_offset, f + *f_offset, y + *y_offset, x + *x_offset);
#endif
#endif
        output[output_idx] = data[cls * TMP_CLASS_PITCH];
    }
    return;
}
```
I think we need to apply fused ops for the reported case too.
Please check my suggestion below.
Suggested change (folding the INF handling into the main computation loops instead of the early-exit block above):

```c
for (cls = 0; cls < class_num; ++cls) {
    // Handle IEEE-754 case when max_value is INF
    if (isinf((float)max_value)) {
        if (data[cls*TMP_CLASS_PITCH] == max_value)
            data[cls*TMP_CLASS_PITCH] = TO_ACCUMULATOR_TYPE(NAN);
        else
            data[cls*TMP_CLASS_PITCH] = TO_ACCUMULATOR_TYPE(0.0f);
    } else {
        ACCUMULATOR_TYPE t = native_exp(data[cls*TMP_CLASS_PITCH] - max_value);
        denominator += t;
        data[cls*TMP_CLASS_PITCH] = t;
    }
}
....
for (cls = 0; cls < class_num; ++cls) {
    ACCUMULATOR_TYPE res = data[cls*TMP_CLASS_PITCH];
    if (!isinf((float)max_value)) {
        res = res / denominator;
    }
```
Thanks for the suggestion, but I've already reworked the INF handling so that it's processed inside the main computation loops. This ensures fused ops and activation are applied consistently for both INF and non-INF paths. I also verified the original issue cases and the mixed-INF edge cases against the CPU implementation.
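Roughly, the reworked flow looks like the following host-side C++ sketch. This is illustrative only, not the actual OpenCL kernel: softmax_ref is a made-up name, and the point where the kernel's FUSED_OPS/ACTIVATION macros would run is only marked with a comment.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Reference softmax with IEEE-754 handling for rows whose maximum is +inf.
std::vector<float> softmax_ref(const std::vector<float>& in) {
    const float max_value = *std::max_element(in.begin(), in.end());
    std::vector<float> out(in.size());
    float denominator = 0.0f;

    // Main loop: INF handling happens here instead of in a separate early-exit path.
    for (size_t i = 0; i < in.size(); ++i) {
        if (std::isinf(max_value)) {
            // exp(inf - inf) is NaN, exp(finite - inf) is 0 under IEEE-754.
            out[i] = (in[i] == max_value) ? NAN : 0.0f;
        } else {
            out[i] = std::exp(in[i] - max_value);
            denominator += out[i];
        }
    }

    // Normalization loop: division is skipped in the INF case, but fused ops /
    // activation would be applied to every element here, in both paths.
    for (size_t i = 0; i < in.size(); ++i) {
        if (!std::isinf(max_value))
            out[i] /= denominator;
        // FUSED_OPS / ACTIVATION would run on out[i] at this point in the kernel.
    }
    return out;
}
```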
e-ddykim left a comment
Additionally, could you please add unit tests for the issue case?
I've added comprehensive GPU unit tests covering all reported IEEE-754 edge cases (mixed INF, multiple INF, negative INF, and NaN propagation).
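For illustration, a gtest-style sketch of the kind of checks these cases need; the actual tests use OpenVINO's functional single-layer test framework, the test names here are made up, and softmax_ref refers to the reference sketch shown earlier in this thread.

```cpp
#include <gtest/gtest.h>
#include <cmath>
#include <limits>
#include <vector>

// IEEE-754-aware reference; see the host-side sketch above.
std::vector<float> softmax_ref(const std::vector<float>& in);

TEST(SoftmaxInfEdgeCases, MixedInf) {
    const float inf = std::numeric_limits<float>::infinity();
    const auto out = softmax_ref({inf, 1.0f, 2.0f});
    EXPECT_TRUE(std::isnan(out[0]));   // element equal to the +inf max becomes NaN
    EXPECT_FLOAT_EQ(out[1], 0.0f);     // finite elements become 0
    EXPECT_FLOAT_EQ(out[2], 0.0f);
}

TEST(SoftmaxInfEdgeCases, NegativeInf) {
    const float inf = std::numeric_limits<float>::infinity();
    const auto out = softmax_ref({-inf, 1.0f, 2.0f});
    EXPECT_FLOAT_EQ(out[0], 0.0f);     // exp(-inf - max) == 0, no NaN involved
    EXPECT_NEAR(out[1], 0.2689f, 1e-3f);
    EXPECT_NEAR(out[2], 0.7311f, 1e-3f);
}
```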
build_jenkins
Resolved review threads:
src/tests/functional/plugin/shared/include/single_op_tests/softmax.hpp
src/plugins/intel_gpu/tests/functional/single_layer_tests/softmax.cpp
…max.cpp Co-authored-by: Pawel Raasz <pawel.raasz@intel.com>
Applied the suggestion, kindly review.
Waiting for PR approval, kindly review.
praasz left a comment
OK for the generic test part.
@e-ddykim, @Lyamin-Roman, could you review?
🐞 Bug Fix: Correct NaN Propagation in GPU Softmax with Mixed Infinite Inputs
This PR fixes a numerical stability bug in the GPU implementation of the Softmax operation where inputs containing a mix of
inf and finite values produce incorrect results.

🔬 Problem Summary
For an input such as:
[inf, 1.0, 2.0]
The mathematically correct Softmax (IEEE-754 compliant) result is:
[nan, 0, 0]
However, the current GPU kernel produced:
[nan, nan, nan]
while the CPU implementation already behaves correctly.
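For context, both implementations compute the numerically stabilized, max-shifted form of softmax (visible in the kernel's native_exp(data - max_value) followed by division by the accumulated denominator), which is why the value of the row maximum matters:

$$\mathrm{softmax}(x)_i = \frac{e^{\,x_i - m}}{\sum_j e^{\,x_j - m}}, \qquad m = \max_j x_j$$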
The root cause is that when
max_value == inf, the kernel evaluates:

exp(inf - inf) → NaN
which contaminates the denominator and propagates NaNs to all output positions.
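A minimal standalone C++ reproduction of this failure mode (same arithmetic as the kernel's shifted exponential, not the kernel itself):

```cpp
#include <cmath>
#include <cstdio>

int main() {
    const float data[3] = {INFINITY, 1.0f, 2.0f};
    const float max_value = INFINITY;  // row maximum
    float denominator = 0.0f;
    for (float v : data)
        denominator += std::exp(v - max_value);  // exp(inf - inf) evaluates to NaN
    // denominator is now NaN, so dividing every element by it yields [nan, nan, nan].
    std::printf("denominator = %f\n", denominator);  // prints "denominator = nan"
    return 0;
}
```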
🛠️ Solution
The GPU Softmax kernel now explicitly detects the
max_value == inf case and applies IEEE-754 semantics:

- elements equal to max_value → NaN
- all other elements → 0.0

This avoids unstable exponential evaluation and ensures GPU output exactly matches CPU behavior.
The fix is minimal, local to the kernel, and does not affect performance for normal inputs.
📈 Impact
| Input | Expected (CPU) | GPU before fix | GPU after fix |
|---|---|---|---|
| [inf, 1, 2] | [nan, 0, 0] | [nan, nan, nan] | [nan, 0, 0] |
| [inf, -inf, 1] | [nan, 0, 0] | [nan, nan, nan] | [nan, 0, 0] |
| [-inf, 1, 2] | [0, .2689, .7311] | [0, .2689, .7311] | [0, .2689, .7311] |

🧩 Files Modified
src/plugins/intel_gpu/src/kernel_selector/cl_kernels/softmax_gpu_ref.cl
🧪 Testing
Verified against the reproducer from issue #33456.
GPU output now matches CPU output for all reported edge cases.
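Because NaN compares unequal to everything (including itself) under IEEE-754, checking GPU output against the CPU reference needs NaN-aware comparison; below is a sketch of such a helper, with a made-up name and tolerance.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Returns true if gpu and cpu agree element-wise, treating NaN positions specially.
bool outputs_match(const std::vector<float>& gpu, const std::vector<float>& cpu,
                   float tol = 1e-5f) {
    if (gpu.size() != cpu.size())
        return false;
    for (size_t i = 0; i < gpu.size(); ++i) {
        if (std::isnan(cpu[i])) {
            if (!std::isnan(gpu[i])) return false;  // both must be NaN at this position
        } else if (std::fabs(gpu[i] - cpu[i]) > tol) {
            return false;
        }
    }
    return true;
}
```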
🔗 Related Issue
Fixes #33456