Fix ZeroDivisionError and NaN propagation in reward model metrics #3779
Open
Mr-Neutr0n wants to merge 1 commit into LAION-AI:main from
Conversation
The kendall_tau() and spearmanr() functions divide by bsize without checking whether it is zero. This causes a ZeroDivisionError when all labels are padding (-100) or when the input batch has no valid label groups. Additionally, when a label group has fewer than 2 ranked items, scipy's kendalltau/spearmanr return NaN, which silently propagates through the accumulated score and corrupts the final metric value.

Changes:
- Skip label groups with fewer than 2 items (correlation is undefined for single-element arrays)
- Only increment bsize for groups that produce a valid (non-NaN) correlation result
- Return 0.0 instead of dividing by zero when bsize is 0
- Guard reward_accuracy() against empty score arrays, where np.mean would otherwise return NaN
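For reference, the failure modes are easy to reproduce with scipy and numpy directly (a minimal repro, not code from metrics.py):

```python
import warnings

import numpy as np
from scipy import stats

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence degenerate-input warnings

    # A label group with fewer than 2 ranked items: correlation is
    # undefined, so scipy returns NaN instead of raising.
    tau, _ = stats.kendalltau([0.5], [1.0])
    print(np.isnan(tau))  # True

    # Once added to a running sum, the NaN silently poisons the metric.
    score = 0.0 + tau
    print(np.isnan(score))  # True

    # np.mean over an empty array also yields NaN (with a RuntimeWarning).
    print(np.isnan(np.mean(np.array([]))))  # True

# And when bsize stays 0, the final division raises.
try:
    _ = 0.0 / 0
except ZeroDivisionError as exc:
    print("ZeroDivisionError:", exc)
```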
Problem
The reward model metric functions in model/model_training/metrics.py have three related bugs that can cause training to crash or silently produce corrupt evaluation results:

1. ZeroDivisionError in kendall_tau() and spearmanr(): Both functions divide by bsize at the end without checking whether it is zero. When all labels in a batch are padding (-100), no label groups are found, bsize stays at 0, and the division raises ZeroDivisionError.
2. NaN propagation: When a label group contains fewer than 2 ranked items, scipy.stats.kendalltau and scipy.stats.spearmanr return NaN (correlation is undefined for single-element arrays). This NaN gets added to the running sum and silently corrupts the final metric value, making it NaN for the entire evaluation step.
3. Empty array in reward_accuracy(): If no valid label groups are found, pos_scores and neg_scores remain empty lists. Calling np.mean on an empty array produces a RuntimeWarning and returns NaN.

Fix
- Skip label groups with fewer than 2 items in kendall_tau() and spearmanr().
- Check each correlation result for NaN, and only increment bsize for valid results.
- Return 0.0 instead of dividing by zero when bsize is 0.
- Return 0.0 from reward_accuracy() when no scores were collected.

Testing
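As an illustrative check of the reward_accuracy() guard, a hypothetical helper mirroring the described fix (the signature and pairwise score lists are assumptions, not the actual metrics.py code) can be exercised on the empty-input edge case:

```python
import numpy as np


def reward_accuracy(pos_scores, neg_scores):
    """Fraction of pairs where the preferred answer outscores the rejected one.

    Hypothetical sketch of the guard: return 0.0 when no valid pairs were
    collected, instead of letting np.mean([]) produce NaN plus a
    RuntimeWarning.
    """
    if len(pos_scores) == 0 or len(neg_scores) == 0:
        return 0.0
    return float(np.mean(np.array(pos_scores) > np.array(neg_scores)))


# Edge case from this PR: no valid label groups -> empty score lists.
print(reward_accuracy([], []))  # 0.0
# Normal case is unaffected by the guard.
print(reward_accuracy([2.0, 3.0], [1.0, 4.0]))  # 0.5
```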
These edge cases arise during reward model evaluation when:
- all labels in a batch are padding (-100)
- a label group contains fewer than 2 ranked items

The fix ensures that metric computation completes without errors and returns deterministic fallback values (0.0) rather than crashing or returning NaN.
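Putting the guards together, the fixed correlation metric could look roughly like this. This is an illustrative sketch, not the exact code in model/model_training/metrics.py; the ideal descending-ranking comparison and the -100 padding convention are assumptions taken from the description above:

```python
import math

import numpy as np
from scipy import stats


def kendall_tau(scores: np.ndarray, labels: np.ndarray) -> float:
    """Mean Kendall's tau over label groups, with the guards from this PR."""
    total, bsize = 0.0, 0
    for label in np.unique(labels):
        if label == -100:  # padding, not a real label group
            continue
        group = scores[labels == label]
        if len(group) < 2:  # correlation undefined for single-element arrays
            continue
        # Compare predicted scores against an ideal descending ranking.
        tau, _ = stats.kendalltau(group, np.arange(len(group), 0, -1))
        if math.isnan(tau):  # degenerate group (e.g. all-equal scores)
            continue  # do not let NaN enter the running sum
        total += tau
        bsize += 1  # only count groups that produced a valid result
    return total / bsize if bsize > 0 else 0.0  # 0.0 fallback, never divides by zero
```

The same pattern applies to spearmanr(): the running sum only ever receives finite values, and the final division is replaced by a 0.0 fallback when no group qualified.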