Fix nans in gradient from inverse softplus #123
Merged
Describe your changes
Fix a bug where the inverse softplus caused gradients to become NaNs.

I believe the issue was that even though `non_linear_part` had clamping inside, and values above the threshold were later discarded, the input x values could still cause numerical instabilities before we checked whether the input was above the threshold. The fix is to clamp the input x values. The limits were previously defined as $\exp(x \cdot \beta) - 1 \ge 10^{-6}$ and $x \cdot \beta \le \text{threshold}$, so I changed them to the equivalent $\log(10^{-6} + 1)/\beta \le x \le \text{threshold}/\beta$.
Issue Link
closes #119
Type of change
Checklist before requesting a review
`pull` with `--rebase` option if possible).

Checklist for reviewers
Each PR comes with its own improvements and flaws. The reviewer should check the following:
Author checklist after completed review
reflecting type of change (add section where missing):
Checklist for assignee