
NaN loss #1068

Closed · Answered by BenjaminBossan
AhmedThahir asked this question in Q&A · Oct 18, 2024 · 1 comment · 4 replies

The reason you're observing this is floating point arithmetic. Even though mathematically the net's output should correspond exactly to y, in practice there are small differences: e.g., if the target is 127., the prediction is 126.9999. Given these small differences, the loss is non-zero, so the parameters are changed a little bit. Normally this should reduce the error, but because the learning rate is too big, it actually increases, making the difference bigger and bigger after each update until the loss eventually overflows to NaN. If you change the learning rate to something smaller, like 1e-6 or 1e-7, you won't see this diverging behavior.
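
A minimal sketch of the fix, assuming this is skorch's NeuralNetRegressor (the TinyNet module and the toy data below are invented for illustration):

```python
import numpy as np
import torch
from torch import nn
from skorch import NeuralNetRegressor

# Hypothetical toy module: a single linear layer that can learn y = x.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(1, 1)

    def forward(self, X):
        return self.lin(X)

X = np.arange(128, dtype=np.float32).reshape(-1, 1)
y = X.copy()  # target equals input, so the loss is ~0 up to float error

# A large lr amplifies the tiny float error at each step until the loss
# diverges to NaN; a small lr (e.g. 1e-6 or 1e-7) keeps updates stable.
net = NeuralNetRegressor(TinyNet, lr=1e-6, max_epochs=20)
net.fit(X, y)
```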

Note that regression uses MSE loss by default, which can be very sensitive to outliers…
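
If outliers are a concern, one option is to swap the default criterion for something more robust. Here is a sketch, assuming skorch's criterion parameter and PyTorch's built-in HuberLoss:

```python
import torch
from skorch import NeuralNetRegressor

# HuberLoss behaves like MSE for small errors but like MAE for large ones,
# so a few outliers pull the gradients around much less.
net = NeuralNetRegressor(
    TinyNet,  # the toy module from the sketch above
    criterion=torch.nn.HuberLoss,
    lr=1e-6,
    max_epochs=20,
)
```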

Answer selected by AhmedThahir