Hi, in my experiment I used the Moving-MNIST dataset, but I ran into some problems during training that I couldn't find an answer to:
I tried training a small network with only num_latent_scale=1 and num_groups_per_scale=1. I then noticed that no gradients were generated for some parameters, including prior.ftr0, and training stopped with an error.
If I increase num_groups_per_scale from 1 to 2 or more, I still get NaN in some of the gradients in the first iteration, but they then go away and training continues without errors.
Could you give me any hints or clues as to why this behavior happens? Thank you in advance!
Hi, getting no gradient for num_latent_scale=1 and num_groups_per_scale=1 is odd. By no gradients, do you mean the gradients were zero or None? If they were zero, do you see any change after training for some time?
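For reference, here is a minimal sketch of how you could check this right after loss.backward(); `model` stands for whatever module you are training and is not from the NVAE code:

```python
# Hypothetical check: run after loss.backward() to see which parameters
# received no gradient at all (None) versus an all-zero gradient.
for name, param in model.named_parameters():
    if param.grad is None:
        print(f"{name}: grad is None")
    elif param.grad.abs().sum() == 0:
        print(f"{name}: grad is all zeros")
```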
Getting NaN in the gradients is normal, especially at the beginning of training. We use mixed precision, which means most operations are cast to FP16. Because of the lower precision, NaNs can appear easily, and it is the job of autocast and grad_scalar to skip those gradient updates and scale the loss so that training is not affected.
You can disable mixed precision by passing enabled=False to autocast() at this line:
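For context, the usual PyTorch pattern looks roughly like the sketch below (not the exact NVAE training loop; `model`, `optimizer`, and the data loop are placeholders). GradScaler detects inf/NaN gradients produced under FP16 and skips that optimizer step, and passing enabled=False to autocast() runs everything in FP32:

```python
import torch
from torch.cuda.amp import autocast, GradScaler

grad_scalar = GradScaler()  # NVAE refers to its scaler as grad_scalar

for x in train_loader:                      # placeholder data loop
    optimizer.zero_grad()
    # Set enabled=False here to run the forward pass in full FP32
    # and avoid FP16-related NaNs.
    with autocast(enabled=True):
        loss = model(x)                     # placeholder forward returning a scalar loss
    grad_scalar.scale(loss).backward()      # scale the loss to avoid FP16 underflow
    grad_scalar.step(optimizer)             # skips the step if inf/NaN grads are found
    grad_scalar.update()                    # adjusts the scale factor for the next step
```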