I am also slightly confused by this implementation. In the Bayes by Backprop paper by Blundell et al., the NLL loss is an expectation over weights drawn from the variational posterior, so I am not sure why there is a multiplication by self.train_size.
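For context, here is a minimal sketch of the kind of criterion being discussed, assuming the class takes `train_size` in its constructor and receives log-probabilities, a KL value, and a beta weight in `forward` (the exact names and signature are assumptions, not the repository's actual code):

```python
import torch.nn as nn
import torch.nn.functional as F

class ELBOSketch(nn.Module):
    # Hypothetical sketch of the criterion the comments refer to; the actual
    # class in the repository may differ in names and details.
    def __init__(self, train_size):
        super().__init__()
        self.train_size = train_size

    def forward(self, log_probs, targets, kl, beta):
        # F.nll_loss expects log-probabilities and averages over the batch.
        # Multiplying the batch-mean NLL by train_size rescales it to an
        # estimate of the NLL summed over the full dataset, which is one
        # possible reason for the factor the comments ask about.
        nll = F.nll_loss(log_probs, targets, reduction='mean')
        return self.train_size * nll + beta * kl
```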
nll_loss here is used to compute the negative log likelihood -log P(D|w_i) from Blundell's paper. In my opinion, the code should not apply log_softmax, average the probabilities across samples, and then use those mean probabilities with the targets to get the cross entropy (softmax + log + nll_loss).
Instead, for each set of logits obtained from the sampled model, the cross entropy should be computed directly from the logits and targets, giving one negative log likelihood per weight sample, and these losses should then be summed over the number of samples. This is the third term of the objective F(D, θ) in the paper. (A small sketch of the difference follows below.)
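A minimal sketch of the two variants being contrasted, assuming `logits_samples` is a list of logit tensors from several forward passes with sampled weights (these names are hypothetical, not from the repository):

```python
import torch
import torch.nn.functional as F

def nll_mean_probs(logits_samples, targets):
    # Variant the comment argues against: average the softmax probabilities
    # over the Monte Carlo samples first, then take the negative log
    # likelihood of that mean predictive distribution.
    probs = torch.stack([F.softmax(l, dim=1) for l in logits_samples]).mean(dim=0)
    return F.nll_loss(torch.log(probs), targets)

def nll_per_sample(logits_samples, targets):
    # Variant the comment proposes: compute -log P(D|w_i) separately for each
    # sampled weight w_i via cross entropy on the raw logits, then combine
    # the per-sample losses over the Monte Carlo samples.
    losses = [F.cross_entropy(l, targets) for l in logits_samples]
    return torch.stack(losses).mean()
```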
In addition, the class should probably not be named ELBO. The beta * kl term here is meant to correspond to the -ELBO loss in the paper, but the kl computed here is only the KL divergence between q and the prior, i.e. log q(w_i|θ) - log P(w_i), whereas the paper minimizes the full -ELBO, F(D, θ) ≈ log q(w_i|θ) - log P(w_i) - log P(D|w_i).
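A rough sketch of that decomposition for one Monte Carlo weight sample, assuming `log_q` and `log_prior` are the accumulated log-densities of the sampled weights under the variational posterior and the prior (these names and the minibatch weighting are assumptions):

```python
def neg_elbo_per_sample(log_q, log_prior, nll, num_batches):
    # F(D, theta) for one weight sample w_i, as in Blundell et al.:
    #   log q(w_i|theta) - log P(w_i) - log P(D|w_i)
    # The first two terms form the KL (complexity) part, which is what the
    # ELBO class's kl value covers; the nll term is the data-fit part.
    kl = log_q - log_prior
    return kl / num_batches + nll
```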
I see that you multiply nll_loss by train_size.
This makes the value of the loss very large.
I don't understand why you do that.
Could you explain it to me?