Hi all,
I have a question about some concerns I ran into while looking at how the `gumbel_sample` method is used when `reinmax=False`.

vector-quantize-pytorch/vector_quantize_pytorch/vector_quantize_pytorch.py
Line 472 in 6102e37
First, this sampling technique is mathematically equivalent to sampling directly from the categorical distribution defined by the logits: the Gumbel noise does nothing here beyond drawing the sample, and the argmax makes the operation non-differentiable (I know we apply STE later).
vector-quantize-pytorch/vector_quantize_pytorch/vector_quantize_pytorch.py
Lines 72 to 77 in 6102e37
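To make the equivalence concrete, here is a minimal sketch in plain Python (not the library's code). It assumes the hard sample is an argmax over temperature-scaled logits plus Gumbel(0, 1) noise; the logit values, the temperature, and all function names are hypothetical. It compares the empirical Gumbel-max frequencies against the softmax probabilities.

```python
import math
import random
from collections import Counter


def softmax(logits, temperature=1.0):
    # Numerically stable softmax over temperature-scaled logits.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_noise(rng):
    # Gumbel(0, 1) draw via the inverse CDF: -log(-log(U)), U ~ Uniform(0, 1).
    u = max(rng.random(), 1e-12)  # avoid log(0)
    return -math.log(-math.log(u))


def gumbel_max_sample(logits, temperature, rng):
    # Hard sample: index of the largest noise-perturbed, scaled logit.
    perturbed = [l / temperature + gumbel_noise(rng) for l in logits]
    return max(range(len(logits)), key=perturbed.__getitem__)


def empirical_frequencies(logits, temperature, n_samples, seed=0):
    rng = random.Random(seed)
    counts = Counter(
        gumbel_max_sample(logits, temperature, rng) for _ in range(n_samples)
    )
    return [counts[i] / n_samples for i in range(len(logits))]


logits = [1.0, 2.0, 0.5]  # hypothetical values, not taken from the library
temperature = 0.9         # hypothetical value

expected = softmax(logits, temperature)
observed = empirical_frequencies(logits, temperature, 100_000)
max_abs_error = max(abs(e - o) for e, o in zip(expected, observed))

print("softmax probabilities: ", [round(p, 3) for p in expected])
print("Gumbel-max frequencies:", [round(p, 3) for p in observed])
```

Under these assumptions the two lists should agree up to sampling noise, which is the sense in which the Gumbel-max trick is "just sampling" from the categorical distribution.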
Additionally, the `logits` are the codebook distances (`dist` in the first snippet above). These are always positive, which means the sampling is going to be biased because they are bounded at zero. No gradients flow backwards from the sampling operation (because it is a Gumbel max, not a Gumbel softmax), hence the magnitude of the logits never gets adjusted to improve the sampling.

It seems to me that this just takes a hidden variable (the distance matrix), normalizes it with an arbitrary temperature parameter, and samples from it, adding biased noise to the straight-through relaxation... What am I missing?
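The point about missing gradients can be illustrated with a small finite-difference check, again as a hedged sketch in plain Python rather than the library's implementation. With the Gumbel noise held fixed, the hard (argmax) sample is piecewise constant in the logits, so a small change to a logit leaves it unchanged, whereas the standard Gumbel-softmax relaxation responds smoothly. All values and function names below are hypothetical, and the straight-through estimator applied later is deliberately left out.

```python
import math


def hard_sample(logits, noise, temperature):
    # Gumbel-max: one-hot vector at the argmax of the perturbed logits.
    perturbed = [l / temperature + g for l, g in zip(logits, noise)]
    winner = max(range(len(logits)), key=perturbed.__getitem__)
    return [1.0 if i == winner else 0.0 for i in range(len(logits))]


def soft_sample(logits, noise, temperature):
    # Gumbel-softmax relaxation: softmax of the perturbed logits.
    scaled = [(l + g) / temperature for l, g in zip(logits, noise)]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [1.0, 2.0, 0.5]   # hypothetical values
noise = [0.3, -0.1, 0.8]   # one fixed Gumbel draw, held constant on purpose
temperature = 1.0          # hypothetical value
eps = 1e-4

# Nudge the first logit and measure how the first output component changes.
bumped = [logits[0] + eps] + logits[1:]

hard_fd = (
    hard_sample(bumped, noise, temperature)[0]
    - hard_sample(logits, noise, temperature)[0]
) / eps
soft_fd = (
    soft_sample(bumped, noise, temperature)[0]
    - soft_sample(logits, noise, temperature)[0]
) / eps

print("finite difference through hard sample:", hard_fd)
print("finite difference through soft sample:", soft_fd)
```

In this sketch the hard sample's finite difference is exactly zero while the soft one is positive, so any learning signal reaching the logits has to come from somewhere other than the sampling step itself.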