This repository has been archived by the owner on Jul 7, 2023. It is now read-only.
Hi,
There might be a small bug here:
tensor2tensor/tensor2tensor/layers/common_attention.py
Lines 445 to 449 in ef1fcce
I think in the last line the `exp` should be divided by `min_timescale` rather than multiplied, since these are inverse timescales. Usually `min_timescale` is 1, so it doesn't matter. But if, e.g., you fix `max_timescale` and change `min_timescale`, the resulting inverse timescale corresponding to `max_timescale` changes.

A simpler implementation could be roughly something like this:
and from this one you can derive the current implementation, except with division instead of multiplication. It can be even simpler with `logspace`, but tf seems to have this function only as experimental.
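To make the derivation concrete, here is a hedged NumPy sketch (names illustrative) of the `exp`-based form with division by `min_timescale`, which agrees term-by-term with the geometric-spacing version:

```python
import math
import numpy as np

def inv_timescales_exp(num_timescales, min_timescale=1.0, max_timescale=1.0e4):
    """Illustrative sketch mirroring the exp-based form in
    common_attention.py, but with division by min_timescale
    instead of multiplication (the proposed fix).
    """
    log_timescale_increment = (
        math.log(max_timescale / min_timescale)
        / max(num_timescales - 1, 1))
    # k-th inverse timescale: (1/min) * (min/max)^(k/(n-1)),
    # i.e. exactly 1 / geomspace(min, max, n)[k].
    return (1.0 / min_timescale) * np.exp(
        np.arange(num_timescales) * -log_timescale_increment)
```

Since `geomspace(min, max, n)[k] = min * (max/min)^(k/(n-1))`, taking reciprocals gives `(1/min) * exp(-k * log(max/min)/(n-1))`, i.e. the `exp` term divided by `min_timescale`, not multiplied.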
Let me know if this makes sense.
Thanks a lot!