This repository has been archived by the owner on Jul 7, 2023. It is now read-only.
Hi,
There might be a small bug here:
tensor2tensor/tensor2tensor/layers/common_attention.py
Lines 445 to 449 in ef1fcce
I think in the last line the `exp` should be divided by `min_timescale` rather than multiplied, since these are inverse timescales. Usually `min_timescale` is 1, so it doesn't matter. But if, e.g., you fix `max_timescale` and change `min_timescale`, the resulting inverse timescale corresponding to `max_timescale` changes.

A simpler implementation could be roughly something like this:
and from this one you can derive the current implementation, except with division instead of multiplication. It can be even simpler with `logspace`, but tf seems to have this function only as experimental.
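To make the derivation concrete, here is a hedged NumPy sketch (names illustrative) of the `exp`-based form with division by `min_timescale`, which agrees term-by-term with the geometric-spacing version:

```python
import math
import numpy as np

def inv_timescales_exp(num_timescales, min_timescale=1.0, max_timescale=1.0e4):
    """Illustrative sketch mirroring the exp-based form in
    common_attention.py, but with division by min_timescale
    instead of multiplication (the proposed fix).
    """
    log_timescale_increment = (
        math.log(max_timescale / min_timescale)
        / max(num_timescales - 1, 1))
    # k-th inverse timescale: (1/min) * (min/max)^(k/(n-1)),
    # i.e. exactly 1 / geomspace(min, max, n)[k].
    return (1.0 / min_timescale) * np.exp(
        np.arange(num_timescales) * -log_timescale_increment)
```

Since `geomspace(min, max, n)[k] = min * (max/min)^(k/(n-1))`, taking reciprocals gives `(1/min) * exp(-k * log(max/min)/(n-1))`, i.e. the `exp` term divided by `min_timescale`, not multiplied.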
Let me know if this makes sense.
Thanks a lot!