Hi, currently the Transformer decoder only supports the multi-head scaled dot-product attention from the "Attention Is All You Need" paper. However, if you provide multiple encoders, you can choose how their attentions are combined by setting the `attention_combination_strategy` parameter in the transformer decoder configuration to one of `serial`, `parallel`, `hierarchical`, or `flat`. See the sketch below for an example.
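For concreteness, here is a minimal sketch of what the decoder section of an INI-style configuration might look like with two encoders. Only `attention_combination_strategy` and its four values come from the reply above; the section names, encoder references, and the `class` path are illustrative assumptions, not confirmed identifiers:

```ini
; Hypothetical two-source setup. The [encoder_src] and [encoder_ctx]
; sections are assumed to be defined elsewhere in the same file, and
; every parameter except attention_combination_strategy is a placeholder.
[decoder]
class=decoders.transformer.TransformerDecoder
name="decoder"
encoders=[<encoder_src>, <encoder_ctx>]
; one of "serial", "parallel", "hierarchical", "flat"
attention_combination_strategy="hierarchical"
```

With `hierarchical`, the decoder would first attend to each encoder separately and then combine the resulting contexts with a second attention step, whereas `flat` attends over the concatenation of all encoder states at once; `serial` and `parallel` apply the per-encoder attentions in sequence or side by side, respectively.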
I wonder how to modify the configuration file to train a multi-source transformer model with these different attention types.