Hi, currently the Transformer decoder only supports the multi-head scaled dot-product attention from the "Attention Is All You Need" paper. However, if you provide multiple encoders, you can choose how their attentions are combined by setting the `attention_combination_strategy` parameter in the transformer decoder configuration to one of `serial`, `parallel`, `hierarchical`, or `flat`. See the sketch below for an example.
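For concreteness, here is a minimal sketch of what the decoder section of an INI-style configuration might look like with two encoders. Only `attention_combination_strategy` and its four values come from the reply above; the section names, encoder references, and the `class` path are illustrative assumptions, not confirmed identifiers:

```ini
; Hypothetical two-source setup. The [encoder_src] and [encoder_ctx]
; sections are assumed to be defined elsewhere in the same file, and
; every parameter except attention_combination_strategy is a placeholder.
[decoder]
class=decoders.transformer.TransformerDecoder
name="decoder"
encoders=[<encoder_src>, <encoder_ctx>]
; one of "serial", "parallel", "hierarchical", "flat"
attention_combination_strategy="hierarchical"
```

With `hierarchical`, the decoder would first attend to each encoder separately and then combine the resulting contexts with a second attention step, whereas `flat` attends over the concatenation of all encoder states at once; `serial` and `parallel` apply the per-encoder attentions in sequence or side by side, respectively.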
I wonder how to modify the configuration file to train a multi-source transformer model with these different attention types.