"Variations of Transformer" running on Tensor2Tensor Library.
-
Transformer without segmentation
- See TransformerChrawr
- A Transformer version of "Fully Character-Level Neural Machine Translation without Explicit Segmentation"
-
MOS
- See MixtureOfSoftmaxSymbolModality
- Add a config like hparams.target_modality = "symbol:mos" (see the sketch below)
- T2T implementation of "Breaking the Softmax Bottleneck"
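A minimal sketch of how this override might be wired into a registered hparams set (the name transformer_mos is illustrative, an assumption rather than something this repo necessarily defines):

    from tensor2tensor.models import transformer
    from tensor2tensor.utils import registry

    @registry.register_hparams
    def transformer_mos():
      """Base Transformer hparams with the target modality set to MOS."""
      hparams = transformer.transformer_base()
      # Route target symbols through MixtureOfSoftmaxSymbolModality ("symbol:mos").
      hparams.target_modality = "symbol:mos"
      return hparams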
-
Fast Transformer
- See TransformerFast
- Adds an encoder-decoder attention cache, which is not yet implemented in T2T (see the sketch below).
- In my case, it is about 2.5 times faster than the T2T base Transformer model.
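A minimal sketch of the caching idea (illustrative only; the scope and tensor names are assumptions, not this repo's actual code): the encoder-decoder attention keys and values are projected from the encoder output once before decoding, stored in the decode cache, and reused at every step instead of being recomputed.

    import tensorflow as tf

    def precompute_encdec_cache(encoder_output, hidden_size, num_layers):
      """Project encoder output to K/V once per decoder layer before decoding."""
      cache = {}
      for layer in range(num_layers):
        layer_name = "layer_%d" % layer
        with tf.variable_scope("decoder/%s/encdec_attention" % layer_name):
          cache[layer_name] = {
              # Shape [batch, input_length, hidden]; reused at every decoding step.
              "k_encdec": tf.layers.dense(
                  encoder_output, hidden_size, use_bias=False, name="k"),
              "v_encdec": tf.layers.dense(
                  encoder_output, hidden_size, use_bias=False, name="v"),
          }
      return cache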
-
Transformer with Average Attention Network
- See TransformerFastAan
- Also adds the encoder-decoder attention cache.
- In my case, it is about 2.4 times faster than the T2T base Transformer model.
- T2T implementation of "Accelerating Neural Transformer via an Average Attention Network" (see the sketch below)
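A minimal sketch of the core decoding trick from the AAN paper (illustrative; the state layout is an assumption, not this repo's actual code): the decoder's masked self-attention is replaced by a cumulative average over the previous target embeddings, which can be maintained with a single running sum so each decoding step costs O(1).

    import tensorflow as tf

    def aan_decode_step(new_input, running_sum, step):
      """One incremental AAN step.

      new_input:   [batch, 1, hidden] embedding at the current target position.
      running_sum: [batch, 1, hidden] sum of embeddings from previous positions.
      step:        scalar tensor, 1-based index of the current position.
      """
      running_sum = running_sum + new_input
      # Cumulative average stands in for masked decoder self-attention.
      avg = running_sum / tf.cast(step, new_input.dtype)
      return avg, running_sum

In the paper the average is then passed through a feed-forward layer and a gating layer, so only the running sum needs to be kept in the decoding cache.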