"Variations of Transformer" running on Tensor2Tensor Library.
-
Transformer without segmentation
- See TransformerChrawr
- A Transformer version of "Fully Character-Level Neural Machine Translation without Explicit Segmentation"
-
MOS
- See MixtureOfSoftmaxSymbolModality
- Add a config like hparams.target_modality = "symbol:mos" (see the sketch below)
- T2T implementation of "Breaking the Softmax Bottleneck"
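A minimal sketch of how this override might be wired into a registered hparams set (the name transformer_mos is illustrative, an assumption rather than something this repo necessarily defines):

    from tensor2tensor.models import transformer
    from tensor2tensor.utils import registry

    @registry.register_hparams
    def transformer_mos():
      """Base Transformer hparams with the target modality set to MOS."""
      hparams = transformer.transformer_base()
      # Route target symbols through MixtureOfSoftmaxSymbolModality ("symbol:mos").
      hparams.target_modality = "symbol:mos"
      return hparams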
-
Fast Transformer
- See TransformerFast
- Adds an encoder-decoder attention cache, which is not yet implemented in T2T (see the sketch below).
- In my case, it is about 2.5 times faster than the T2T base Transformer model.
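A minimal sketch of the caching idea (illustrative only; the scope and tensor names are assumptions, not this repo's actual code): the encoder-decoder attention keys and values are projected from the encoder output once before decoding, stored in the decode cache, and reused at every step instead of being recomputed.

    import tensorflow as tf

    def precompute_encdec_cache(encoder_output, hidden_size, num_layers):
      """Project encoder output to K/V once per decoder layer before decoding."""
      cache = {}
      for layer in range(num_layers):
        layer_name = "layer_%d" % layer
        with tf.variable_scope("decoder/%s/encdec_attention" % layer_name):
          cache[layer_name] = {
              # Shape [batch, input_length, hidden]; reused at every decoding step.
              "k_encdec": tf.layers.dense(
                  encoder_output, hidden_size, use_bias=False, name="k"),
              "v_encdec": tf.layers.dense(
                  encoder_output, hidden_size, use_bias=False, name="v"),
          }
      return cache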
-
Transformer with Average Attention Network
- See TransformerFastAan
- Also adds the encoder-decoder attention cache.
- In my case, it is about 2.4 times faster than the T2T base Transformer model.
- T2T implementation of "Accelerating Neural Transformer via an Average Attention Network" (see the sketch below)
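A minimal sketch of the core decoding trick from the AAN paper (illustrative; the state layout is an assumption, not this repo's actual code): the decoder's masked self-attention is replaced by a cumulative average over the previous target embeddings, which can be maintained with a single running sum so each decoding step costs O(1).

    import tensorflow as tf

    def aan_decode_step(new_input, running_sum, step):
      """One incremental AAN step.

      new_input:   [batch, 1, hidden] embedding at the current target position.
      running_sum: [batch, 1, hidden] sum of embeddings from previous positions.
      step:        scalar tensor, 1-based index of the current position.
      """
      running_sum = running_sum + new_input
      # Cumulative average stands in for masked decoder self-attention.
      avg = running_sum / tf.cast(step, new_input.dtype)
      return avg, running_sum

In the paper the average is then passed through a feed-forward layer and a gating layer, so only the running sum needs to be kept in the decoding cache.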