This repo contains the demo code for Transformer-XL with the Self-Dependency Unit. This work is closely related to gating-enhanced Transformer variants, such as Google's Switch Transformers.
Yekun Chai et al., Highway Transformer: Self-Gating Enhanced Self-Attentive Networks (ACL 2020)
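The Self-Dependency Unit (SDU) is a self-gating mechanism in the spirit of highway networks: each token gates its own representation. As a rough, illustrative sketch only (the exact formulation in this repo may differ; see the code under `pytorch/` and the paper), a highway-style self-gating unit in PyTorch could look like this:

```python
import torch
import torch.nn as nn


class SelfGatingSketch(nn.Module):
    """Illustrative self-gating (highway-style) unit.

    NOTE: a simplified sketch for intuition only; the actual Self-Dependency
    Unit used in this repo may differ (see the code under pytorch/ and the paper).
    """

    def __init__(self, d_model):
        super().__init__()
        self.gate = nn.Linear(d_model, d_model)       # transform gate T(x)
        self.candidate = nn.Linear(d_model, d_model)  # candidate branch H(x)

    def forward(self, x, sublayer_out):
        # Both the gate and the candidate are computed from the token itself
        # ("self-gating"), rather than from an external context.
        t = torch.sigmoid(self.gate(x))
        h = torch.tanh(self.candidate(x))
        # Highway-style blend of the gated self branch with the sublayer output
        return t * h + (1.0 - t) * sublayer_out
```

In the Highway Transformer, such gating is combined with the Transformer-XL sublayers; the snippet above is only meant to convey the self-gating intuition, not the paper's exact placement or parameterization.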
- PyTorch >= 1.1.0
- TensorboardX >= 1.8
- Tensorboard >= 1.14
- 4 GPUs, each with 8 GB of memory, for training the 12-layer Transformer-XL (a quick GPU sanity check is sketched below)
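Before launching training, you may want to confirm that PyTorch actually sees enough GPUs (an illustrative check, not part of the repo's scripts):

```python
import torch

# Sanity check: the 12-layer Transformer-XL setting above expects 4 visible GPUs
print(f"CUDA available: {torch.cuda.is_available()}; GPUs visible: {torch.cuda.device_count()}")
```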
bash getdata.sh
cd pytorch/xl_L6_scripts && bash <script-name>.sh train --work_dir "PATH_TO_WORK_DIR"
cd XL-L6-results && tensorboard --logdir=.
- Line plots of different model settings, where the topmost line (in red) is the baseline model (i.e., original Transformer-XL).
- After adding the Self-Dependency Unit (see the bottom two curves), Highway Transformer clearly speeds up convergence during both training and evaluation.
(Plots: training bpc, training loss, eval bpc, and eval loss curves for each model setting.)
For attribution in academic contexts, please cite this work as:
@inproceedings{chai-etal-2020-highway,
title = "Highway Transformer: Self-Gating Enhanced Self-Attentive Networks",
author = "Chai, Yekun and
Jin, Shuo and
Hou, Xinwen",
booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2020",
publisher = "Association for Computational Linguistics",
url = "https://www.aclweb.org/anthology/2020.acl-main.616",
pages = "6887--6900"
}