
Highway Transformer: Self-Gating Enhanced Self-Attentive Networks


This repo contains the demo code for Transformer-XL augmented with the Self-Dependency Unit. This work is closely related to gating-enhanced Transformer variants, such as Google's Switch Transformers.

Yekun Chai et al., Highway Transformer: Self-Gating Enhanced Self-Attentive Networks (ACL 2020)
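At a high level, the Self-Dependency Unit wraps Transformer sublayers in a highway-style, input-conditioned gate. The snippet below is only a minimal PyTorch sketch of such a gating block to illustrate the idea; the module name, layer shapes, and exact formulation are assumptions and may differ from the code in this repo.

```python
import torch
import torch.nn as nn

class HighwayGate(nn.Module):
    """Minimal highway-style self-gating block (illustrative sketch only,
    not necessarily the repo's exact Self-Dependency Unit)."""

    def __init__(self, d_model: int):
        super().__init__()
        self.transform = nn.Linear(d_model, d_model)  # candidate transform H(x)
        self.gate = nn.Linear(d_model, d_model)       # gate T(x)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = torch.tanh(self.transform(x))   # transformed representation
        t = torch.sigmoid(self.gate(x))     # per-dimension gate in (0, 1)
        # Highway mix: each dimension interpolates between the transform and the input.
        return t * h + (1.0 - t) * x


# Example: gate a self-attention sublayer output of shape [seq_len, batch, d_model].
gate = HighwayGate(d_model=512)
x = torch.randn(128, 4, 512)
y = gate(x)  # same shape as x
```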

Requirements

  • PyTorch >= 1.1.0
  • TensorboardX >= 1.8
  • Tensorboard >= 1.14
  • 4 GPUs with 8 GB of memory each for running the 12-layer Transformer-XL
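As a quick sanity check before training (a hypothetical helper, not part of this repo), you can confirm the installed PyTorch version and the visible GPUs from Python:

```python
import torch

print("PyTorch:", torch.__version__)               # should be >= 1.1.0
print("CUDA available:", torch.cuda.is_available())
print("Visible GPUs:", torch.cuda.device_count())  # 4 GPUs recommended for the 12-layer model
```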

Data download

bash getdata.sh

Run 6-layer Transformer-XL

cd pytorch/xl_L6_scripts && bash <script-name>.sh train --work_dir "PATH_TO_WORK_DIR"

Visualizing Your Results

cd XL-L6-results && tensorboard --logdir=.
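If you prefer to inspect the logged curves programmatically rather than in the TensorBoard UI, the event files can be read with TensorBoard's EventAccumulator. The run directory and scalar tag below are placeholders; list `Tags()["scalars"]` to see what your run actually logged.

```python
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

acc = EventAccumulator("XL-L6-results/<run-dir>")  # <run-dir>: one run's log directory
acc.Reload()

print(acc.Tags()["scalars"])        # scalar tags logged by the run
tag = acc.Tags()["scalars"][0]      # e.g., a training-bpc curve (tag name depends on the run)
for event in acc.Scalars(tag):
    print(event.step, event.value)  # (global step, scalar value)
```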

Results

  • Line plots of different model settings, where the topmost curve (in red) is the baseline model (i.e., the original Transformer-XL).
  • After adding the Self-Dependency Unit (see the bottom two curves), the Highway Transformer clearly speeds up convergence during both training and evaluation.
[Figures: training BPC, training loss, evaluation BPC, and evaluation loss curves]

Citation

For attribution in academic contexts, please cite this work as:

@inproceedings{chai-etal-2020-highway,
    title = "Highway Transformer: Self-Gating Enhanced Self-Attentive Networks",
    author = "Chai, Yekun  and
      Jin, Shuo  and
      Hou, Xinwen",
    booktitle = "Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2020",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.acl-main.616",
    pages = "6887--6900"
}