This repo contains the code for our paper:
Better Language Model with Hypernym Class Prediction
He Bai, Tong Wang, Alessandro Sordoni, Peng Shi
ACL 2022
If you are not using Docker, please make sure your PyTorch and CUDA versions match those in our Dockerfile,
and also install the Python packages listed in the Dockerfile.
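If you do use Docker, a minimal sketch of building and entering the container (the image tag hypernym-lm is our choice, not from the repo; GPU access requires the NVIDIA Container Toolkit):

```bash
# Build the image from the repo's Dockerfile (the tag is hypothetical)
docker build -t hypernym-lm .
# Start an interactive container with GPU access and the repo mounted
docker run --gpus all -it -v "$(pwd)":/workspace hypernym-lm bash
```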
Run ./get_data.sh
to download and unzip the WikiText-103 dataset automatically.
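For example:

```bash
# Download and extract the WikiText-103 dataset
chmod +x get_data.sh
./get_data.sh
```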
Download the arXiv dataset manually following this link.
arxiv_data.py
is a script for splitting the data.
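A sketch of the expected usage, assuming the script reads the manually downloaded arXiv data and writes out the splits (the flag names below are hypothetical; check the script's argument parser for the real interface):

```bash
# Hypothetical flags; see arxiv_data.py for the actual arguments.
python arxiv_data.py --input <path-to-downloaded-arxiv-data> --output data/arxiv
```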
We list all training commands in the train.sh
file.
For the large model, 8×32GB of GPU memory is required, or 4×40GB with gradient accumulation steps set to 2.
For the base and small models, 4×32GB GPUs are enough.
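For example, pick the command for your model size from train.sh and run it, adjusting the GPU-related options to your hardware first (the gradient-accumulation flag name is whatever train.sh defines, not something fixed here):

```bash
# Run the training commands listed in train.sh;
# on 4x40GB GPUs, set the gradient-accumulation option in the chosen command to 2.
bash train.sh
```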
The training commands above automatically evaluate the best validation model after training finishes. If you want to run the tests yourself, commenting out the "--do_train" argument skips the training stage and runs evaluation and testing directly.
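A minimal sketch of the two modes, assuming a command shape like the ones in train.sh (the script name main.py and the placeholder arguments are hypothetical; only --do_train comes from this README):

```bash
# Hypothetical command shape; the real script name and flags live in train.sh.
# Train, then automatically evaluate the best validation checkpoint:
python main.py --do_train <other-args-from-train.sh>
# Skip training; run evaluation and testing directly on the saved checkpoint:
python main.py <other-args-from-train.sh>
```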