Code for my thesis, written in Python 3.6 using dynet.
The thesis can be found here.
Use `make` to obtain the data and install EVALB:

```shell
make data   # download ptb and unlabeled data
make evalb  # install EVALB
```
Use `make` to train a number of standard models:

```shell
make disc             # train discriminative rnng
make gen              # train generative rnng
make crf              # train crf
make fully-unsup-crf  # train rnng + crf (vi), fully unsupervised
```
You can list all the options with:

```shell
make list
```
Alternatively, you can use command-line arguments directly:

```shell
python src/main.py train --model-type=disc-rnng --model-path-base=models/disc-rnng
```
For all available options, use:

```shell
python src/main.py --help
```
To set the environment variables used in the evaluation of trained models (e.g. `CRF_PATH=models/crf_dev=90.01`), use:

```shell
source scripts/best-models.sh
```
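As a rough sketch of what sourcing that script does: it presumably exports checkpoint paths as environment variables that the evaluation commands can then refer to. The exact variable value below is the example from the text above, not a path guaranteed to exist locally:

```shell
# Hypothetical sketch of what scripts/best-models.sh does: export each
# best checkpoint path as an environment variable (value is illustrative).
export CRF_PATH="models/crf_dev=90.01"

# Later commands can use the variable instead of a hard-coded path.
echo "$CRF_PATH"
```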
Models are saved to the `models` folder, named with their model name and development score. We have included our best models (by development score) as zip archives. To use them, run `unzip zipped/file.zip` from the `models` directory.
I have relied on some excellent implementations for inspiration and help with my own:

- pytorch-rnng inspired the representation of the RNNG parser class.
- minimal-span-parser provided the foundations of the tree classes, the vocabulary class, and the CRF parser.
- im2latex inspired the use of a Makefile to organize experiments.

Make sure to check them out!