The dataset is available at: https://github.com/google-research-datasets/sentence-compression. Download the *.gz files and store them in the data/ directory.
This project requires Python 3.6+ and PyTorch 1.1+. It uses the models and embeddings from the FLAIR framework:
pip install flair
To train a sequence tagging model, the original data needs to be aligned into a sequence tagging format. To align the downloaded data:
export PRJ_HOME=<path/to/this/project>
bash $PRJ_HOME/runs/preprocess.sh
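For intuition, the alignment step can be pictured as labelling every token of the original sentence as kept or deleted in the compression. The snippet below is only a minimal sketch of that idea, not the code run by preprocess.sh: the function name `align_to_tags`, the `KEEP`/`DEL` label names, and the case-insensitive greedy matching are all assumptions made for illustration.

```python
from typing import List, Tuple


def align_to_tags(source: List[str], compressed: List[str]) -> List[Tuple[str, str]]:
    """Tag each source token KEEP or DEL (hypothetical label names).

    Assumes the compressed sentence is an in-order subsequence of the
    source tokens, and matches tokens greedily from left to right.
    """
    tagged = []
    j = 0  # index of the next compressed token that still needs a match
    for token in source:
        if j < len(compressed) and token.lower() == compressed[j].lower():
            tagged.append((token, "KEEP"))
            j += 1
        else:
            tagged.append((token, "DEL"))
    if j != len(compressed):
        # Some compressed tokens were never found in the source.
        raise ValueError("compressed sentence is not a subsequence of the source")
    return tagged


if __name__ == "__main__":
    src = "The quick brown fox jumps over the lazy dog".split()
    cmp_ = "fox jumps over dog".split()
    # Print one token and its tag per line, a common sequence tagging layout.
    for tok, tag in align_to_tags(src, cmp_):
        print(tok, tag)
```

Greedy matching is enough here because, whenever the compression is a true subsequence of the source, taking the earliest possible match for each compressed token never blocks a later one. Real data may need extra handling for tokenization differences, which this sketch does not attempt.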
Training configs for each setting are available in runs/. To start training:
export PRJ_HOME=<path/to/this/project>
bash $PRJ_HOME/runs/train_<config_name>.sh