ACL 2018 MSR Workshop submission

Please use the following citation:

@InProceedings{W18-3602,
    author    = 	"Puzikov, Yevgeniy and Gurevych, Iryna",
    title     = 	"BinLin: A Simple Method of Dependency Tree Linearization",
    booktitle = 	"Proceedings of the First Workshop on Multilingual Surface Realisation",
    year      = 	"2018",
    publisher = 	"Association for Computational Linguistics",
    pages     = 	"13--28",
    location  = 	"Melbourne, Australia",
    url       = 	"http://aclweb.org/anthology/W18-3602"
}

Abstract:
Surface Realization Shared Task 2018 is a workshop on generating sentences from lemmatized sets of dependency triples. This paper describes the results of our participation in the challenge. We develop a data-driven pipeline system which first orders the lemmas and then conjugates the words to finish the surface realization process. Our contribution is a novel sequential method of ordering lemmas, which, despite its simplicity, achieves promising results. We demonstrate the effectiveness of the proposed approach, describe its limitations and outline ways to improve it.

Contact person: Yevgeniy Puzikov, puzikov@ukp.informatik.tu-darmstadt.de

https://www.ukp.tu-darmstadt.de/

https://www.tu-darmstadt.de/

Don't hesitate to send us an e-mail or report an issue if something is broken (and it shouldn't be) or if you have further questions.

This repository contains experimental software and is published for the sole purpose of giving additional background details on the respective publication.

Background info

Official website: http://taln.upf.edu/pages/msr2018-ws/
Track: Shallow
Informal task description: given a lemmatized dependency tree, generate a sentence from it.
Evaluation protocol:
- automatic metrics (BLEU, NIST, CIDEr, normalized edit distance)
- human evaluation by preference judgments
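
For orientation, the last of the automatic metrics can be sketched as follows. This is a minimal illustration, assuming a character-level Levenshtein distance divided by the length of the longer string; the official evaluation script may normalize differently, and both function names are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_edit_distance(hypothesis, reference):
    """Assumed normalization: distance divided by the longer string's length."""
    longest = max(len(hypothesis), len(reference))
    if longest == 0:
        return 0.0
    return edit_distance(hypothesis, reference) / longest
```

Under this assumed normalization, identical strings score 0.0 and completely different strings of equal length score 1.0.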

Project structure

The repository has the following structure:

- run_experiment.py: main script to run
- sample configuration files to use with the script above:
  - settings for the syntactic ordering component: en_syn-config.yaml (SynMLP)
  - settings for the morphological inflection generation component:
    - en_morph-mlp-config.yaml (MorphMLP)
    - en_morph-rnn-soft-config.yaml (MorphRNNSoft)
    - en_morph-rnn-hard-config.yaml (MorphRNNHard)
- components/: NN components and utility functions
- baselines/: scripts to run baseline models

Requirements

64-bit Linux versions
Python 3 and dependencies:
- PyTorch v0.3.1
- Progressbar2 v3.18.1
- Matplotlib v2.2.2
- NLTK v3.3
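
Because the pinned versions are old, it can help to check what is actually installed before running anything. The sketch below is not part of the repository; it assumes that each package exposes a `__version__` attribute under the module names listed in `PINNED`, which is an assumption about those packages rather than something this README states.

```python
import importlib

# Versions from the requirements list above. The module names on the left
# are assumptions (e.g. progressbar2 is assumed to import as "progressbar").
PINNED = {
    "torch": "0.3.1",
    "progressbar": "3.18.1",
    "matplotlib": "2.2.2",
    "nltk": "3.3",
}


def installed_version(module_name):
    """Return the module's __version__ string, or None if unavailable."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def find_mismatches(pinned, lookup=installed_version):
    """Map each module whose version differs to a (wanted, found) pair."""
    mismatches = {}
    for module_name, wanted in pinned.items():
        found = lookup(module_name)
        if found != wanted:
            mismatches[module_name] = (wanted, found)
    return mismatches
```

Calling `find_mismatches(PINNED)` inside the activated environment returns an empty dictionary when every version matches the list above.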

Installation

The code was developed and tested using an Anaconda environment. Install Anaconda on your machine, create an environment (e.g., 'py3.6') and install Python3 dependencies:

$ conda install -c anaconda -n py3.6 numpy pyyaml mkl mkl-include setuptools cmake cffi typing
$ conda install -c anaconda -n py3.6 nccl pytorch cudnn cudatoolkit
$ conda install -c anaconda -n py3.6 progressbar2 nltk

Experiments

Preparation

The repository contains four template configuration files (*.yaml) for training neural models and using them later for prediction.

Before running anything:

- Revise the configuration files: set the paths and parameter values!
- Navigate to ./components/data/morph_align/ and run:
```
$ make all
```

This compiles the source code of the Chinese Restaurant Process string pair aligner, which is reused from an existing implementation.
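
A forgotten path is an easy mistake when revising the configuration files, so the check below sketches how the fields mentioned in this README (experiments_dir, model_fn, stage) could be verified before a run. It assumes the configuration has already been loaded into a flat dictionary; the real files may nest these keys differently, and `check_config` is a hypothetical helper, not part of the repository.

```python
VALID_STAGES = ("morph", "syn")


def check_config(config, mode):
    """Return a list of problems found in a loaded configuration dictionary.

    Assumption: the keys sit at the top level of the dictionary and the
    stage field is needed in every mode.
    """
    problems = []
    if config.get("stage") not in VALID_STAGES:
        problems.append("stage must be one of: " + ", ".join(VALID_STAGES))
    if mode == "train" and not config.get("experiments_dir"):
        problems.append("experiments_dir is not set")
    if mode == "predict" and not config.get("model_fn"):
        problems.append("model_fn is not set")
    return problems
```

An empty list means that none of the checked fields is missing; it says nothing about the remaining parameters in the file.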

Training NN models

Run the following command:

$ python run_experiment.py -m train -c some_config.yaml

After the experiment, a folder will be created under the directory specified by the experiments_dir field of the some_config.yaml file. This folder should contain the following files:
- experiment log (train.log)
- best model weights (weights.epochXX_*, where XX stands for epoch number and * shows the approximate performance of the model)
- development set predictions for each training epoch (predictions.epochX)
- serialized vocabulary used to map inputs to numerical IDs
- a csv file with scores and train/dev losses for each epoch (scores.csv)
- configuration dictionary in json format (config.json)
- pdf files with learning curves
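
To fill in the model_fn field for prediction, a weights file has to be picked from this folder. The sketch below selects the file with the highest epoch number, assuming the weights.epochXX_* naming scheme listed above and assuming that a later saved epoch means a better model; `latest_weights` is a hypothetical helper.

```python
import re

# Pattern derived from the naming scheme above: weights.epochXX_*
WEIGHTS_PATTERN = re.compile(r"^weights\.epoch(\d+)_(.*)$")


def latest_weights(filenames):
    """Return the weights file with the highest epoch number, or None."""
    best_name, best_epoch = None, -1
    for name in filenames:
        match = WEIGHTS_PATTERN.match(name)
        if match and int(match.group(1)) > best_epoch:
            best_name, best_epoch = name, int(match.group(1))
    return best_name
```

It can be applied to the output of `os.listdir` on the experiment folder; files that do not follow the naming scheme are ignored.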

Using trained models for prediction

Stage-wise prediction is done using the following command:
```
$ python run_experiment.py -m predict -c some_config.yaml
```
Do not forget to specify the model path in the model_fn field of the config file. The predictions made by the loaded model will be stored in /path/to/model_fn.dev.STAGE.predictions, where STAGE is either morph or syn, depending on the value of the stage field in the config file.
Full pipeline prediction is done using the following command:
```
$ python run_experiment.py -m pipeline -c syn_config.yaml morph_config.yaml -o output_file
```
The predictions made by the pipeline model will be stored as:
- /path/to/output_file.dev.final.txt (for the dev data specified in the configuration file)
- /path/to/output_file.test.final.txt (for the test data specified in the configuration file)
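
The naming of the prediction files described in this section can be summarized in a few lines. This is only a restatement of the patterns given above; the two helper functions are hypothetical and are not taken from run_experiment.py.

```python
def stage_prediction_path(model_fn, stage):
    """Path of the stage-wise dev predictions, following the pattern above."""
    if stage not in ("morph", "syn"):
        raise ValueError("stage must be 'morph' or 'syn'")
    return "{}.dev.{}.predictions".format(model_fn, stage)


def pipeline_prediction_paths(output_file):
    """Paths of the dev and test outputs written by the full pipeline."""
    return {split: "{}.{}.final.txt".format(output_file, split)
            for split in ("dev", "test")}
```

Such helpers are mainly useful in scripts that collect predictions for the evaluation step.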

Running the baselines:

We implemented three baselines:

morph_lemma: LEMMA baseline (morphological inflection generation component)
morph_major: MAJOR baseline (morphological inflection generation component)
syn_random: RAND baseline (syntactic ordering component)
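
The idea behind the three baselines can be sketched as follows. The behavior shown here is inferred from the baseline names and is an assumption: in particular, the MAJOR sketch keys the majority vote on the lemma alone, which may differ from what baselines/morph_major.py does. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def lemma_baseline(lemmas):
    """LEMMA-style baseline: emit each lemma unchanged as its surface form."""
    return list(lemmas)


def train_majority(pairs):
    """Count (lemma, form) pairs and keep the most frequent form per lemma."""
    counts = defaultdict(Counter)
    for lemma, form in pairs:
        counts[lemma][form] += 1
    return {lemma: forms.most_common(1)[0][0]
            for lemma, forms in counts.items()}


def majority_baseline(lemmas, majority):
    """MAJOR-style baseline: most frequent seen form, else the lemma itself."""
    return [majority.get(lemma, lemma) for lemma in lemmas]


def random_baseline(lemmas, seed=None):
    """RAND-style baseline: a random permutation of the input lemmas."""
    shuffled = list(lemmas)
    random.Random(seed).shuffle(shuffled)
    return shuffled
```

The first two functions address morphological inflection and leave the word order untouched, while the third addresses ordering and leaves the lemmas untouched.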

To run each baseline, the following steps should be performed:

Make a folder to store the development set files (e.g., /path/to/dev_refs)
Make a folder to store baseline predictions (e.g., /path/to/dev_hyp)

To make predictions using one of the baselines, run the following command:

$ python BASELINE.py /path/to/dev_refs /path/to/dev_hyp

Here, BASELINE.py stands for one of the following Python scripts:

./baselines/morph_lemma.py
./baselines/morph_major.py
./baselines/syn_random.py

Evaluation

The official evaluation scripts can be found on the workshop webpage. Shortcut instructions:

Put all references into A/ folder, all predictions into B/ folder.
Make sure the filenames in both A/ and B/ are the same.
Run the following command:

$ python eval_Py3.py /path/to/A /path/to/B

Note: make sure your NLTK package is up-to-date!

If it is not, NIST scores will be wrong.
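
The requirement that both folders contain the same filenames can be checked before running the evaluation script. The following is a minimal sketch with a hypothetical helper name; it compares filenames only and does not look at the file contents.

```python
import os


def compare_folders(ref_dir, hyp_dir):
    """Return (missing_in_hyp, missing_in_ref) as sorted lists of filenames."""
    refs = set(os.listdir(ref_dir))
    hyps = set(os.listdir(hyp_dir))
    return sorted(refs - hyps), sorted(hyps - refs)
```

Two empty lists indicate that the folders are aligned and the evaluation command can be run.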

UKPLab/acl2018-msr-workshop-binlin
