Compare two CoNLL-X files or directories, to obtain the tokenization F-score and POS tag accuracy, as well as the LAS, UAS, and label scores.
Since comparison usually occurs between gold and parsed files, the two files/directories will be differentiated using gold
and parsed
keywords. In other words, you do not need to have gold and parsed files to compare; any two will do.
The tree alignment part of the code uses ced_word_alignment.
Note: the evaluator is also CoNLL-U compatible.
- Two files or directories are passed to the evaluator. If two directories are passed, the directories must have matching file names.
- The files are read, and the trees every two files are compared.
- Align trees using ced_word_alignment
- involves inserting null alignment tokens
- The evaluation scores are then calulated
- tokenization f-score is calculated on all aligned tokens, while the remaining metrics are calulated after removing insertions (null alignment tokens added to the gold tree)
Since ced_word_alignment is used, the second and third assumptions are the same.
- No words are added to either the parsed or gold files.
- No changes to the word order.
- Text is in the same script and encoding.
align_trees.py
aligns trees using the ced_word_alignment algorithmclass_conllx
used to read CoNLL-X filesclasses
dataclasses used throughout the codeconllx_counts
gets different statistics after comparing 2 CoNLL-X filesconllx_scores
calculates scores given countsevaluate_conllx_driver
main scripthandle_args
simplifies use of the argparse libraryrequirements.txt
necessary dependencies needed to run the scripts.- ced_word_alignment/ the ced alignment library
README.md
this document.
- Python 3.8 and above.
To use, you need to first install the necessary dependencies by running the following command:
pip install -r requirements.txt
usage: evaluate_conllx_driver.py [-h] [-g] [-p] [-gd] [-pd]
This script takes 2 CoNLL-X files or 2 directories of CoNLL-X files and evaluates the scores.
required arguments:
-g , --gold the gold CoNLL-X file
-p , --parsed the parsed CoNLL-X file
or:
-gd , --gold_dir the gold directory containing CoNLL-X files
-pd , --parsed_dir the parsed directory containing CoNLL-X files
The sentences used are taken from CamelTB_1001_introduction_1.conllx and CamelTB_1001_night_1_1.conllx (data can be obtained from The Camel Treebank.
The toknization is the same, and so the F_score is 100%, and the insertion/deletion counts are both 0.
python evaluate_conllx_driver.py -g samples/sample_1_gold.conllx -p samples/sample_1_parsed.conllx
tokenization_f_score | 100.0 |
tokenization_precision | 100.0 |
tokenization_recall | 100.0 |
pos | 81.6 |
uas | 55.3 |
label | 65.8 |
las | 44.7 |
insertion_count | 0 |
deletion_count | 0 |
python evaluate_conllx_driver.py -g samples/sample_2_gold.conllx -p samples/sample_2_parsed.conllx
tokenization_f_score | 90. |
tokenization_precision | 90. |
tokenization_recall | 90. |
pos | 86. |
uas | 65. |
label | 75. |
las | 57. |
insertion_count | 2 |
deletion_count | 2 |
conllx_evaluator is available under the MIT license. See the LICENSE file for more info.