RedwoodsLpp

Data/Software Companion for Layers of Interpretation: On Grammar and Compositionality

Bender, Emily M., Dan Flickinger, Stephan Oepen, Woodley Packard and Ann Copestake. 2015. Layers of Interpretation: On Grammar and Compositionality. In Proceedings of the 11th International Conference on Computational Semantics (IWCS 2015), London.

Data

The inter-annotator agreement (IAA) study in Bender et al. (2015) draws its data from Antoine de Saint-Exupéry's The Little Prince. The data comes in two collections: a 50-item trial set, used for annotator training and guideline refinement, and a 150-item sample set, used for measuring IAA. For each collection, we provide the individual annotations of each of the three annotators and an adjudicated gold standard:


Trial (50)	Annotator A	Annotator B	Annotator C	Adjudicated
Sample (150)	Annotator A	Annotator B	Annotator C	Adjudicated

In addition, the combined set of 200 adjudicated items is included in the treebanks for the 1214 release of the ERG.

Software

Grammar and Parser: The profiles for treebanking were prepared with the English Resource Grammar (ERG), in its 1214 release, and the ACE parser, in its 0.9.19 release.

Treebanking: The annotation was done with the Full Forest Treebanker, at tagged version http://sweaglesw.org/svn/treebank/tags/packard-2015/

Inter-Annotator Agreement: IAA statistics were computed using the code in iaa.lisp, which can be invoked in the LOGON system (see the LogonInstallation page for installation instructions) as follows:

  $LOGONROOT/redwoods --binary --erg --run iaa.lisp
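The statistics actually reported are defined by iaa.lisp. Purely as an illustration of what a pairwise agreement figure over three annotators looks like, the following is a minimal sketch in Python. It assumes that each annotator's decision on an item has been reduced to a single comparable label (with `None` standing for a rejected item). The function name `pairwise_agreement`, the toy labels, and this representation are all hypothetical and are not taken from iaa.lisp or from the profiles.

```python
from itertools import combinations


def pairwise_agreement(annotations):
    """Exact-match agreement for every pair of annotators.

    `annotations` maps an annotator name to a dict of item id -> label.
    A label of None stands for "item rejected" (an assumption made for
    this sketch, not the representation used in the actual profiles).
    """
    results = {}
    for a, b in combinations(sorted(annotations), 2):
        # Only compare items that both annotators have annotated.
        shared = annotations[a].keys() & annotations[b].keys()
        if not shared:
            continue
        matches = sum(annotations[a][i] == annotations[b][i] for i in shared)
        results[(a, b)] = matches / len(shared)
    return results


# Toy data: four items, three annotators (labels are made up).
toy = {
    "A": {1: "t1", 2: "t2", 3: None, 4: "t4"},
    "B": {1: "t1", 2: "t9", 3: None, 4: "t4"},
    "C": {1: "t1", 2: "t2", 3: "t3", 4: "t5"},
}
scores = pairwise_agreement(toy)
for pair, score in sorted(scores.items()):
    print(pair, score)
```

Exact match over whole labels is the simplest possible choice; it treats two annotators who both reject an item as agreeing, which may or may not match how iaa.lisp counts rejections.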

Export: In addition to the native representations, the analyses in the annotated profiles can be exported in different formats.

Guidelines

We adopted the following annotation guidelines in our IAA study. The versions linked here were refined on the basis of discussion of the 50-item trial set and then used in the annotation of the 150-item sample set. Note that annotation was done in two passes: a first pass annotating and adjudicating the sample and then the trial set without any bridging rules, followed by a second pass (sample first, then trial), with the bridging rules turned on, over the items that were rejected in the initial adjudicated gold standard. The guidelines for treebanking with bridges were initially developed in the course of this study; the other guidelines predate this study and have been developed in the context of ErgTreebanking more generally.

General heuristics for ERG Treebanking (updated after sample adjudication without bridges)
Notes on rule distinctions (not updated during this study)
Heuristics for treebanking with bridges (updated after sample adjudication with bridges)
Lexical type database (not updated during this study; automatically generated from ERG 1214)
Notes on lexical types (not updated during this study)
