Whow_UCCA

Whow_UCCA is a corpus of English WikiHow instructional guides semantically annotated with Universal Conceptual Cognitive Annotation (UCCA). It comprises 11 documents on a variety of topics, which were previously annotated with a range of linguistic information (POS, syntax, discourse structure, and more) and document-structure information as part of the GUM project (GitHub).

The UCCA annotations were produced by the students and instructor of the Advanced Semantic Representation course at Georgetown University in the Fall 2018 semester.

Files and directories

ucca-guidelines.pdf: The version of the UCCA annotation guidelines used for the compilation of this corpus.
unreviewed/xml: Annotated passages before review/adjudication.
raw/txt: Passages in raw text format (tokenized).

Documents and passages

The corpus contains 11 documents with token counts ranging from 656 to 1160. For comparability and to facilitate annotation, we split each document into 2-4 passages of between 104 and 355 tokens each. For each document, at least 2 passages (at least 607 tokens) have been annotated with UCCA by at least one annotator.

Filenames follow the pattern whow_<DOCUMENT>_<PASSAGE>_<XXXX>.xml. For instance, the file whow_ballet_2_orig.xml contains the annotation of the 2nd passage of document "whow_ballet".

Multiple annotations per passage

To compute inter-annotator agreement (IAA), one randomly selected passage per document has been annotated by two additional annotators. In the file naming scheme described above, <XXXX> is one of {orig, iaa1, iaa2}: orig marks the annotation by the annotator originally assigned to the passage (the primary annotator), while iaa1 and iaa2 mark the annotations by the two secondary annotators.
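The naming scheme can be used to find the passages that carry all three annotations. The following is a minimal sketch using only the Python standard library. The function names and the regular expression are our own, the filenames other than whow_ballet_2_orig.xml are hypothetical, and the sketch assumes that the passage number is written as plain digits.

```python
import re
from collections import defaultdict

# Regular expression for whow_<DOCUMENT>_<PASSAGE>_<XXXX>.xml.
# Assumption: <PASSAGE> is written as plain digits.
FILENAME_RE = re.compile(
    r"^whow_(?P<document>.+)_(?P<passage>\d+)_(?P<annotator>orig|iaa1|iaa2)\.xml$"
)

ALL_ANNOTATORS = {"orig", "iaa1", "iaa2"}


def parse_filename(name):
    """Split an annotation filename into (document, passage, annotator)."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected filename: {name!r}")
    return "whow_" + match["document"], int(match["passage"]), match["annotator"]


def group_by_passage(names):
    """Map each (document, passage) pair to the set of annotator tags found."""
    groups = defaultdict(set)
    for name in names:
        document, passage, annotator = parse_filename(name)
        groups[(document, passage)].add(annotator)
    return dict(groups)


def iaa_passages(names):
    """Return the passages annotated by the primary and both secondary annotators."""
    groups = group_by_passage(names)
    return sorted(key for key, tags in groups.items() if tags == ALL_ANNOTATORS)


# Hypothetical directory listing, for illustration only.
example_names = [
    "whow_ballet_1_orig.xml",
    "whow_ballet_2_orig.xml",
    "whow_ballet_2_iaa1.xml",
    "whow_ballet_2_iaa2.xml",
]
print(iaa_passages(example_names))
```

In practice, the list of names would come from the contents of unreviewed/xml, for example via `os.listdir`.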

Annotation Web-App

The annotations were carried out through the web-based annotation tool UCCAApp (demo).

UCCA API

The UCCA Python API by Daniel Hershcovich and Amit Beka provides functionality to read, analyze, manipulate, and write the annotations as XML. To get it, you can clone the following GitHub repository:

git clone https://github.com/danielhers/ucca.git

or install the package via pip:

pip install ucca
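Once the package is installed, reading the annotated passages might look like the sketch below. It is not taken from this repository: the helper names are our own, and it assumes that the installed version of the ucca package provides `ioutil.file2passage`, `layer0.LAYER_ID`, and terminals with `position` and `text` attributes. Please check the API documentation of your installed version before relying on it.

```python
from pathlib import Path


def list_annotation_files(xml_dir):
    """Return the .xml files in a directory, sorted by path."""
    return sorted(Path(xml_dir).glob("*.xml"))


def load_passage(path):
    """Read one annotated passage from a UCCA XML file.

    Assumption: ucca.ioutil.file2passage is available in the installed
    version of the ucca package. It is imported lazily so that the rest
    of this sketch works without the package.
    """
    from ucca import ioutil

    return ioutil.file2passage(str(path))


def passage_tokens(passage):
    """Return the token strings of a passage in text order.

    Assumption: layer 0 holds the terminals (tokens), each with a
    `position` and a `text` attribute.
    """
    from ucca import layer0

    terminals = passage.layer(layer0.LAYER_ID).all
    return [t.text for t in sorted(terminals, key=lambda t: t.position)]
```

For example, `[len(passage_tokens(load_passage(p))) for p in list_annotation_files("unreviewed/xml")]` would give one token count per annotated file, under the assumptions stated above.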

References

Universal Conceptual Cognitive Annotation (UCCA)
Omri Abend and Ari Rappoport (2013). ACL 2013.
UCCAApp: Web-application for Syntactic and Semantic Phrase-based Annotation
Omri Abend, Shai Yerushalmi and Ari Rappoport (2017). ACL 2017.
The GUM Corpus: Creating Multilayer Resources in the Classroom
Amir Zeldes (2017). Language Resources and Evaluation 51(3), 581–612.
