English WikiHow instructional guides semantically annotated with Universal Conceptual Cognitive Annotation (UCCA)

nert-nlp/Whow_UCCA


Whow_UCCA

Whow_UCCA is a corpus of English WikiHow instructional guides semantically annotated with Universal Conceptual Cognitive Annotation (UCCA). It comprises 11 documents on varying topics, which were previously annotated with an array of linguistic information (POS, syntax, discourse structure and more) and document-structure information as part of the GUM project (GitHub).

The UCCA annotations were carried out by the students and instructor of the Advanced Semantic Representation course at Georgetown University in the Fall 2018 semester.


Files and directories

  • ucca-guidelines.pdf: The version of the UCCA annotation guidelines used for the compilation of this corpus.
  • unreviewed/xml: Annotated passages before review/adjudication.
  • raw/txt: Passages in raw text format (tokenized).

Documents and passages

The corpus contains 11 documents with token counts ranging from 656 to 1160. For comparability and to facilitate annotation, we split each document into 2-4 passages of between 104 and 355 tokens each. For each document, at least 2 passages (at least 607 tokens) have been annotated with UCCA by at least one annotator.

Filenames follow the pattern whow_<DOCUMENT>_<PASSAGE>_<XXXX>.xml. For instance, the file whow_ballet_2_orig.xml contains the annotation for the 2nd passage of document "whow_ballet".
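The naming pattern can be taken apart with a regular expression. The following is a minimal sketch, not part of the corpus tooling: the names `FILENAME_RE`, `PassageFile` and `parse_filename` are hypothetical, and the sketch assumes that the passage number is a plain integer and that the final component contains no underscore.

```python
import re
from typing import NamedTuple


class PassageFile(NamedTuple):
    """Components of a corpus filename (hypothetical helper type)."""
    document: str   # e.g. "whow_ballet"
    passage: int    # 1-based passage number within the document
    annotator: str  # the <XXXX> component, e.g. "orig"


# Assumption: the document ID starts with "whow_", the passage number is
# an integer, and the last component has no underscore in it.
FILENAME_RE = re.compile(r"^(whow_.+)_(\d+)_([^_]+)\.xml$")


def parse_filename(name: str) -> PassageFile:
    """Split a filename such as 'whow_ballet_2_orig.xml' into its parts."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a corpus filename: {name!r}")
    document, passage, annotator = match.groups()
    return PassageFile(document, int(passage), annotator)


print(parse_filename("whow_ballet_2_orig.xml"))
```

For the example filename above, this yields document "whow_ballet", passage 2 and annotator tag "orig".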

Multiple annotations per passage

In order to compute inter-annotator agreement (IAA), one randomly selected passage per document has been annotated by two additional annotators. In the file naming schema described above, <XXXX> (one out of {orig, iaa1, iaa2}) indicates whether the annotation of this passage was done by the annotator originally assigned to it (primary annotator), or one of the secondary annotators.
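To find the passages that can be used for IAA, one can group filenames by document and passage and keep those with more than one annotation. The sketch below is illustrative only: the function name `multiply_annotated` and the example file list are hypothetical, and it assumes the three tags named above are the only ones in use.

```python
import re
from collections import defaultdict

# Assumption: <XXXX> is always one of orig, iaa1, iaa2.
NAME_RE = re.compile(r"^(whow_.+)_(\d+)_(orig|iaa1|iaa2)\.xml$")


def multiply_annotated(filenames):
    """Map (document, passage) to its sorted annotator tags,
    keeping only passages annotated more than once."""
    tags_by_passage = defaultdict(set)
    for name in filenames:
        match = NAME_RE.match(name)
        if match is None:
            continue  # skip anything that does not follow the pattern
        document, passage, tag = match.groups()
        tags_by_passage[(document, int(passage))].add(tag)
    return {
        key: sorted(tags)
        for key, tags in tags_by_passage.items()
        if len(tags) > 1
    }


# Hypothetical directory listing, for illustration only.
example = [
    "whow_ballet_1_orig.xml",
    "whow_ballet_2_orig.xml",
    "whow_ballet_2_iaa1.xml",
    "whow_ballet_2_iaa2.xml",
]
print(multiply_annotated(example))
```

In practice the list of filenames would come from a directory listing of the annotated XML files rather than a hard-coded list.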


Annotation Web-App

The annotations were carried out through the web-based annotation tool UCCAApp (demo).

UCCA API

The UCCA Python API by Daniel Hershcovich and Amit Beka provides functionality to read, analyze, manipulate, and write the annotations as XML. To get it, you can clone the following GitHub repository:

git clone https://github.com/danielhers/ucca.git

or install the package via pip:

pip install ucca
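Once the package is installed, a passage can be read roughly as follows. This is a sketch under assumptions rather than official usage: it relies on `ucca.ioutil.file2passage` and on the layer 0 terminals exposing `text` and `position` attributes, which should be checked against the installed version of the package. The helper names `tokens_in_order` and `load_tokens` are hypothetical, as is the example path, which combines the directory and filename mentioned in this README.

```python
def tokens_in_order(terminals):
    """Return terminal texts sorted by their position in the passage."""
    return [t.text for t in sorted(terminals, key=lambda t: t.position)]


def load_tokens(xml_path):
    """Read one annotated passage and return (passage ID, ordered tokens)."""
    # Imported here so that tokens_in_order can be used without ucca.
    from ucca import ioutil, layer0

    passage = ioutil.file2passage(xml_path)
    # Layer 0 holds the terminals, i.e. the tokens of the passage.
    terminals = passage.layer(layer0.LAYER_ID).all
    return passage.ID, tokens_in_order(terminals)


# Example call (assumed path, requires the ucca package and the corpus):
# passage_id, tokens = load_tokens("unreviewed/xml/whow_ballet_2_orig.xml")
```

The import sits inside `load_tokens` only so that the small sorting helper can be tried out without the package installed; in a real script a top-level import would be the usual choice.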
