Universal Conceptual Cognitive Annotation (UCCA) is a novel semantic approach to grammatical representation. It was developed in the Computational Linguistics Lab of the Hebrew University by Omri Abend and Ari Rappoport.

The central idea of the project is to analyze and annotate natural languages using purely semantic categories and structure (a graph). Syntactic categories and structure are not part of the manual annotation, and are ideally learned implicitly by the parsers. The basic set of semantic categories (the foundational layer) is inspired by work in linguistic typology, cognitive grammar, and neuroscience. The development of additional layers, such as semantic roles and super-senses (adapted from the CARMLS project), is underway.

The annotation has so far focused on argument-structure and linkage phenomena. We build primarily on Basic Linguistic Theory (R.M.W. Dixon, 2010a; 2010b; 2012), a widely used approach for language description. We acknowledge that there are many applicable analyses for a given sentence, but select, for practical reasons, a small set of highly useful distinctions, and apply them to provide one plausible annotation.

We have annotated 160K tokens from English Wikipedia with the UCCA scheme, as well as a 30K-token English-French parallel corpus based on Jules Verne's "20K Leagues Under The Sea", and a 120K-token corpus of the entire book in German. Pilot studies were conducted on several other languages as well.

This page contains links to all of UCCA's resources: corpora, annotation guidelines, parser and code. If you use these resources in your research, please cite the following or other relevant publications:

Universal Conceptual Cognitive Annotation (UCCA).
Omri Abend and Ari Rappoport, ACL 2013.
[Paper: pdf]

Tutorial

A tutorial on Cross-lingual Semantic Representation for NLP with UCCA was presented at COLING 2020. All presentations are available on GitHub.

Annotation Web-App

UCCAApp is a web application for phrase-based annotation in general, and UCCA parsing in particular. Formally, it supports DAG structures, discontiguous units and multiple categories.

The app supports configurable multi-layer annotation and task management, and is written in Django and AngularJS.
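
The three structural properties named above (DAG structures, discontiguous units, multiple categories) can be illustrated with a small, self-contained sketch. This is not UCCAApp's data model: the `AnnotationGraph` class, the unit names, and the example sentence are hypothetical, and the second label `"Agent"` is a made-up role label used only to show an edge carrying more than one category.

```python
class AnnotationGraph:
    """Toy phrase-based annotation graph (illustrative only, not UCCAApp's model)."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.leaves = {}   # unit id -> frozenset of token positions it covers directly
        self.edges = []    # (parent id, child id, tuple of category labels)

    def add_leaf(self, unit_id, positions):
        self.leaves[unit_id] = frozenset(positions)

    def add_edge(self, parent, child, *categories):
        self.edges.append((parent, child, tuple(categories)))

    def parents(self, unit_id):
        # In a DAG a unit may have more than one parent.
        return [p for p, c, _ in self.edges if c == unit_id]

    def span(self, unit_id):
        # All token positions covered by the unit and its descendants.
        positions = set(self.leaves.get(unit_id, ()))
        for parent, child, _ in self.edges:
            if parent == unit_id:
                positions |= self.span(child)
        return positions

    def is_contiguous(self, unit_id):
        covered = sorted(self.span(unit_id))
        return bool(covered) and covered == list(range(covered[0], covered[-1] + 1))


# Hypothetical example sentence, positions 0..5.
g = AnnotationGraph("John gave everything up and left".split())
g.add_leaf("john", [0])
g.add_leaf("gave_up", [1, 3])        # discontiguous unit: "gave ... up"
g.add_leaf("everything", [2])
g.add_leaf("and", [4])
g.add_leaf("left", [5])

g.add_edge("root", "scene1", "H")
g.add_edge("root", "and", "L")
g.add_edge("root", "scene2", "H")
g.add_edge("scene1", "john", "A", "Agent")   # two categories on one edge
g.add_edge("scene1", "gave_up", "P")
g.add_edge("scene1", "everything", "A")
g.add_edge("scene2", "john", "A")            # second parent makes this a DAG
g.add_edge("scene2", "left", "P")

print(g.parents("john"))
print(g.is_contiguous("gave_up"))
```

Here `john` is shared by both scenes, so the structure is a DAG and not a tree, and `gave_up` covers positions 1 and 3 without covering position 2.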

UCCAApp: Web-application for Syntactic and Semantic Phrase-based Annotation.
Omri Abend, Shai Yerushalmi and Ari Rappoport. ACL 2017.
[Paper: pdf] [Code: github] [Demo]

Guidelines

Each UCCA-annotated corpus repository includes the version of the guidelines it was compiled with. The most up-to-date guidelines are available on GitHub (the most recent version is generally in draft mode, but see releases).

[v2 guidelines: pdf] [latest guidelines: pdf]

UCCA-Annotated Corpora

All corpora are publicly available under a Creative Commons Attribution-ShareAlike 3.0 Unported license. The guidelines with which each of them was annotated can be found in its repository.

Corpus Link
English Wikipedia [github]
English Web Treebank [github]
English 20K Leagues Under The Sea [github]
Excerpt of the PTB WSJ [github]
German 20K Leagues Under The Sea [github]
French 20K Leagues Under The Sea [github]
German The Little Prince [github]
Hebrew The Little Prince [github]
Russian The Little Prince [github]
English The Little Prince [github]

Datasets produced by other labs:

Corpus Link Paper
Turkish 50 sentences from the METU-Sabanci Turkish Treebank [github] [paper]

UCCA Parser

TUPA is a transition-based parser for Universal Conceptual Cognitive Annotation (UCCA), developed by Daniel Hershcovich, Omri Abend and Ari Rappoport.

A Transition-Based Directed Acyclic Graph Parser for UCCA.
Daniel Hershcovich, Omri Abend and Ari Rappoport. ACL 2017.
[Paper: pdf] [Supp. Material: pdf] [Code: github] [Demo]

It can be installed from PyPI:

pip install tupa

Source Code

Python toolkit for reading and manipulating UCCA structures. The code was written by Amit Beka and Daniel Hershcovich.

It can be installed from PyPI:

pip install ucca

[Code: github]

Shared Tasks

UCCA was targeted in the following public parsing competitions, which accompanied top-tier NLP conferences:

SemEval 2019 Task 1

The task included open and closed tracks on English, French and German UCCA corpora from Wikipedia and Twenty Thousand Leagues Under the Sea.

Evaluation is done by labeled F1 on the graph edges, matched by child terminal yield.
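
As a rough illustration of this metric, the sketch below represents each edge as a pair of its category label and the set of terminal positions under its child, then computes precision, recall and F1 over the matching pairs. It is a minimal sketch under simplifying assumptions, not the official scorer: the function name `labeled_f1`, the edge encoding, and the toy graphs are hypothetical, and the actual evaluation may handle details (such as different edge types or punctuation) differently.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Tuple

# An edge is identified by its label and the terminal yield of its child.
Edge = Tuple[str, FrozenSet[int]]


def labeled_f1(predicted: Iterable[Edge], gold: Iterable[Edge]) -> Dict[str, float]:
    """Labeled precision/recall/F1 over edges matched by (label, terminal yield)."""
    pred_counts = Counter(predicted)
    gold_counts = Counter(gold)
    # Multiset intersection: an edge matches if label and yield are identical.
    matched = sum((pred_counts & gold_counts).values())
    n_pred = sum(pred_counts.values())
    n_gold = sum(gold_counts.values())
    precision = matched / n_pred if n_pred else 0.0
    recall = matched / n_gold if n_gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy graphs for "John kicked the ball" (terminal positions 0..3).
gold = [
    ("A", frozenset({0})),
    ("P", frozenset({1})),
    ("A", frozenset({2, 3})),
    ("F", frozenset({2})),
    ("C", frozenset({3})),
]
predicted = [
    ("A", frozenset({0})),
    ("P", frozenset({1})),
    ("A", frozenset({2, 3})),
    ("E", frozenset({2})),   # wrong label on the edge to "the"
    ("C", frozenset({3})),
]

scores = labeled_f1(predicted, gold)
print({name: f"{value:.2f}" for name, value in scores.items()})
```

In this toy case four of the five predicted edges match a gold edge on both label and yield, so precision and recall are both 4/5.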

SemEval-2019 Task 1: Cross-lingual Semantic Parsing with UCCA.
Daniel Hershcovich, Zohar Aizenbud, Leshem Choshen, Elior Sulem, Ari Rappoport and Omri Abend, SemEval 2019 (shared task).
[Paper: pdf] [Website: link] [Training and development data: link] [Test data: link] [Code: github]

CoNLL 2019 MRP Shared Task

The task included parsing to AMR, UCCA, DM, PSD, and EDS. The UCCA training data is freely available.

UCCA evaluation is done both by UCCA F1 (as in SemEval 2019) and by the MRP metric, which is similar to smatch. The training data contains 6,572 sentences from web reviews and Wikipedia. There are two evaluation sets: one with 1,131 sentences from the same domains (Full), and one with 87 sentences from The Little Prince (LPP). Note that due to an error, 535 of the 1,131 Full evaluation sentences were included in the training data, so the Full evaluation scores are an overestimate. The LPP scores are unaffected by this.

MRP 2019: Cross-Framework Meaning Representation Parsing.
Stephan Oepen, Omri Abend, Jan Hajic, Daniel Hershcovich, Marco Kuhlmann, Tim O’Gorman, Nianwen Xue, Jayeol Chun, Milan Straka, Zdenka Uresova, CoNLL 2019 (shared task).
[Paper: pdf] [Website: link] [UCCA data: link] [Code: github]

CoNLL 2020 MRP Shared Task

The task included parsing to AMR, UCCA, PTG, DRG, and EDS, in multiple languages. For UCCA, the languages were English and German.

MRP 2020: The Second Shared Task on Cross-Framework and Cross-Lingual Meaning Representation Parsing.
Stephan Oepen, Omri Abend, Lasha Abzianidze, Johan Bos, Jan Hajic, Daniel Hershcovich, Bin Li, Tim O’Gorman, Nianwen Xue and Daniel Zeman, CoNLL 2020 (shared task).
[Paper: pdf] [Website: link] [Data: link]

Publications

Semantics-aware Attention Improves Neural Machine Translation.
Aviv Slobodkin, Leshem Choshen and Omri Abend.
[Paper: pdf]
Self-Attentive Constituency Parsing for UCCA-based Semantic Parsing.
Necva Bölücü and Burcu Can.
[Paper: pdf]
Subcategorizing Adverbials in Universal Conceptual Cognitive Annotation.
Zhuxin Wang, Jakob Prange and Nathan Schneider. LAW-DMR 2021.
[Paper: pdf]
RepGraph: Visualising and Analysing Meaning Representation Graphs.
Jaron Cohen, Roy Cohen, Edan Toledo and Jan Buys. EMNLP 2021 demo.
[Paper: pdf] [Website]
Data-Driven Annotation of Textual Process Descriptions Based on Formal Meaning Representations.
Lars Ackermann, Julian Neuberger and Stefan Jablonski. Lecture Notes in Computer Science.
[Paper: pdf]
Great Service! Fine-grained Parsing of Implicit Arguments.
Ruixiang Cui and Daniel Hershcovich. IWPT 2021.
[Paper: pdf] [Code: github]
Semantic Structural Decomposition for Neural Machine Translation.
Elior Sulem, Omri Abend and Ari Rappoport. *SEM 2020 (short paper).
[Paper: pdf] [Data & Code: github]
Refining Implicit Argument Annotation For UCCA.
Ruixiang Cui and Daniel Hershcovich. DMR 2020.
[Paper: pdf] [Data: github]
Comparison by Conversion: Reverse-Engineering UCCA from Syntax and Lexical Semantics.
Daniel Hershcovich, Nathan Schneider, Dotan Dvir, Jakob Prange, Miryam de Lhoneux and Omri Abend. COLING 2020.
[Paper: pdf] [Code: github]
Made for Each Other: Broad-Coverage Semantic Structures Meet Preposition Supersenses.
Jakob Prange, Nathan Schneider and Omri Abend. CoNLL 2019.
[Paper: pdf] [Code: github] [Data: github]
Semantically Constrained Multilayer Annotation: The Case of Coreference.
Jakob Prange, Nathan Schneider and Omri Abend. ACL 2019 Workshop on Designing Meaning Representations (DMR).
[Paper: pdf] [Data: github]
Preparing SNACS for Subjects and Objects.
Adi Shalev, Jena D. Hwang, Nathan Schneider, Vivek Srikumar, Omri Abend and Ari Rappoport. ACL 2019 Workshop on Designing Meaning Representations (DMR).
[Paper: pdf] [Data: github]
Content Differences in Syntactic and Semantic Representations.
Daniel Hershcovich, Omri Abend and Ari Rappoport. NAACL 2019 (long paper).
[Paper: pdf] [Supp. Material: pdf] [Code: github]
Multitask Parsing Across Semantic Representations.
Daniel Hershcovich, Omri Abend and Ari Rappoport. ACL 2018 (long paper).
[Paper: pdf] [Supp. Material: pdf] [Code: github]
Simple and Effective Text Simplification using Semantic and Neural Methods.
Elior Sulem, Omri Abend and Ari Rappoport. ACL 2018 (long paper).
[Paper: pdf] [Data: github]
Reference-less Measure of Faithfulness for Grammatical Error Correction.
Leshem Choshen and Omri Abend. NAACL 2018 (short paper).
[Paper: pdf] [Supp. Material: pdf] [Code: github]
Semantic Structural Evaluation for Text Simplification.
Elior Sulem, Omri Abend and Ari Rappoport. NAACL 2018 (long paper).
[Paper: pdf] [Data & Code: github]
A Transition-Based Directed Acyclic Graph Parser for UCCA.
Daniel Hershcovich, Omri Abend and Ari Rappoport. ACL 2017 (long paper). Outstanding Paper Award.
[Paper: pdf] [Supp. Material: pdf] [Code: github] [Demo]
UCCAApp: Web-application for Syntactic and Semantic Phrase-based Annotation.
Omri Abend, Shai Yerushalmi and Ari Rappoport. ACL 2017 (demo paper).
[Paper: pdf] [Code: github] [Demo]
The State of the Art in Semantic Representation.
Omri Abend and Ari Rappoport. ACL 2017 (long paper).
[Paper: pdf]
HUME: Human UCCA-Based Evaluation of Machine Translation.
Alexandra Birch, Omri Abend, Ondřej Bojar and Barry Haddow, EMNLP 2016 (long paper).
[Paper: pdf] [Data: github] [Demo]
Conceptual Annotations Preserve Structure Across Translations: A French-English Case Study.
Elior Sulem, Omri Abend and Ari Rappoport, ACL 2015 Workshop on Semantics-Driven Statistical Machine Translation (S2MT).
[Paper: pdf]
Universal Conceptual Cognitive Annotation (UCCA).
Omri Abend and Ari Rappoport, ACL 2013 (long paper).
[Paper: pdf]
UCCA: A Semantics-based Grammatical Annotation Scheme.
Omri Abend and Ari Rappoport, IWCS 2013 (long paper).
[Paper: pdf]

Theses

Refining and Parsing Implicit Arguments in UCCA.
Ruixiang Cui, MSc Thesis,
University of Copenhagen, 2020
[Paper: pdf]
Universal Semantic Parsing with Neural Networks.
Daniel Hershcovich, PhD Thesis,
The Hebrew University of Jerusalem, 2019
[Paper: pdf]
Measuring Semantic Preservation in Machine Translation with HCOMET: Human Cognitive Metric for Evaluating Translation.
Pedro Marinotti, MSc Thesis,
The University of Edinburgh, 2014
[Paper: pdf]
Integration of a cognitive annotation into machine translation: Theoretical foundations and bilingual corpus analysis.
Elior Sulem, MSc Thesis,
The Hebrew University of Jerusalem, 2014
[Paper: pdf]
Semi-supervised identification of scene-evoking nouns in UCCA.
Amit Beka, MSc Thesis,
The Hebrew University of Jerusalem, 2013
[Paper: pdf]
Grammatical Annotation Founded on Semantics: A Cognitive Linguistics Approach to Grammatical Corpus Annotation.
Omri Abend, PhD Thesis,
The Hebrew University of Jerusalem, 2013
[Paper: pdf]

Reports

Distinguishing Human Translations and Machine Outputs with UCCA.
Michal Kessler, Lab Report,
The Hebrew University of Jerusalem, 2019
[Paper: pdf]

Contact

For any questions or feedback, please email Omri Abend at oabend@cs.huji.ac.il.