Readme-IKAT-EN.txt

Implicit Knowledge in Argumentative Texts: An Annotated Corpus (IKAT-EN)
LREC 2020
Maria Becker, Katharina Korfhage, Anette Frank

Short project description/Abstract.
When speaking or writing, people omit information that seems clear and evident, so that only part of the message is expressed in words. Especially in argumentative texts, it is very common for (important) parts of the argument to be implied and omitted. We hypothesize that for argument analysis it will be beneficial to reconstruct this implied information. As a starting point for filling such knowledge gaps, we build a corpus consisting of high-quality human annotations of missing and implied information in argumentative texts. To learn more about the characteristics of both the argumentative texts and the added information, we further annotate the data with semantic clause types and commonsense knowledge relations. The outcome of our work is a carefully designed and richly annotated dataset, the IKAT-EN corpus.

An in-depth analysis of characteristic distributions and correlations of the assigned labels can be found in our paper:

Becker, M., Korfhage, K., and Frank, A. (2020): Implicit Knowledge in Argumentative Texts: An Annotated Corpus. Proceedings of LREC. Marseille, France.

Please cite this paper when using IKAT-EN.

ANNOTATION OF IMPLICIT KNOWLEDGE.
The annotations are performed on sentence pairs from the Microtext corpus (the English version, Peldszus and Stede 2015). Each microtext is a short, dense argument consisting of roughly five elementary units of argumentation, so-called argumentative units (Peldszus and Stede, 2015). We presented to our annotators all sentence pairs that are either adjacent or stand in an argumentative relation according to the argumentation graph. There are 719 such sentence pairs in the 112 texts in the corpus.
We asked our annotators to detect whether the connection between the pair of units is made fully explicit by the text and, if this is not the case, to explain the missing connection by providing one or more sentences that make this connection explicit. The annotators were asked to add as few sentences as possible and to keep these sentences very simple (if possible, one fact per sentence), in order to retrieve the minimal amount of information needed to connect the two units and to avoid overly detailed explanations.

We trained two expert annotators with a linguistic background, who produced two versions of the implicit knowledge. These versions served as the basis for the final gold standard, produced by another expert annotator (one of the authors), which we release here.

ANNOTATION OF SEMANTIC CLAUSE TYPES.
We asked the annotators to characterize the argumentative units and the inserted sentences by labeling them with semantic clause types (Smith 2003, Friedrich et al. 2016):
- states describe specific properties of individuals
- events are things that happen or have happened
- generic sentences are predicates over classes or kinds
- generalizing sentences describe regularly occurring events or habits
The annotations were performed independently by two trained annotators and a third expert annotator who assigned the GOLD label.

ANNOTATION OF CONCEPTNET RELATIONS.
To gain further insight into the type of knowledge covered by the argumentative texts and the annotations of implicit knowledge, we annotate both with ConceptNet relation types such as PartOf, Causes or IsA. The annotation was performed by two annotators in parallel and a third expert annotator who assigned the GOLD label.

We provide one TSV file per microtext, which is structured as follows:

The first lines display the argumentative units (e-1 to e-n), one line per unit.
They are annotated with semantic clause types and ConceptNet relations (GOLD version), displayed in the same line, separated by tabs. If the argumentative unit contains more than one clause, all clauses are annotated with semantic clause types, separated by a forward slash (e.g. STATE/GENERIC). The ConceptNet relations are annotated in the format concept 1, concept 2 (relation) (e.g. dog owners, negligent (HasProperty)). The annotators were also allowed to add more than one ConceptNet relation applicable to the argumentative unit, again separated by a forward slash.

The following lines contain the implicit knowledge annotations for the pairs of argumentative units. The first column lists the argument pair (e.g. e1-e3), the second column gives information about the adjacency of the pair, the third column gives its argumentative relation, and the fourth column gives the number of inserted sentences. Next, the inserted sentences are displayed together with their semantic clause types and ConceptNet relations, annotated as described above.

We release the annotation manual together with our corpus. The annotation process is described in more detail in Becker et al. 2020.

References:

Becker, M., Staniek, M., Nastase, V., and Frank, A. (2017): Enriching Argumentative Texts with Implicit Knowledge. International Conference on Natural Language & Information Systems (NLDB).

Friedrich, A., Palmer, A., and Pinkal, M. (2016): Situation entity types: automatic classification of clause-level aspect. In Proceedings of ACL 2016.

Peldszus, A. and Stede, M. (2015): An annotated corpus of argumentative microtexts. In Proceedings of the First European Conference on Argumentation: Argumentation and Reasoned Action.

Smith, C. S. (2003): Modes of discourse: The local structure of texts, volume 103. Cambridge University Press.
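
EXAMPLE: READING THE ANNOTATION CELLS.
The cell formats described in the file layout above (clause types such as STATE/GENERIC, and ConceptNet relations such as dog owners, negligent (HasProperty)) could be split into their parts with a small helper like the following. This is only a sketch and not part of the release: the function names parse_clause_types and parse_relations are hypothetical, and the code assumes that concepts themselves contain no forward slash and that relation names consist of letters only.

```python
import re

# Assumed shape of a single relation: "concept 1, concept 2 (Relation)".
_RELATION = re.compile(r"^\s*(.+?)\s*,\s*(.+?)\s*\((\w+)\)\s*$")


def parse_clause_types(cell):
    """Split a clause-type cell such as 'STATE/GENERIC' into its labels."""
    return [label.strip() for label in cell.split("/") if label.strip()]


def parse_relations(cell):
    """Split a relation cell into (concept1, concept2, relation) triples.

    Several relations in one cell are assumed to be separated by '/'.
    """
    triples = []
    for part in cell.split("/"):
        if not part.strip():
            continue  # skip empty cells or stray separators
        match = _RELATION.match(part)
        if match is None:
            raise ValueError(f"unexpected relation format: {part!r}")
        triples.append(match.groups())
    return triples
```

Raising an error on an unexpected cell, rather than skipping it silently, makes deviations from the assumed format visible when the helper is run over the real files.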