https://doi.org/10.5281/zenodo.2560356
This grammar adapts the cross-domain grammar available in HeidelTime [1] (https://github.com/HeidelTime/heideltime) to the medical domain. HeidelTime is a multilingual, domain-sensitive temporal tagger developed at the Database Systems Research Group at Heidelberg University. It extracts temporal expressions from documents and normalizes them according to the TIMEX3 annotation standard. It is available as a UIMA annotator and as a standalone version.
This adapted grammar can be used to detect temporal expressions in real clinical data in Spanish.
This software requires HeidelTime to be installed on your system. The grammar has been tested with version 2.2.1 and should work with more recent versions. HeidelTime is licensed under the GNU General Public License (Version 3). You can download it from the following website: https://github.com/HeidelTime/heideltime
Please note that HeidelTime is a UIMA [2] component. It requires UIMA as well as sentence, token, and part-of-speech annotations (HeidelTime includes a wrapper of TreeTagger [3] for UIMA).
normalization/
    This folder contains the files for normalized expressions. These files contain normalized values of expressions included in the repattern folder. They correspond to the ISO format for temporal information.
repattern/
    This folder contains the files for expression patterns. Patterns are used to create regular expressions, which can be accessed by every rule. This makes it possible to use category names (e.g., "month") instead of listing all items every time the category is needed in a rule.
rules/
    This folder contains the rules to identify and normalize temporal expressions.
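The interplay between the three folders can be illustrated with a minimal Python sketch. It is not HeidelTime's implementation and does not read the real resource files: the month list, the normalization table, the `%reMonth` placeholder syntax, and the names `RE_MONTH`, `NORM_MONTH`, `RULE`, `compile_rule`, and `normalize` are all hypothetical stand-ins chosen for illustration.

```python
import re

# Hypothetical, simplified stand-in for a pattern file in repattern/.
RE_MONTH = ["enero", "febrero", "marzo"]

# Hypothetical stand-in for a normalization file in normalization/.
NORM_MONTH = {"enero": "01", "febrero": "02", "marzo": "03"}

# Hypothetical rule: "<day> de <month> de <year>", written with a
# category name instead of listing every month.
RULE = r"(\d{1,2}) de %reMonth de (\d{4})"


def compile_rule(rule, patterns):
    """Expand each %name placeholder into an alternation group and compile."""
    def expand(match):
        items = patterns[match.group(1)]
        # Longest items first, so longer items win over their prefixes.
        alternatives = sorted(items, key=len, reverse=True)
        return "(" + "|".join(re.escape(item) for item in alternatives) + ")"

    return re.compile(re.sub(r"%(\w+)", expand, rule), re.IGNORECASE)


def normalize(text):
    """Return (matched expression, ISO-style date value) pairs found in text."""
    regex = compile_rule(RULE, {"reMonth": RE_MONTH})
    results = []
    for m in regex.finditer(text):
        day, month, year = m.group(1), m.group(2), m.group(3)
        value = f"{year}-{NORM_MONTH[month.lower()]}-{int(day):02d}"
        results.append((m.group(0), value))
    return results
```

Calling `normalize("Ingreso el 3 de marzo de 2017.")` yields the matched expression together with its normalized value, `2017-03-03`. The point of the design is that the month list is maintained in one place and reused by every rule that refers to the category.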
Install HeidelTime and test the installation by following the instructions given by the HeidelTime developers. Then copy the "spanish_ehr/" directory into the "resources/" directory of HeidelTime.
Run the UIMA Collection Processing Engine:
$ cpeGui.sh
Create a workflow with the following components:
Collection reader:
    - UIMA's file system collection reader: $UIMA_HOME/examples/descriptors/collection_reader/FileSystemCollectionReader.xml
        set "Input directory" to $HEIDELTIME_HOME/doc/
Analysis Engines:
    - TreeTaggerWrapper located at $HEIDELTIME_HOME/desc/annotator/TreeTaggerWrapper.xml
        set "Language" to "english"
        set "Annotate_tokens" to "true"
        set "Annotate_partofspeech" to "true"
        set "Annotate_sentences" to "true"
        set "Improvegermansentences" to "false"
    - HeidelTime located at $HEIDELTIME_HOME/desc/annotator/HeidelTime.xml
        set "Date" to "true"
        set "Time" to "true"
        set "Duration" to "true"
        set "Set" to "true"
        set "Temponym" to "false"
        set "Language" to "english"
        set "Type" to "narratives"
CAS Consumer:
    - UIMA's XMI Writer CAS Consumer located at $UIMA_HOME/examples/descriptors/cas_consumer/XmiWriterCasConsumer.xml
        set "Output Directory" to OUTPUT
Save the workflow, set "Language" to "spanish_ehr", and run the workflow.
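Once the workflow has run, the XMI files written to the output directory can be inspected programmatically. The following is a minimal sketch, assuming that temporal expressions appear in the XMI as elements whose local name is `Timex3` and that they carry attributes such as `timexType` and `timexValue`; check these names against your own output files. The function name `extract_timexes` is hypothetical.

```python
import xml.etree.ElementTree as ET


def extract_timexes(xmi_text):
    """Return the attributes of every element whose local tag name is Timex3.

    Assumption: the XMI writer serializes each temporal annotation as one
    element named Timex3 (in some namespace); verify this in your output.
    """
    root = ET.fromstring(xmi_text)
    timexes = []
    for element in root.iter():
        # ElementTree tags look like "{namespace}Name"; keep only "Name".
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "Timex3":
            timexes.append(dict(element.attrib))
    return timexes
```

To use it, read one output file as text (for example with `open(path, encoding="utf-8").read()`, where `path` points to a file in the output directory) and pass the result to `extract_timexes`. Matching on the local name only avoids hard-coding a namespace URI, at the cost of also matching any unrelated element that happens to share the name.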
Aitor Gonzalez-Agirre (aitor.gonzalez@bsc.es)
See LICENSE file.
Copyright (c) 2017-2018 Secretaría de Estado para el Avance Digital (SEAD)
[1] Strötgen, Gertz: HeidelTime: High Quality Rule-based Extraction and Normalization of Temporal Expressions. SemEval'10.
[2] UIMA: https://uima.apache.org/
[3] TreeTagger: http://www.cis.uni-muenchen.de/~schmid/tools/TreeTagger/