This repository contains sample data, both sample input for coleto and sample output from coleto.
When using coleto for the first time, it is recommended to clone or download this repository and run analyses on the sample data to make sure coleto works as intended. You can then add your own data to the coleto-data folder, following the structure of the sample data.
Note that although sample data is provided in several languages, coleto does not currently perform any language-specific operations on the data. In particular, sentence splitting is not language-aware and may be improved in the future.
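To illustrate what language-agnostic sentence splitting can look like, here is a minimal sketch. It is not coleto's actual implementation: the function name `split_sentences`, the regular expression, and the input string are all assumptions made for this example. It only shows why a purely punctuation-based rule can mis-split text that contains abbreviations.

```python
import re


def split_sentences(text):
    """Naive, language-agnostic splitter (illustrative only, not coleto's code).

    Splits on whitespace that follows a period, exclamation mark or
    question mark. No abbreviation lists or language rules are used.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop empty strings, which occur when the input is empty.
    return [part for part in parts if part]


# Hypothetical input, not taken from the sample data.
sample = "Dr. Watson came in. He sat down."
print(split_sentences(sample))
# The abbreviation "Dr." is wrongly treated as a sentence end:
# ['Dr.', 'Watson came in.', 'He sat down.']
```

A language-specific splitter would typically handle such abbreviations, which is one reason results on the sample data may differ between languages.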
The Doyle1 sample dataset for coleto consists of a very short passage from Arthur Conan Doyle's short story "The Adventure of the Mazarin Stone". Version 1 is the original text, whereas version 2 has been intentionally altered for testing purposes. This dataset is primarily intended as a simple test of whether coleto runs at all and does not cover all possible kinds of textual phenomena.
This sample dataset is based on a novel by French author Loaisel de Tréogate, which was published under two different titles, and with some textual differences, in 1779 and 1793. The texts are in the public domain and have been taken from the "roman18" collection prepared by the MiMoText project: https://github.com/MiMoText/roman18.
This sample dataset is based on a brief piece of narrative fiction, "Der letzte Brief eines Literaten", by Arthur Schnitzler, in German.
Source: TextGrid Repository (2012). Schnitzler, Arthur. Erzählungen. Der letzte Brief eines Literaten. Der letzte Brief eines Literaten. Digitale Bibliothek. TextGrid. https://hdl.handle.net/11858/00-1734-0000-0004-D9AB-D