homepage_faq
newspapers as historical source + lit. references for CH and LU newspapers
In a nutshell, the impresso corpus contains the historical newspaper collections of the Swiss National Library, the National Library of Luxembourg, the Neue Zürcher Zeitung, Le Temps, the Valais State Archives and the Swiss Economic Archives. We recommend that you take a closer look at our overview of newspapers.
For legal reasons we can only show a subset of the newspapers. To gain access to the whole collection, you need to sign a non-disclosure agreement (NDA), which is available for download here. We will provide you with a user account once we have received the signed NDA back from you.
impresso users can download text and metadata for a maximum of 10,000 articles in the form of a .csv file. This allows, for example, further processing such as topic modeling on personally curated corpora. For advanced users we provide access via an API. If this is of interest to you, please contact us at info@impresso-project.ch.
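As an illustration of such further processing, the sketch below counts word frequencies in an exported file, a typical first step before topic modeling. It is only a minimal example under stated assumptions: the column names `uid` and `content`, the comma delimiter and the sample rows are hypothetical and may not match the actual export, so adjust `text_column` to the header of your own file.

```python
import csv
import io
from collections import Counter


def word_counts(csv_file, text_column="content"):
    """Count lowercase word frequencies across all rows of an exported CSV.

    `text_column` is an assumed column name, not a documented one.
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        # Split on whitespace and strip surrounding punctuation.
        for token in (row.get(text_column) or "").split():
            word = token.strip(".,;:!?()\"'").lower()
            if word:
                counts[word] += 1
    return counts


# Tiny in-memory stand-in for an exported file (invented sample data).
sample = io.StringIO(
    "uid,content\n"
    "a1,The council met in Lausanne.\n"
    "a2,The council voted.\n"
)
counts = word_counts(sample)
print(counts.most_common(3))
```

For a real export, replace the in-memory sample with `open("my_export.csv", newline="", encoding="utf-8")`, where the file name is again a placeholder.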
Named entities are identifiable persons, institutions and locations that are referred to by name. The important criterion here is the name: it differentiates a common noun such as “pope” from the mention of a particular named entity such as “Pope Francis”. Automated named entity recognition (NER) works very well for born-digital texts but poses challenges when applied to historical, often imperfect text. NER automatically detects mentions of, for example, a person in a text. In a second step we try to link each mention to a large database of already identified entities. This allows us to link every mention of a person named “Winston Churchill” across the corpus to the former British prime minister. Improving the automated recognition of named entities in historical texts is one of impresso’s research objectives.
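The two steps described above, recognition followed by linking, can be illustrated with a deliberately simple toy example. This is not the method impresso uses: the capitalisation rule stands in for a real NER model, and the knowledge base with its `ENT-…` identifiers is invented purely for illustration.

```python
import re

# Hypothetical knowledge base: surface form -> entity identifier (invented IDs).
KNOWLEDGE_BASE = {
    "Winston Churchill": "ENT-001",
    "Pope Francis": "ENT-002",
}


def recognise_mentions(text):
    """Step 1: find candidate names as runs of two or more capitalised words."""
    return re.findall(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b", text)


def link_mentions(mentions):
    """Step 2: map each mention to a knowledge-base entry, or None if unknown."""
    return [(mention, KNOWLEDGE_BASE.get(mention)) for mention in mentions]


text = "The pope spoke in Rome, while Winston Churchill addressed parliament."
linked = link_mentions(recognise_mentions(text))
print(linked)
```

In this sketch the common noun “pope” is ignored while “Winston Churchill” is detected and linked. The naive rule also misses single-word names such as “Rome” and would fail on OCR errors, which hints at why historical text calls for more robust approaches.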
We use state-of-the-art tools to improve the quality of the OCR and to identify persons, locations and institutions. Inevitably, they sometimes fail and make mistakes, which we need to remain aware of. But we believe that, despite these imperfections, the opportunities offered by the automated enrichment of historical texts by far outweigh these downsides.
The impresso interface remains under active development and we will add new features in the coming months. We always look forward to hearing from you and to learning how you have made use of impresso’s tools. To leave us feedback, please click on the black envelope on the lower right of the interface. We will get back to you soon after.
The impresso project is a Swiss-Luxembourgish research project dedicated to the computational enrichment of historical newspapers and the development of new workflows for (digital) historians. The core team consists of computational linguists, designers/developers and historians based at the DHLAB of the École polytechnique fédérale de Lausanne (EPFL), the Institute of Computational Linguistics at the University of Zurich and the Luxembourg Centre for Contemporary and Digital History (C2DH). The project is funded by the Swiss National Science Foundation (grant CRSII5_173719). Take a look at our project homepage for more details.