This repository contains code and sample data related to running the impresso corpus through the text reuse detection software passim.
Updated Oct 8, 2024 - Jupyter Notebook
The Hongkong News headline analysis project was conducted by the Chinese University of Hong Kong Library.
Everything to reproduce the CLEF HIPE 2020 campaign results.
Dataset from the paper "Information Extraction from Public Meeting Articles"
Repository of JSON schemas used in the Impresso project.
Tools for the use of Tesseract OCR in R
Awesome historical newspaper analysis tools and literature
Python library to import OCR data in various formats into the canonical JSON format defined by the Impresso project.
Convert ALTO XML to plain text + minimal metadata
The GeoNewsMiner (GNM): An interactive spatial humanities tool to visualize geographical references in historical newspapers