This repository contains the sample data for the Programming Historian lesson "Detecting Text Reuse with Passim", written by Matteo Romanello and Simon Hengchen (currently in preparation).
Data come from two different sources (see respective READMEs for license statements and further details):
- books from EEBO (Early English Books Online) → more info
- newspaper articles from impresso → more info
The Jupyter notebook explore-passim-output.ipynb contains an example of how to load passim's JSON output into a pandas DataFrame to compute some statistics.
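As a rough illustration of what such loading can look like, here is a minimal sketch. It is not taken from the notebook. It assumes the output is in JSON Lines form (one JSON object per line), and the records and field names used here (cluster, id, series, text) are made up for the example; check them against your own output files before relying on them.

```python
import io

import pandas as pd

# Three made-up records standing in for passim output.
# ASSUMPTION: JSON Lines layout and these field names are illustrative only.
sample = io.StringIO(
    '{"cluster": 1, "id": "doc-a", "series": "eebo", "text": "lorem ipsum"}\n'
    '{"cluster": 1, "id": "doc-b", "series": "eebo", "text": "lorem ipsum dolor"}\n'
    '{"cluster": 2, "id": "doc-c", "series": "impresso", "text": "sit amet"}\n'
)

# lines=True tells pandas to parse one JSON object per line.
df = pd.read_json(sample, lines=True)

# Number of passages in each cluster.
cluster_sizes = df.groupby("cluster").size()
print(cluster_sizes)

# Number of passages contributed by each series.
series_counts = df["series"].value_counts()
print(series_counts)
```

To try this on real output, replace the in-memory sample with the path to one of your own output files.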
To run the notebook as well as the script eebo/code/main.py, make sure that you install the required dependencies into a new virtual environment (created with conda, pyenv, venv, etc.):
pip install -r requirements.txt