OCR-tesseract-SE

An OCR program based on pytesseract, a Python wrapper for Tesseract. It includes language models to enhance OCR performance.

Getting started

  • Install Tesseract

    • For Mac users: brew install tesseract
    • For Windows users: The latest installer can be downloaded from here.
    • For Linux users: sudo apt install tesseract-ocr -y
  • Add the Tesseract installation path to your system environment variables

  • Download language models here.

  • Google Colab notebook
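
Once the steps above are done, a quick check along the following lines can confirm that Tesseract is reachable and that pytesseract can call it. This is only a minimal sketch and not the repository's own script: the helper names `find_tesseract` and `ocr_image`, the file name `sample.png`, and the language code `swe` are assumptions made for illustration, and the OCR step assumes `pytesseract` and `pillow` have been installed with pip.

```python
import shutil
from typing import Optional


def find_tesseract(binary: str = "tesseract") -> Optional[str]:
    """Return the full path of the Tesseract executable if it is on PATH, else None."""
    return shutil.which(binary)


def ocr_image(image_path: str, lang: str = "eng") -> str:
    """Run OCR on a single image and return the recognised text.

    Hypothetical helper: requires `pip install pytesseract pillow`, and the
    traineddata file for `lang` must be available to Tesseract.
    """
    # Imported lazily so the PATH check works even without these packages.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img, lang=lang)


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("tesseract was not found on PATH; revisit the installation steps")
    else:
        print(f"tesseract found at {path}")
        # "sample.png" and "swe" are placeholders (assumptions); uncomment
        # once an image and the matching language model are in place.
        # print(ocr_image("sample.png", lang="swe"))
```

If the executable is installed but not on the PATH, pytesseract also lets you point to it directly by setting `pytesseract.pytesseract.tesseract_cmd` to the full path of the binary.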

Usage

For non-technical users

If you come from a non-technical background and would like to set up pytesseract on your computer from scratch, please refer to the instructions here: Mac, Windows. The guide also covers setting up Python and a virtual environment.

Acknowledgements

Contact

Ekta Vats (ekta.vats@abm.uu.se)
Centre for Digital Humanities
Uppsala University
Sweden