This repository has been archived by the owner on Sep 26, 2023. It is now read-only.

This tool is no longer supported; please use https://github.com/cneud/alto-tools instead.

alto-ocr-confidence

Calculates the OCR confidence score per page in ALTO files.

The method used is really simple:

  • find all String elements
  • get value of attribute "WC" (word confidence) for each String
  • calculate sum of all "WC" values
  • divide sum by the count of words per page
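
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from alto_ocr_confidence.py: the function name `page_confidence` and the sample document are hypothetical, and skipping String elements that lack a "WC" attribute is an assumption. The sketch returns the plain mean of the "WC" values; any further scaling or rounding of the reported score is not shown here.

```python
import xml.etree.ElementTree as ET


def page_confidence(alto_xml):
    """Return the mean WC value over all String elements, or None if none carry WC."""
    root = ET.fromstring(alto_xml)
    total = 0.0
    count = 0
    for elem in root.iter():
        # Compare the local tag name only, so any ALTO namespace version matches.
        if elem.tag.rsplit("}", 1)[-1] != "String":
            continue
        wc = elem.get("WC")
        if wc is None:
            continue  # assumption: words without a WC attribute are skipped
        total += float(wc)
        count += 1
    return total / count if count else None


# Hypothetical two-word page used only for illustration.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
  <Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="Hello" WC="0.50"/>
    <SP/>
    <String CONTENT="world" WC="1.00"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout>
</alto>"""

print(page_confidence(SAMPLE))
```

Matching on the local tag name avoids hard-coding a single ALTO schema version, since the namespace URI differs between ALTO releases.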

Use like:

python alto_ocr_confidence.py <inputdir>

Example output:

File: alto\AZ_1926_04_25_0001.xml, Confidence: 54.13

Note that OCR confidence (which is a native output of the OCR engine) is NOT equal to the actual OCR accuracy, which can only be determined by evaluation against Ground Truth.

Read more about OCR evaluation here.