OCR comparison tool

Welcome to this repository, which provides a standalone, automated tool for comparing the performance of different OCR services.

This tool was developed to compare the performance of Amazon's, Google's and Microsoft's text detection on a variety of images, ranging from hand-drawn characters and words to 'live scene' photographs. The blog can be found here.

Disclaimer

This project was developed using:

  • Python 3.7.4
  • Python module versions as listed in requirements.txt
  • CharacTER.py released on 27/06/2019

Software versions are subject to change with new releases. To ensure the project runs smoothly without alteration, use the versions listed above. This software was last run on 14/10/2019.

Introduction

This tool is fully automated: it generates transcriptions for images on disk, passes the images one by one to each supported OCR service, and produces meaningful metrics.

This tool makes use of the command-line interface (CLI) to operate.

The tool currently supports the following OCR services:

Amazon

  • Textract is used for detecting document text
  • Rekognition is used for detecting live scene text

Google

  • Vision is used for detecting document and live scene text

Microsoft


Getting Started

Following the instructions below will enable you to use the tool for comparing your own images.

Prerequisites

The following need to be set up before using this tool.

Amazon

  1. Follow the steps in this guide to create an account and set up a user
  2. Follow steps 2-4 in this guide to generate your account's key

Google

  1. Follow the steps in this guide to create a billing-activated account
  2. Follow the steps in this guide to enable Google Vision for a Google Cloud Project

Microsoft

  1. Follow the steps in this guide to create an account and link a cognitive service resource to it
  2. Create a secret file, e.g.

vi /path/to/directory/.ms/credentials.txt

  3. Follow the step 'Get the keys from your resource' in this guide and store the key in the secret file (replace the placeholder key value with your account's key)

{
    "key": "XXXXXXXXXXXXXXX00XXX"
}
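As a minimal sketch of how a secret file in this shape could be read, the snippet below loads the JSON and returns the value under "key". The function name `load_microsoft_key` is hypothetical and is not taken from the tool's source; the tool's own loading code may differ.

```python
import json
from pathlib import Path


def load_microsoft_key(credentials_path):
    """Return the value stored under "key" in the JSON secret file.

    Hypothetical helper for illustration; not part of the tool itself.
    """
    text = Path(credentials_path).expanduser().read_text(encoding="utf-8")
    data = json.loads(text)
    key = data.get("key", "")
    if not key:
        # Fail early with a clear message rather than sending an empty key.
        raise ValueError(f"no 'key' entry found in {credentials_path}")
    return key
```

For example, `load_microsoft_key("/path/to/directory/.ms/credentials.txt")` would return the key string stored in the file created in step 2.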


Optional: It is recommended that you store your service/access keys in a hidden file (one whose name starts with '.').


mv /path/to/saved/credentials.txt /path/to/file/.secret_file.txt

You will need the paths to these key files in later steps.


Installation

To install this tool to your local machine for comparison purposes, follow the instructions below.

  1. Clone this repo to your local machine

git clone <HTTPS URL>/ocr_comparison_tool.git

  2. Move into the ocr_comparison_tool directory

cd /path/to/cloned/directory/ocr_comparison_tool/

      2.5. Optional: Create a python3 virtual environment


python3 -m venv .

      then


. bin/activate

  3. Install the required Python libraries

pip3 install -r requirements.txt


Configuration

To configure the OCR services for this tool, follow the steps below.

Amazon

In ./ocr_settings/amazon_settings.py change the placeholder paths to your specific secret files:


environ['AWS_SHARED_CREDENTIALS_FILE']='/path/to/your/secret/credential/.file.txt'
environ['AWS_CONFIG_FILE']='/path/to/your/secret/config/.file.txt'


Google

In ./ocr_settings/google_settings.py change the placeholder path to your specific secret file:


environ['GOOGLE_APPLICATION_CREDENTIALS']='/path/to/your/secret/credential/.file.json'


Microsoft

In ./ocr_settings/microsoft_settings.py change the placeholder path to your specific secret file:


MICROSOFT_ACCESS_CREDENTIALS='/path/to/your/secret/credential/.file.json'


CharacTER

In ./ocr_settings/gateway_settings.py change the placeholder path to your specific CharacTER.py file:


environ['CHARACTER_SCRIPT_PATH']='/path/to/script/CharacTER.py'
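Before running the tool, it may help to confirm that the configured paths actually point to files. The sketch below checks the environment variables named in the settings above; the helper `missing_settings` is hypothetical, and MICROSOFT_ACCESS_CREDENTIALS is left out because the settings show it as a plain module variable rather than an environment variable.

```python
from os import environ
from pathlib import Path

# Environment variable names taken from the settings files shown above.
REQUIRED_PATH_VARS = (
    "AWS_SHARED_CREDENTIALS_FILE",
    "AWS_CONFIG_FILE",
    "GOOGLE_APPLICATION_CREDENTIALS",
    "CHARACTER_SCRIPT_PATH",
)


def missing_settings(env=environ):
    """Return the names of variables that are unset or do not point to a file.

    Hypothetical helper for illustration; not part of the tool itself.
    """
    missing = []
    for name in REQUIRED_PATH_VARS:
        value = env.get(name)
        if not value or not Path(value).is_file():
            missing.append(name)
    return missing
```

Calling `missing_settings()` after the settings modules have been imported returns an empty list when every path resolves to an existing file.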


Operation

Constraints
  • images must be in .jpg or .png format
  • images must be at least 50 x 50 px
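The constraints above can be expressed as a small pre-check. This is a sketch only: `meets_constraints` is a hypothetical helper, the width and height are assumed to have been obtained elsewhere (for example with an image library), and treating the file extension case-insensitively is an assumption rather than documented behaviour of the tool.

```python
from pathlib import Path

ALLOWED_SUFFIXES = {".jpg", ".png"}  # formats listed in the constraints
MIN_SIDE_PX = 50                     # minimum width and height in pixels


def meets_constraints(filename, width, height):
    """Return True if the file type and pixel dimensions satisfy the constraints.

    Hypothetical helper for illustration; not part of the tool itself.
    """
    # Assumption: extensions are compared case-insensitively (.JPG == .jpg).
    suffix = Path(filename).suffix.lower()
    return (
        suffix in ALLOWED_SUFFIXES
        and width >= MIN_SIDE_PX
        and height >= MIN_SIDE_PX
    )
```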


This tool supports a variety of ways to process images and their transcripts:


Run using directory


python3 /path/to/ocr_comparison_tool/cmd.py --dir /path/to/entry_dir


Note: For this option, entry_dir must adhere to the following structure:


entry_dir 
├── props.csv       # properties for images
├── ogl/            # original transcripts*
├── res/            # apis' transcripts*
├── met/            # CharacTER metric scores*
└── imgs/           # images to be transcribed
    ├── img1.jpg
    ├── img2.png
    ├──    .
    ├──    .
    ├──    .
    └── imgn.jpg

* The ogl, res and met directories are optional, as they are generated by the tool.
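The required parts of this layout (props.csv and imgs/) can be checked before a run. The following is a minimal sketch under that reading of the structure; `check_entry_dir` is a hypothetical helper and does not reflect how the tool itself validates its input.

```python
from pathlib import Path


def check_entry_dir(entry_dir, props_name="props.csv"):
    """Return a list of problems with the layout; an empty list means none found.

    Hypothetical helper for illustration; not part of the tool itself.
    """
    root = Path(entry_dir)
    problems = []
    if not (root / props_name).is_file():
        problems.append(f"missing {props_name}")
    imgs = root / "imgs"
    if not imgs.is_dir():
        problems.append("missing imgs/ directory")
    elif not any(p.suffix.lower() in {".jpg", ".png"} for p in imgs.iterdir()):
        problems.append("imgs/ contains no .jpg or .png files")
    # ogl/, res/ and met/ are not checked because the tool generates them.
    return problems
```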


Run using single image (transcript auto-generated)


python3 /path/to/ocr_comparison_tool/cmd.py --img /path/to/image.jpg


Note: This command auto-generates the original transcript and so assumes that props.csv is located within the current working directory


Run using single image (transcript provided)


python3 /path/to/ocr_comparison_tool/cmd.py --ogl /path/to/transcript.txt --img /path/to/image.jpg


Define images' properties filename


python3 /path/to/ocr_comparison_tool/cmd.py --prp properties.csv


Note: The property file must be located in the current working directory


Define type of OCR transcript


python3 /path/to/ocr_comparison_tool/cmd.py --med [image/document/both]


Note: Setting the media type invokes only the models for that type.
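To summarise the options described in this section, here is a hypothetical sketch of how flags like these could be declared with Python's argparse. The flag names and the image/document/both choices come from the commands above; the defaults (props.csv and both) and the help texts are assumptions, and the tool's actual cmd.py may be written differently.

```python
import argparse


def build_parser():
    """Build an illustrative parser mirroring the flags shown in this README."""
    parser = argparse.ArgumentParser(
        description="OCR comparison tool (illustrative sketch only)"
    )
    parser.add_argument("--dir", help="entry directory containing props.csv and imgs/")
    parser.add_argument("--img", help="path to a single image")
    parser.add_argument("--ogl", help="path to the original transcript for --img")
    # Assumption: props.csv is the default properties filename.
    parser.add_argument("--prp", default="props.csv",
                        help="properties filename in the current working directory")
    # Assumption: 'both' is the default media type.
    parser.add_argument("--med", choices=["image", "document", "both"],
                        default="both", help="type of OCR transcript to produce")
    return parser
```

For instance, `build_parser().parse_args(["--img", "photo.jpg", "--med", "image"])` yields a namespace with `img="photo.jpg"` and `med="image"`.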


Authors

Acknowledgments