This repository has been archived by the owner on Dec 16, 2023. It is now read-only.

A convenient way of reading PDFs and images using Tesseract


skcript/simple-ocr


Simple-OCR

Simple-OCR provides a more convenient way of reading PDFs and images using the Tesseract engine.

Installation Instructions

  1. Install Tesseract.
  2. Install ImageMagick.
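
Before using the library, it can help to confirm that both tools are reachable from Ruby. The sketch below is not part of Simple-OCR; `find_executable` and `missing_dependencies` are hypothetical helpers, and the executable names are assumptions (`tesseract` for Tesseract, `convert` for ImageMagick, although ImageMagick 7 may install `magick` instead).

```ruby
# Hypothetical helpers, not part of Simple-OCR.

# Return the full path of an executable found on PATH, or nil if it is missing.
def find_executable(name)
  # PATHEXT is only set on Windows; elsewhere we try the bare name.
  extensions = ENV.fetch("PATHEXT", "").split(";")
  extensions = [""] if extensions.empty?

  ENV.fetch("PATH", "").split(File::PATH_SEPARATOR).each do |dir|
    extensions.each do |ext|
      candidate = File.join(dir, name + ext)
      return candidate if File.executable?(candidate) && !File.directory?(candidate)
    end
  end
  nil
end

# Return the subset of names that could not be found on PATH.
# The default names are assumptions; adjust them for your installation.
def missing_dependencies(names = %w[tesseract convert])
  names.reject { |name| find_executable(name) }
end
```

Calling `missing_dependencies` returns an empty array when every listed tool was found on `PATH`, and the names of the missing tools otherwise.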

Example Usage

It's very simple to use Simple-OCR:

# Specify the path of your source image or PDF.
img = OCR::Image.new("source.png")

# Specify the output file name, called "destination" here.
img.scan("destination", "-l eng", :pdf)

You can also give custom command line options.

img.scan("destination", "-l eng -psm 1...", :pdf)
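
To illustrate how an option string like the ones above might reach Tesseract, here is a minimal sketch of assembling a command line of the form `tesseract <input> <outputbase> [options] [configfile]`. This is an assumption about how a wrapper could work, not the actual Simple-OCR implementation; `build_tesseract_command` is a hypothetical name.

```ruby
require "shellwords"

# Hypothetical helper, not part of Simple-OCR.
# Builds an argument array for the tesseract CLI:
#   tesseract <source> <destination> [options...] [configfile]
def build_tesseract_command(source, destination, options = "", format = :txt)
  command = ["tesseract", source, destination]

  # Split the option string the way a shell would, so "-l eng" becomes ["-l", "eng"].
  command.concat(Shellwords.split(options))

  # Plain text is Tesseract's default output, so only pdf and hocr
  # need a config name appended.
  command << format.to_s unless format == :txt
  command
end

p build_tesseract_command("source.png", "destination", "-l eng", :pdf)
```

Building an array instead of a single string means it can be handed to `system(*command)` without going through the shell, which avoids quoting problems with file names that contain spaces.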

It is also possible to specify the output file type, which can be one of:

  • pdf
  • txt
  • hocr

img.scan("destination", "-l eng", :txt)
img.scan("destination", "-l eng", :hocr)
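
Tesseract appends the extension to the output base name itself, so the file to look for after a scan depends on the chosen type. The sketch below assumes Simple-OCR passes the destination name to Tesseract unchanged; `SUPPORTED_FORMATS` and `expected_output_file` are hypothetical names, not part of the library.

```ruby
# Hypothetical helper, not part of Simple-OCR.
# The three output types listed above.
SUPPORTED_FORMATS = %i[pdf txt hocr].freeze

# Return the file name Tesseract is expected to write for a given
# destination base name and output type.
def expected_output_file(destination, format)
  unless SUPPORTED_FORMATS.include?(format)
    raise ArgumentError, "unsupported format: #{format.inspect}"
  end
  "#{destination}.#{format}"
end

SUPPORTED_FORMATS.each do |format|
  puts expected_output_file("destination", format)
end
```

Rejecting unknown types up front gives a clearer error than letting an unexpected symbol reach the command line.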

About

Skcript

Simple-OCR is maintained and funded by Skcript. The names and logos for Skcript are properties of Skcript.

We love open source, and we have contributed quite a bit to the community. Take a look at our contributions here. We also encourage the people around us to get involved in community operations. Join us if you'd like to see the world change from our HQ.
