This repository was archived by the owner on Jan 6, 2024. It is now read-only.

Commit 8000237

Updates for release v1.8.0
1 parent b5e8a09 commit 8000237

README.md

Lines changed: 10 additions & 7 deletions
@@ -3,7 +3,7 @@
 [![Build Status](https://travis-ci.org/deajan/pmOCR.svg?branch=master)](https://travis-ci.org/deajan/pmOCR) [![License](https://img.shields.io/badge/License-BSD%203--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause) [![GitHub Release](https://img.shields.io/github/release/deajan/pmOCR.svg?label=Latest)](https://github.com/deajan/pmOCR/releases/latest)


-A multicore batch & service wrapper script for Tesseract 3.02+ / 4.1.2 / 5.0.0 (https://github.com/tesseract-ocr/) or ABBYY CLI OCR 11 FOR LINUX based on Finereader Engine 11 optical character recognition (www.ocr4linux.com).
+A multicore batch & service wrapper script for Tesseract v3/v4/v5 (https://github.com/tesseract-ocr/) or ABBYY CLI OCR 11 FOR LINUX based on Finereader Engine 11 optical character recognition (www.ocr4linux.com).

 Conversions support tiff/jpg/png/pdf/bmp to PDF, TXT and CSV (also DOCX and XSLX for Abbyy OCR). It can actually support any other format that your OCR engine can handle.

@@ -12,21 +12,24 @@ This wrapper can work both in batch and service mode.
 In batch mode, it's used as commandline tool for processing multiple files at once, being able to output one or more formats.

 In service mode, it will monitor directories and launch OCR conversions as soon as new files get into the directories.
+Since v1.8.0, it can also monitor NFS / SMB mountpoints with new integrated inotifywait emulation poller.

 pmOCR has the following options:
 - Include current date into the output filename
 - Ignore already OCRed PDF files based on font detection and / or file suffix
-- Delete or rename input file after successful conversion
+- Delete or move input file after successful conversion

 ## Install it

 $ git clone https://github.com/deajan/pmOCR
 $ cd pmOCR
 $ ./install.sh

-You will also need inotifywait (from inotify-tools package) and pdffonts (from poppler-utils package).
+You will need pdffonts util (from poppler-utils package).
+Optionally, you can install inotifywait (from inotify-tools package).
+
 If you are using tesseract OCR, please install tesseract-osd and tesseract-[your language] (sometimes called tesseract-ocr-osd).
-You will also need ghostscript in order to be able to transform bitmap PDF documents to indexed PDFs.
+You will also need ImageMagick in order to be able to transform bitmap PDF documents to indexed PDFs.

 ## Batch mode

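Note on the hunk above: it makes pdffonts the only hard requirement and turns inotifywait into an optional one. A minimal pre-flight check along those lines could look like the sketch below. The `missing_tools` helper and the messages are hypothetical and not part of pmOCR; only the two command names come from the README text.

```shell
#!/bin/sh
# Hypothetical pre-flight check, not taken from pmOCR itself.

# Print each command name from the argument list that is not found in PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Required per the README: pdffonts (poppler-utils package).
required_missing=$(missing_tools pdffonts)
# Optional per the README: inotifywait (inotify-tools package).
optional_missing=$(missing_tools inotifywait)

if [ -n "$required_missing" ]; then
    echo "Missing required tool(s): $required_missing" >&2
fi
if [ -n "$optional_missing" ]; then
    echo "Optional tool(s) not found: $optional_missing" >&2
fi
```

The check only reports what is missing and does not abort, so it can be run before `./install.sh` without side effects.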
@@ -87,7 +90,8 @@ With systemD, you have to launch a service for each config file. Example for con
 Has been tested so far with:
 - ABBYY FineReader OCR Engine 11 CLI for Linux releases R2 (v 11.1.6.562411), R3 (v 11.1.9.622165) and R6 (v 11.1.14.707470)
 - Tesseract-ocr 3.0.4
-- Tesseract-ocr 4.0.0
+- Tesseract-ocr 4.0.0 and 4.0.12
+- Tesseract-ocr 5.0.0 and 5.0.1

 Tesseract mode also uses ghostscript to convert PDF files to an intermediary TIFF format in order to process them.

@@ -99,8 +103,7 @@ Parameters include any arguments to pass to the OCR program depending on the tar

 ABBYY has in integrated preprocessor in order to enhance recognition qualitiy whereas Tesseract relies on external tools.
 pmOCR can use a preprocessor like ImageMagick to deskew / clear noise / render white background and remove black borders.
-ImageMagick preprocessor is configured, but disabled by default.
-In order to use it with Tesseract, you have to uncomment it in your configuration file.
+ImageMagick preprocessor is configured, and enabled by default to be used with Tesseract.

 ## Tesseract caveats

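Note on the "inotifywait emulation poller" mentioned in this commit: the diff does not show how pmOCR implements it. One common way to emulate inotify on NFS / SMB mounts, where kernel events are generally not delivered for remote changes, is to poll modification times between two marker files. The sketch below illustrates that general technique only; `files_in_window` and `poll_directory` are hypothetical names, and the marker files are assumed to live outside the watched directory.

```shell
#!/bin/sh
# Hypothetical mtime-based poller, not pmOCR's actual implementation.

# List regular files under directory $1 whose mtime is newer than marker
# file $2 but not newer than marker file $3.
files_in_window() {
    find "$1" -type f -newer "$2" ! -newer "$3"
}

# Poll directory $1 every $2 seconds and run command $3 on each new file.
# Files that existed before the poller started are ignored.
poll_directory() {
    watch_dir=$1
    interval=$2
    handler=$3
    marker=$(mktemp)            # lower bound of the current window
    while :; do
        sleep "$interval"
        next_marker=$(mktemp)   # upper bound, created before scanning
        files_in_window "$watch_dir" "$marker" "$next_marker" |
            while IFS= read -r file; do
                "$handler" "$file"
            done
        # The upper bound becomes the next lower bound, so windows do not
        # overlap and each file is reported once per modification.
        mv "$next_marker" "$marker"
    done
}
```

A call such as `poll_directory /mnt/scans 10 echo` would print each new file path every ten seconds (the path is an example only). File names containing newlines are not handled, and on network mounts the comparison depends on client and server clocks being reasonably in sync.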