
Tesseract

https://github.com/madmaze/pytesseract
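A minimal pytesseract sketch. It assumes `pip install pytesseract pillow` has been run and the `tesseract` binary is installed. `ocr_image` and `ocr_images` are hypothetical helper names. `pytesseract.image_to_string` and Pillow's `Image.open` are the real library calls.

```python
from pathlib import Path


def ocr_image(path: str, lang: str = "eng") -> str:
    """Return the text Tesseract recognises in the image at `path`.

    Hypothetical helper; assumes pytesseract, Pillow and the tesseract
    binary (with traineddata for `lang`) are installed.
    """
    # Deferred imports keep this module importable without the OCR stack.
    import pytesseract
    from PIL import Image

    with Image.open(path) as img:
        return pytesseract.image_to_string(img, lang=lang)


def ocr_images(paths, lang: str = "eng") -> dict:
    """OCR several images and map each file name to its recognised text."""
    return {Path(p).name: ocr_image(p, lang) for p in paths}


if __name__ == "__main__":
    import sys

    # Usage: python this_script.py page1.png page2.png
    for name, text in ocr_images(sys.argv[1:]).items():
        print(f"== {name} ==")
        print(text)
```

If `tesseract` is not on PATH, pytesseract can be pointed at the binary by setting `pytesseract.pytesseract.tesseract_cmd`.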

https://askubuntu.com/questions/943980/how-can-i-get-tesseract-ocr-to-recognise-the-large-digits-of-an-electricity-mete
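Large meter digits can be handled with a common recipe. This is a sketch, and it is not necessarily what the linked answer recommends:

- Convert the photo to grayscale and binarise it.
- Optionally rescale it.
- Tell Tesseract the image is a single text line (`--psm 7`).
- Whitelist the digits through `tessedit_char_whitelist`.

The helper names and the default `threshold`/`scale` values are assumptions that need tuning per image. Some Tesseract 4.0 releases ignore the whitelist when using the LSTM engine.

```python
def digits_config(psm: int = 7, whitelist: str = "0123456789") -> str:
    """Build a Tesseract config string for one line of digits.

    --psm 7 treats the image as a single text line; the -c variable
    restricts the characters Tesseract may output.
    """
    return f"--psm {psm} -c tessedit_char_whitelist={whitelist}"


def read_meter(path: str, threshold: int = 128, scale: float = 1.0) -> str:
    """Hypothetical helper: OCR the digits in a meter photo at `path`."""
    # Deferred imports: requires pytesseract, Pillow and the tesseract binary.
    import pytesseract
    from PIL import Image

    with Image.open(path) as img:
        gray = img.convert("L")  # 8-bit grayscale
        if scale != 1.0:
            # Rescale to bring very large (or very small) glyphs to a size
            # Tesseract handles better; the factor is something to tune.
            w, h = gray.size
            gray = gray.resize((max(1, int(w * scale)), max(1, int(h * scale))))
        # Global threshold: brighter pixels become white, the rest black.
        bw = gray.point(lambda v: 255 if v > threshold else 0)
        text = pytesseract.image_to_string(bw, config=digits_config())
    # Drop anything that is not a digit (whitespace, stray symbols).
    return "".join(ch for ch in text if ch.isdigit())
```

Tesseract generally works best with dark text on a light background. If the meter shows light digits on a dark display, inverting the grayscale image first (for example with `PIL.ImageOps.invert`) may help.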

Medium - https://medium.freecodecamp.org/getting-started-with-tesseract-part-i-2a6a6b1cf75e
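The same engine can also be driven through the `tesseract` command line, which is what pytesseract wraps. This is a standard-library-only sketch, and `build_tesseract_args`/`ocr_with_cli` are hypothetical names.

- Passing `stdout` as the output base prints the text instead of writing `<outputbase>.txt`.
- `--psm` is the Tesseract 4+ spelling of the page-segmentation option. Older 3.x releases used `-psm`.

```python
import shutil
import subprocess


def build_tesseract_args(image: str, lang: str = "eng", psm: int = 3) -> list:
    """Argument list for `tesseract IMAGE stdout -l LANG --psm PSM`."""
    # "stdout" as the output base sends the recognised text to standard output.
    # psm 3 (fully automatic page segmentation) is Tesseract's default mode.
    return ["tesseract", str(image), "stdout", "-l", lang, "--psm", str(psm)]


def ocr_with_cli(image: str, lang: str = "eng", psm: int = 3) -> str:
    """Run the tesseract binary on `image` and return the recognised text."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract not found; try: sudo apt install tesseract-ocr")
    result = subprocess.run(
        build_tesseract_args(image, lang, psm),
        capture_output=True,  # text arrives on stdout; diagnostics, if any, on stderr
        text=True,
        check=True,
    )
    return result.stdout
```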

Unix utilities for PDF to text | OCR in PDF

https://unix.stackexchange.com/questions/377359/how-to-use-ocr-from-the-command-line-in-linux

$ sudo apt install tesseract-ocr

$ ImageMagick = format conversion, transformation, special effects, text & comments on images

$ ocrmypdf = pictures.pdf --> scanned.pdf (adds a searchable OCR text layer to an image-only PDF)

$ pdftotext = scanned.pdf --> scanned.txt
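The ocrmypdf-then-pdftotext chain above can be scripted. This sketch assumes both tools are on PATH (on Ubuntu they come from the `ocrmypdf` and `poppler-utils` packages) and reuses the file names from the note. The helper names are hypothetical.

- `--skip-text` tells ocrmypdf to leave pages that already contain text alone. Without it, ocrmypdf stops with an error on such pages.
- `-layout` asks pdftotext to keep the physical page layout.

```python
import shutil
import subprocess
from pathlib import Path


def ocrmypdf_args(src, dst, lang="eng", skip_text=True):
    """pictures.pdf --> scanned.pdf: add a searchable OCR text layer."""
    args = ["ocrmypdf", "-l", lang]
    if skip_text:
        args.append("--skip-text")  # leave pages that already have text as-is
    return args + [str(src), str(dst)]


def pdftotext_args(src, dst, layout=True):
    """scanned.pdf --> scanned.txt: extract the text layer."""
    args = ["pdftotext"]
    if layout:
        args.append("-layout")  # keep columns roughly where they are on the page
    return args + [str(src), str(dst)]


def pdf_to_text(pictures_pdf, workdir="."):
    """Hypothetical helper: run both steps and return the extracted text."""
    for tool in ("ocrmypdf", "pdftotext"):
        if shutil.which(tool) is None:
            raise FileNotFoundError(f"{tool} is not on PATH")
    work = Path(workdir)
    scanned_pdf = work / "scanned.pdf"
    scanned_txt = work / "scanned.txt"
    subprocess.run(ocrmypdf_args(pictures_pdf, scanned_pdf), check=True)
    subprocess.run(pdftotext_args(scanned_pdf, scanned_txt), check=True)
    return scanned_txt.read_text(encoding="utf-8", errors="replace")
```

Example: `print(pdf_to_text("pictures.pdf"))` writes `scanned.pdf` and `scanned.txt` to the current directory and prints the recognised text.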