I take some photos of documents with my phone and download them. They are properly rotated. I `cd` into their directory and run `pdf2searchablepdf .`, which produces the file `._searchable.pdf`.
The PDF pages are improperly rotated though!
Double-clicking an image in Ubuntu to open it in the Ubuntu Image Viewer shows that it is rotated properly, so what's wrong?
Well, it turns out the image contains "Exif orientation metadata", which `tesseract` is apparently ignoring! Open the image in GIMP and it will show the following:
This image contains Exif orientation metadata. Would you like to rotate the image?
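Roughly how that metadata works: many cameras save the pixels in the sensor's native orientation and record an Exif `Orientation` tag (a value from 1 to 8) that tells viewers how to display them. The Ubuntu Image Viewer and GIMP honor that tag, while `tesseract` apparently works on the raw pixels, which would explain the rotated PDF pages. The sketch below maps each tag value to the lossless `jpegtran` transform that would make the pixels upright. The function name `exif_orientation_to_jpegtran` is hypothetical, and the value-to-transform table follows the Exif specification, not anything inside `pdf2searchablepdf` or `tesseract`.

```bash
# Hypothetical helper: print the lossless jpegtran transform that turns an image
# with the given Exif Orientation value (1 to 8) upright. Table per the Exif spec.
exif_orientation_to_jpegtran() {
    case "$1" in
        1) : ;;                                  # already upright: no transform
        2) printf '%s\n' "-flip horizontal" ;;   # mirrored left-right
        3) printf '%s\n' "-rotate 180" ;;        # upside down
        4) printf '%s\n' "-flip vertical" ;;     # mirrored top-bottom
        5) printf '%s\n' "-transpose" ;;         # stored transposed (main diagonal)
        6) printf '%s\n' "-rotate 90" ;;         # needs 90 degrees clockwise
        7) printf '%s\n' "-transverse" ;;        # stored transposed (anti-diagonal)
        8) printf '%s\n' "-rotate 270" ;;        # needs 90 degrees counter-clockwise
        *) printf 'unknown Exif orientation: %s\n' "$1" >&2; return 1 ;;
    esac
}
```

Only value 1 means "the stored pixels are already the way the picture should look"; every other value is a case where a viewer that honors the tag and a tool that ignores it will show different images.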
So:
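1. Report this as a bug to `tesseract`.
2. Meanwhile, apply a fix that forces a true rotation of the pixels before running `tesseract`:

   ```bash
   # exiftran performs lossless JPEG transformations
   sudo apt install exiftran
   cd path/to/dir_of_images
   # -a: rotate automatically according to the Exif orientation tag; -i: edit the files in place
   exiftran -ai *.jpg
   ```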
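The same workaround can be wrapped so the rotation always happens before OCR. This is a minimal POSIX shell sketch assuming `exiftran` and `pdf2searchablepdf` are both on `PATH`; `rotate_then_ocr` is a hypothetical helper name. Like the commands above, it overwrites the original images in place, so run it on copies if you want to keep the originals untouched.

```bash
# Hypothetical helper: bake each JPEG's Exif rotation into its pixels, then OCR.
# Assumes exiftran and pdf2searchablepdf are on PATH. Images are edited IN PLACE.
rotate_then_ocr() {
    (
        # Work in a subshell so the caller's current directory is unchanged.
        cd "${1:-.}" || exit 1
        found=0
        # Phones may save .jpg, .JPG, .jpeg or .JPEG files.
        for f in *.jpg *.JPG *.jpeg *.JPEG; do
            [ -e "$f" ] || continue            # pattern matched nothing
            exiftran -ai "$f" || exit 1        # -a: auto-rotate per Exif tag, -i: in place
            found=1
        done
        if [ "$found" -eq 0 ]; then
            echo "no JPEG images found in ${1:-.}" >&2
            exit 1
        fi
        # Only OCR once every image is physically upright.
        pdf2searchablepdf .
    )
}
```

For example, `rotate_then_ocr path/to/dir_of_images`. Unlike the bare `*.jpg` glob, the loop also picks up `.JPG` and `.jpeg` files, and it stops before OCR if `exiftran` fails or no JPEGs are found.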
I should also auto-enhance (whiten) the images with the 2 Python algorithms from my answer here: https://stackoverflow.com/questions/48268068/how-do-i-do-the-equivalent-of-gimps-colors-auto-white-balance-in-python-fu/67343271#67343271. See also: https://superuser.com/questions/370920/auto-image-enhance-for-ubuntu.
And I should compress them with `jpegoptim`, as I explain in my readme here: https://github.com/ElectricRCAircraftGuy/PDF2SearchablePDF#image-size-notes.
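A possible shape for the compression step, as a hedged sketch: the helper name `compress_jpegs` and the default maximum quality of 85 are assumptions for illustration, not taken from the readme linked above. It calls `jpegoptim` without any `--strip-*` option, so the Exif data, including the orientation tag, is left in place. If you do strip metadata to save more space, do it only after `exiftran` has rotated the pixels, or the rotation information is lost.

```bash
# Hypothetical helper: shrink JPEGs with jpegoptim before OCR.
# The default max quality of 85 is an assumed example value.
compress_jpegs() {
    (
        cd "${1:-.}" || exit 1
        quality="${2:-85}"
        for f in *.jpg *.JPG *.jpeg *.JPEG; do
            [ -e "$f" ] || continue            # pattern matched nothing
            # --max caps the JPEG quality (lossy re-encode of files saved above it);
            # --preserve keeps file modification times. No --strip-* option is
            # passed, so Exif data such as the Orientation tag is kept.
            jpegoptim --max="$quality" --preserve "$f" || exit 1
        done
    )
}
```

For example, `compress_jpegs path/to/dir_of_images 80` caps every JPEG in that directory at quality 80.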