bug: `TesseractOcrModel` is sensitive to document orientation #1155
Labels
bug
Bug
When running `DocumentConverter` with `TesseractOcrOptions`, I noticed that documents which were not correctly oriented were not correctly processed. I observed the same behavior with EasyOCR, but not with macOS OCR.
However, when using Tesseract we can detect the page orientation using `self.osd_reader.DetectOrientationScript()`, rotate the page accordingly, and then perform OCR, which would improve recognition performance. I will try to propose a fix soon.
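The detect, rotate, then OCR idea could look roughly like the sketch below. It is not the actual fix: `correction_angle`, `ocr_with_orientation_fix`, and the `min_conf` threshold are hypothetical names and values chosen for illustration. The sketch assumes the detector returns a dict with `orient_deg` and `orient_conf` keys (the shape tesserocr documents for `DetectOrientationScript()`), or `None` when detection fails. The detector, rotation, and OCR steps are passed in as callables so the logic stays independent of any particular OCR or imaging library.

```python
from typing import Any, Callable, Mapping, Optional, TypeVar

Image = TypeVar("Image")

# Orientation values the detector is assumed to report, in degrees.
VALID_ORIENTATIONS = (0, 90, 180, 270)


def correction_angle(osd: Optional[Mapping[str, Any]], min_conf: float = 1.0) -> int:
    """Return the rotation (in degrees) expected to bring the page upright.

    `osd` is assumed to carry `orient_deg` and `orient_conf`.
    `min_conf` is a hypothetical threshold below which the page is left as-is.
    """
    if not osd:
        return 0  # detection failed or returned nothing: do not rotate
    orient_deg = int(osd.get("orient_deg", 0))
    orient_conf = float(osd.get("orient_conf", 0.0))
    if orient_deg not in VALID_ORIENTATIONS or orient_conf < min_conf:
        return 0
    # Same relation as the "Rotate" value in Tesseract's OSD text output.
    return (360 - orient_deg) % 360


def ocr_with_orientation_fix(
    image: Image,
    detect: Callable[[Image], Optional[Mapping[str, Any]]],
    rotate: Callable[[Image, int], Image],
    recognize: Callable[[Image], str],
    min_conf: float = 1.0,
) -> str:
    """Detect the orientation, rotate if needed, then run OCR."""
    angle = correction_angle(detect(image), min_conf)
    if angle:
        image = rotate(image, angle)
    return recognize(image)
```

With tesserocr, `detect` might wrap `osd_reader.SetImage(image)` followed by `osd_reader.DetectOrientationScript()`, and `rotate` might wrap Pillow's `Image.rotate(angle, expand=True)`. Whether the angle must be applied clockwise or counter-clockwise depends on the imaging library, so the sign should be checked against a page with a known rotation such as wrong_orientation.pdf.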
Steps to reproduce
correct_orientation.pdf: correct_orientation.pdf
wrong_orientation.pdf: wrong_orientation.pdf
outputs:
Expected output:
Docling version
Docling version: 2.26.0
Docling Core version: 2.21.2
Docling IBM Models version: 3.4.1
Docling Parse version: 3.4.0
Python: cpython-311 (3.11.9)
Platform: macOS-14.6-arm64-arm-64bit
Python version
Python 3.11.9