Improved character position tracking when LSTM models are used
When LSTM models are used, the accuracy of character bounding boxes is
low, with many blobs assigned to the wrong characters. The cause is that
the LSTM model output provides only approximate character positions
without boundary data. As a result, the input blobs cannot be accurately
mapped to characters, which compromises the accuracy of the character
bounding boxes.

Currently this problem is solved as follows. The character boundaries are
computed from the character positions in the LSTM output by placing each
boundary midway between two adjacent character positions. Each blob is
then assigned to the character whose interval contains the center of the
blob. In other words, the blobs are assigned to the nearest characters.
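
The midpoint rule described above can be sketched roughly as follows. This
is an illustration only, not the code in Tesseract: the function name
AssignBlobsToNearestChar and the reduction of blobs and characters to
single x coordinates are assumptions made for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Tesseract. For each blob center it
// returns the index of the character whose midpoint-bounded interval
// contains that center. char_pos must be sorted in ascending order.
std::vector<std::size_t> AssignBlobsToNearestChar(
    const std::vector<double>& char_pos,
    const std::vector<double>& blob_centers) {
  // boundaries[i] separates character i from character i + 1.
  std::vector<double> boundaries;
  for (std::size_t i = 0; i + 1 < char_pos.size(); ++i) {
    boundaries.push_back((char_pos[i] + char_pos[i + 1]) / 2.0);
  }
  std::vector<std::size_t> assignment;
  for (double center : blob_centers) {
    // Walk right until the center lies left of the next boundary.
    std::size_t idx = 0;
    while (idx < boundaries.size() && center >= boundaries[idx]) {
      ++idx;
    }
    assignment.push_back(idx);
  }
  return assignment;
}
```

With character positions {10, 20, 30} and blob centers {9, 16, 29} each
blob lands on its own character. If the reported positions drift to
{4, 9, 22} while the blobs sit at {10, 20, 30}, the same rule assigns the
first blob to the second character and the last two blobs to the third.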

Unfortunately, this produces a lot of errors because the character
positions in the LSTM output have a tendency to drift, so the nearest
character is often not the right one.

Fortunately, while the LSTM model produces only approximate positions,
the blob boundaries produced by the regular segmenter are pretty good.
Most of the time a single blob corresponds to a single character and
vice versa.

This observation is used to create an optimization algorithm that treats
the output of the regular segmenter as a template to which the LSTM model
output is matched. The best match is selected by assigning a cost to each
unwanted property of the outcome and then minimizing the total cost of
the solution.
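
One way to picture such a cost-minimizing match is the dynamic-programming
sketch below. It is not the algorithm added by this commit: the function
name MatchBlobsToChars, the three cost terms (distance, a character shared
by several blobs, a character left without a blob) and their weights are
all assumptions chosen for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Illustrative weights; the values are assumptions, not taken from the commit.
constexpr double kSharedCharCost = 5.0;  // extra blob mapped to the same character
constexpr double kEmptyCharCost = 20.0;  // character left without any blob

// Hypothetical sketch: maps each blob (given by its center x) to a character
// index so that the mapping never goes backwards and the total cost is minimal.
std::vector<std::size_t> MatchBlobsToChars(
    const std::vector<double>& blob_centers,
    const std::vector<double>& char_pos) {
  const std::size_t n = blob_centers.size();
  const std::size_t m = char_pos.size();
  if (n == 0 || m == 0) return {};
  const double kInf = std::numeric_limits<double>::infinity();
  // cost[i][j]: best cost for blobs 0..i with blob i mapped to character j.
  std::vector<std::vector<double>> cost(n, std::vector<double>(m, kInf));
  std::vector<std::vector<std::size_t>> prev(n, std::vector<std::size_t>(m, 0));
  for (std::size_t j = 0; j < m; ++j) {
    // Characters 0..j-1 receive no blob if the first blob starts at j.
    cost[0][j] = std::abs(blob_centers[0] - char_pos[j]) + j * kEmptyCharCost;
  }
  for (std::size_t i = 1; i < n; ++i) {
    for (std::size_t j = 0; j < m; ++j) {
      for (std::size_t k = 0; k <= j; ++k) {
        // k == j: blob i shares character j with blob i - 1.
        // k < j: characters k+1..j-1 are skipped and stay empty.
        const double step =
            (k == j) ? kSharedCharCost : (j - k - 1) * kEmptyCharCost;
        const double total =
            cost[i - 1][k] + step + std::abs(blob_centers[i] - char_pos[j]);
        if (total < cost[i][j]) {
          cost[i][j] = total;
          prev[i][j] = k;
        }
      }
    }
  }
  // Pick the best end state, charging for characters after the last blob.
  std::size_t best = 0;
  double best_cost = kInf;
  for (std::size_t j = 0; j < m; ++j) {
    const double total = cost[n - 1][j] + (m - 1 - j) * kEmptyCharCost;
    if (total < best_cost) {
      best_cost = total;
      best = j;
    }
  }
  // Follow the back-pointers to recover the per-blob assignment.
  std::vector<std::size_t> assignment(n);
  for (std::size_t i = n; i-- > 0;) {
    assignment[i] = best;
    best = prev[i][best];
  }
  return assignment;
}
```

For blob centers {10, 20, 30} and drifted character positions {4, 9, 22},
this sketch maps the blobs to characters 0, 1 and 2, because leaving the
first character empty costs more than the extra distance. The real cost
terms and weights would need to be taken from the committed sources.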

This reliably solves the most frequent error in the current solution,
where blobs are simply assigned to the wrong character. As a result, the
new algorithm produces up to 20 times fewer errors.

Fixes tesseract-ocr#1712.
p12tic committed Apr 10, 2022
1 parent dbb2adb commit cbe83ec
Showing 4 changed files with 842 additions and 45 deletions.
2 changes: 2 additions & 0 deletions Makefile.am
@@ -250,6 +250,7 @@ endif
# Rules for src/ccstruct.

noinst_HEADERS += src/ccstruct/blamer.h
noinst_HEADERS += src/ccstruct/blob_bounds_calculator.h
noinst_HEADERS += src/ccstruct/blobbox.h
noinst_HEADERS += src/ccstruct/blobs.h
noinst_HEADERS += src/ccstruct/blread.h
@@ -293,6 +294,7 @@ noinst_HEADERS += src/ccstruct/params_training_featdef.h
endif

libtesseract_la_SOURCES += src/ccstruct/blamer.cpp
libtesseract_la_SOURCES += src/ccstruct/blob_bounds_calculator.cpp
libtesseract_la_SOURCES += src/ccstruct/blobbox.cpp
libtesseract_la_SOURCES += src/ccstruct/blobs.cpp
libtesseract_la_SOURCES += src/ccstruct/blread.cpp