Improved Diplopia Solution #4211

woodjohndavid · 2024-03-13T20:30:50Z

I previously created a pull request from my branch JDWDIPLOPIA. That change reduced diplopia, but it was only a partial solution.

I am now creating a pull request from my branch JDWDIPLOPIA2. I believe that this is a more complete solution to the diplopia issue.

stweil · 2024-03-13T21:49:45Z

I don't see a commit which changes the Tesseract code. Is something missing?

woodjohndavid · 2024-03-13T22:20:56Z

OK thanks @stweil

This may leave you somewhat concerned about my code changes, but I can assure you that I am a competent developer. I just never use GitHub other than with Tesseract, so I am inexperienced in that regard.

Note by the way, similar to the other pull request I generated, there are some new configuration values that can only be set in code as it stands, but should be made into available settings. I have not yet figured out the mechanism for doing that. If the diplopia changes I am proposing turn out to be useful, hopefully someone else familiar with the settings approach could take care of that. These configuration values are:

bool kRemoveDiplopia - if true, enables the diplopia removal functionality; if false, my changes have no effect.
int kMaxDiplopiaGap - maximum number of timesteps apart for two characters to be considered diplopia, default 2.
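
To make the role of the two values concrete, here is a minimal sketch of one way such a filter could behave. It is not the code from this pull request: the `Emission` struct, the `CollapseDiplopia` function, the "keep the higher score" rule, and the `true` default for `kRemoveDiplopia` are all assumptions made for illustration, and the input is assumed to be sorted by timestep.

```cpp
#include <cassert>
#include <vector>

// Stand-ins for the two values named above. The default of true is an
// assumption; the PR description only states the default for the gap.
constexpr bool kRemoveDiplopia = true;
constexpr int kMaxDiplopiaGap = 2;

// Hypothetical decoded character: the timestep where it was emitted, its
// code, and a confidence score (higher is better).
struct Emission {
  int timestep;
  char code;
  float score;
};

// Toy rule: when two *different* characters are emitted within max_gap
// timesteps of each other, keep only the higher-scoring one. Identical
// neighbours are left alone so genuine double letters are not touched.
std::vector<Emission> CollapseDiplopia(const std::vector<Emission>& in,
                                       bool enabled = kRemoveDiplopia,
                                       int max_gap = kMaxDiplopiaGap) {
  assert(max_gap >= 0);
  if (!enabled) return in;  // Disabled: output is identical to the input.
  std::vector<Emission> out;
  for (const Emission& e : in) {
    if (!out.empty() && out.back().code != e.code &&
        e.timestep - out.back().timestep <= max_gap) {
      if (e.score > out.back().score) out.back() = e;  // Keep the better one.
      continue;
    }
    out.push_back(e);
  }
  return out;
}
```

With this toy rule, an `l` at timestep 3 followed by a stronger `1` at timestep 4 collapses to the `1`, while a character emitted six timesteps later is kept.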

amitdo · 2024-03-14T09:23:32Z

Apart from testing that this patch has a positive effect on the diplopia issue, people should also test that it has no negative effect elsewhere, such as dropping correct characters.
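
One possible way to check for this kind of regression is to compare the output of an unpatched and a patched build against ground-truth text. The sketch below is an assumption about how such a check could look, not part of the PR. The helper names `EditDistance` and `IsRegression` are hypothetical, and the distance is computed over bytes, so multi-byte UTF-8 characters would need to be decoded first for an accurate count.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Levenshtein distance: each insertion, deletion, or substitution costs 1.
std::size_t EditDistance(const std::string& a, const std::string& b) {
  std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = i;
    for (std::size_t j = 1; j <= b.size(); ++j) {
      std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
      cur[j] = std::min({subst, prev[j] + 1, cur[j - 1] + 1});
    }
    std::swap(prev, cur);
  }
  return prev[b.size()];
}

// True if the patched output is further from the ground truth than the
// baseline output, for example because a correct character was dropped.
bool IsRegression(const std::string& truth, const std::string& baseline,
                  const std::string& patched) {
  return EditDistance(truth, patched) > EditDistance(truth, baseline);
}
```

Running such a check over a varied sample set would flag lines where the patch removed a character that was actually present, such as a real double digit.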

src/lstm/recodebeam.cpp

woodjohndavid · 2024-03-20T22:32:18Z

Please note that this change is likely not appropriate for those using Tesseract for natural language recognition using relevant dawgs. It is primarily intended for those (like myself) using Tesseract to scan technical data, looking for exact character by character recognition.

Papucs · 2024-09-12T06:16:57Z

Hi,
Is there any chance this will be merged in the near future?

amitdo · 2024-09-12T10:21:40Z

This PR adds 425 lines to the neural network's code.

According to the PR author, his solution to the 'diplopia' issue is not a generic solution. It tries to solve a specific use case.

So the question is, is it worth it? Do we want this in Tesseract?

Another question: is this an English-only solution? Will it work well with all other Latin-based languages? What about other scripts like Cyrillic, Indic scripts, Hangul (Korean), Chinese and Japanese scripts?

If we decide we do want this solution, the code still lacks a config variable. It should be boolean, set to 'False' by default.
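
As a rough illustration of what is being asked for, the sketch below shows an option that is off by default and can be switched on by name. It does not use Tesseract's own parameter mechanism. The `DiplopiaOptions` struct, the `SetOption` function, and the option names are hypothetical and exist only for this example.

```cpp
#include <cstddef>
#include <exception>
#include <string>

// Hypothetical option bundle mirroring the two constants from the PR.
struct DiplopiaOptions {
  bool remove_diplopia = false;  // Off by default, as requested in review.
  int max_diplopia_gap = 2;      // Default taken from the PR description.
};

// Applies one name/value setting. Returns false for an unknown name or an
// unparsable value, leaving the options unchanged in that case.
bool SetOption(DiplopiaOptions& opts, const std::string& name,
               const std::string& value) {
  if (name == "remove_diplopia") {
    if (value == "1" || value == "true") {
      opts.remove_diplopia = true;
      return true;
    }
    if (value == "0" || value == "false") {
      opts.remove_diplopia = false;
      return true;
    }
    return false;
  }
  if (name == "max_diplopia_gap") {
    try {
      std::size_t used = 0;
      int gap = std::stoi(value, &used);
      if (used != value.size() || gap < 0) return false;
      opts.max_diplopia_gap = gap;
      return true;
    } catch (const std::exception&) {
      return false;  // Not a number, or out of range for int.
    }
  }
  return false;
}
```

Keeping the default at `false` means existing users see no change in output unless they opt in.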

stweil · 2024-09-12T10:29:49Z

We know that better models reduce the number of diplopia cases. That should work for all languages and scripts and is my preferred solution for the problem, instead of very special solutions in the software with unknown side effects.

woodjohndavid · 2024-09-14T21:47:05Z

Hi all:

Just a few comments from myself, the author of this diplopia fix:

As mentioned earlier, this change is intended primarily for those who are looking for a character-by-character OCR solution, not so much textual word recognition relying on the dawg functionality.
In my testing on my limited set of data, it seems to work well, pretty much entirely eliminating diplopia. However, it needs much more testing on broader sample sets by others.
I see no reason at all why this would not work for non-Latin character sets. Of course this would need to be tested, but I don't believe that there is anything in the logic that is character set specific.
Indeed it needs a user configurable variable to turn this function on and off. I have no experience with the config functionality, and it seems pretty arcane in the brief time I have spent looking at that code. Surely there are available developers with detailed knowledge of that code that can add this config variable easily. There is already a boolean on/off variable in the fix code, just no way for a user to set it.

For my own purposes, I don't need this fix incorporated into the main Tesseract code, as I am quite happy running my own custom copy. However, I have taken the time to promote it in this pull request to try to make a modest but hopefully meaningful contribution to Tesseract. If it is not going to see the light of day for others, that is unfortunate.

Regards,

Dave

Update test

5ffffeb

Improved Diplopia Solution

c9b7f18

This was referenced Mar 13, 2024

Character confusion fix suggestion #3144

Open

Duplicate Characters in Output Stream #2738

Open

Tesseract inserting additional alternative characters #1465

Open

recognizes more characters than present #1362

Open

stweil requested changes Mar 14, 2024

Style Updates

65eebb2

tfmorris mentioned this pull request Aug 11, 2024

Fix for LSTM Diplopia issue #3476

Open

amitdo added the diplopia label Sep 12, 2024
