Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved Diplopia Solution #4211

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

woodjohndavid
Copy link

I previously created a pull request from my branch JDWDIPLOPIA. This solution, while better than without it as far as diplopia is concerned, was a limited solution.

I am now creating a pull request from my branch JDWDIPLOPIA2. I believe that this is a more complete solution to the diplopia issue.

@stweil
Copy link
Member

stweil commented Mar 13, 2024

I don't see a commit which changes the Tesseract code. Is something missing?

@woodjohndavid
Copy link
Author

OK thanks @stweil

This may leave you somewhat concerned about my code changes, but I can assure you that I am a competent developer. I just never use GitHub other than with Tesseract, hence incompetent in that regard.

Note by the way, similar to the other pull request I generated, there are some new configuration values that can only be set in code as it stands, but should be made into available settings. I have not yet figured out the mechanism for doing that. If the diplopia changes I am proposing turn out to be useful, hopefully someone else familiar with the settings approach could take care of that. These configuration values are:

  • bool kRemoveDiplopia - if true, enables diplopia removal functionality. If false, my changes have no effect
  • int kMaxDiplopiaGap - number of timesteps apart to be considered diplopia, default 2

@amitdo
Copy link
Collaborator

amitdo commented Mar 14, 2024

Apart from testing that this patch has a positive effect on the diplopia issue, people should test if there is no negative effect in other places, like dropping of correct characters.

src/lstm/recodebeam.cpp Show resolved Hide resolved
src/lstm/recodebeam.cpp Outdated Show resolved Hide resolved
@woodjohndavid
Copy link
Author

Please note that this change is likely not appropriate for those using Tesseract for natural language recognition using relevant dawgs. It is primarily intended for those (like myself) using Tesseract to scan technical data, looking for exact character by character recognition.

@Papucs
Copy link

Papucs commented Sep 12, 2024

Hi,
Is there any chance this will be merged in the near future?

@amitdo
Copy link
Collaborator

amitdo commented Sep 12, 2024

This PR adds 425 lines to the neural network's code.

According to the PR author, his solution to the 'diplopia' issue is not a generic solution. It tries to solve a specific use case.

So the question is, is it worth it? Do we want this in Tesseract?

Another question, is this Engilsh only solution? Will it work well with all other Latin based languages? What about other scripts like Cyrillic, Indic scripts, Hangul (Korrean), Chinese and Japanese scripts?

If we decide we do want this solution, the code still lacks a config variable. It should be boolean, set to 'False' by default.

@stweil
Copy link
Member

stweil commented Sep 12, 2024

We know that better models reduce the number of diplopia cases. That should work for all languages and scripts and is my preferred solution for the problem, instead of very special solutions in the software with unknown side effects.

@woodjohndavid
Copy link
Author

Hi all:

Just a few comments from myself, the author of this diplopia fix:

  1. As mentioned earlier, this change is intended primarily for those who are looking for a character-by-character OCR solution, not so much textual word recognition relying on the dawg functionality.

  2. In my testing on my limited set of data, it seems to work well, pretty much entirely eliminating diplopia. However, it needs much more testing on broader sample sets by others.

  3. I see no reason at all that this will not work for any other non-Latin character set. Of course this would need to be tested, but I don't believe that there is anything in the logic that is character set specific.

  4. Indeed it needs a user configurable variable to turn this function on and off. I have no experience with the config functionality, and it seems pretty arcane in the brief time I have spent looking at that code. Surely there are available developers with detailed knowledge of that code that can add this config variable easily. There is already a boolean on/off variable in the fix code, just no way for a user to set it.

For my own purposes, I don't need this fix incorporated into the main Tesseract code, as I am quite happy running my own custom copy. However, I have taken the time to promote it in this pull request to try to make a modest but hopefully meaningful contribution to Tesseract. If it is not going to see the light of day for others, that is unfortunate.

Regards,

Dave

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants