Parameters for inference #29

Open
indicator0 opened this issue Mar 4, 2024 · 2 comments
Comments

@indicator0

Hi! I am using Laypa for baseline detection as part of Loghi. I've noticed that the Laypa model sometimes splits what is a single line in the input image into two separate baselines during inference, because the gap between two parts of the line is slightly larger than usual. This in turn leads to errors in the subsequent transcription.

Is it possible to alter the config to enhance the inference results while using the original model weights? Thank you!

@stefanklut
Owner

stefanklut commented Mar 4, 2024

Thank you for your interest,

Unfortunately, I don't think there is a lot you can do to improve results without finetuning. The one thing you could look at is the internal size used (INPUT.MIN_SIZE_TEST and INPUT.MAX_SIZE_TEST). However, changing it might hurt results in other ways, since the model has not been trained at that size.
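To get a feel for how two settings like these interact, here is a minimal sketch. It assumes shortest-edge resize semantics of the kind commonly used in detectron2-style pipelines (scale the shorter side to the minimum size, then cap the longer side at the maximum size); Laypa's actual resize logic may differ. The function name `resized_shape`, the example image size, and the size values are all hypothetical.

```python
def resized_shape(height, width, min_size, max_size):
    """Return (new_height, new_width) after a shortest-edge resize.

    Assumed semantics (not confirmed for Laypa): scale so the shorter
    side equals min_size, then shrink further if the longer side would
    exceed max_size.
    """
    scale = min_size / min(height, width)
    new_h, new_w = height * scale, width * scale

    longest = max(new_h, new_w)
    if longest > max_size:
        # Cap the longer side, keeping the aspect ratio.
        shrink = max_size / longest
        new_h, new_w = new_h * shrink, new_w * shrink

    # Round to the nearest whole pixel.
    return int(new_h + 0.5), int(new_w + 0.5)


if __name__ == "__main__":
    # Hypothetical page scan of 3000 x 2000 pixels (height x width),
    # with illustrative (min_size, max_size) pairs.
    for min_size, max_size in [(1024, 2048), (1536, 3072)]:
        print(min_size, max_size, resized_shape(3000, 2000, min_size, max_size))
```

Under these assumptions, a larger test size keeps more pixels per text line, which changes how wide a gap between words appears to the model. Whether that helps or hurts depends on the sizes seen during training.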

Do you have some more info on the type of image where this problem occurs? We have seen this type of behavior on really small images, for example. It is also worth noting that we are trying to prevent the opposite problem as well: text lines that are close together but should be separated (e.g. newspapers).

@indicator0
Author

Thanks for your quick reply!

Here are two examples. In the first pair of images, the line "med de frisinnade" has an unwanted baseline break. In the second pair, the line after "§6" has several unwanted baseline breaks due to the large spaces between words. Logically, each of these should be a single line, but the model breaks them up.

[Four screenshots attached: 2024-03-04 at 1:40:15 PM, 1:56:42 PM, 1:46:25 PM, and 1:56:35 PM]
