Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Port to ocrd core version 3.0.0 #5

Open
wants to merge 102 commits into
base: fix-alpha-shape
Choose a base branch
from

Conversation

MehmedGIT
Copy link

@MehmedGIT MehmedGIT commented Aug 13, 2024

Already migrated processors:

  • OcropyBinarize
  • OcropyClip
  • OcropyDenoise
  • OcropyDeskew
  • OcropyDewarp
  • OcropyRecognize
  • OcropyResegment
  • OcropySegment
  • OcropyTrain
  • PostCorrector
  • CISAligner

ocrd_cis/ocropy/binarize.py Outdated Show resolved Hide resolved
ocrd_cis/ocropy/binarize.py Outdated Show resolved Hide resolved
ocrd_cis/ocropy/binarize.py Outdated Show resolved Hide resolved
ocrd_cis/ocropy/binarize.py Outdated Show resolved Hide resolved
ocrd_cis/ocropy/binarize.py Outdated Show resolved Hide resolved
if level == 'page':
try:
ret.append(self.process_page(page, page_image, page_xywh, zoom, page_id, output_file_id))
except ValueError as e:
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bertsky Do we even want to catch these or should we let them explode and let core do the error handling? For page-wide binarization, I think we probably should because that means the page failed. But for region and line, we might have rogue instances of zero size but all the other regions/lines might be fine.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will soon catch all exceptions on the page level in core. So this should not be handled here.

Regarding lower-level error handling: We have discussed this before, but partial failures across a page in general mean we also must be able to cope with partial annotation (incremental processors). We have no real solution ATM.

But since this PR just preserves the current behaviour (skipping partial failures regardless of level), and there are also other possible causes to catch in core, let's keep it like this.

ocrd_cis/ocropy/binarize.py Outdated Show resolved Hide resolved
Copy link

@kba kba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ocrd-cis-ocropy-binarize with new API LGTM!

MehmedGIT and others added 8 commits August 14, 2024 10:51
Co-authored-by: Konstantin Baierer <kba@users.noreply.github.com>
Co-authored-by: Konstantin Baierer <kba@users.noreply.github.com>
Co-authored-by: Konstantin Baierer <kba@users.noreply.github.com>
Co-authored-by: Konstantin Baierer <kba@users.noreply.github.com>
binarize: don't conflate region/lines seg, pass output_file_id
.github/workflow/tests.yml Outdated Show resolved Hide resolved
.github/workflow/tests.yml Outdated Show resolved Hide resolved
MehmedGIT and others added 2 commits August 27, 2024 15:30
Co-authored-by: Robert Sachunsky <38561704+bertsky@users.noreply.github.com>
Co-authored-by: Robert Sachunsky <38561704+bertsky@users.noreply.github.com>
@bertsky bertsky self-requested a review August 27, 2024 13:36
@MehmedGIT
Copy link
Author

@bertsky, I think the GitHub Actions workflow is not triggered until you have that in your fork's master/main?

@MehmedGIT MehmedGIT marked this pull request as ready for review August 27, 2024 13:53
@bertsky
Copy link
Owner

bertsky commented Aug 27, 2024

I think the GitHub Actions workflow is not triggered until you have that in your fork's master/main?

No, AFAIK GH Actions must be activated for each GH repo/fork individually. So in this case, it would be your fork. Then once I merged here, I have to enable on my fork. And once fix-alpha-shape gets merged upstream, GHA would need to be activated there.

@bertsky
Copy link
Owner

bertsky commented Aug 27, 2024

Wow. So with that we now know that we will get more problems on Ocrolib starting with Python 3.9:

NameError: name 'NaN' is not defined

But at least 3.8 runs through (and fast!)

@MehmedGIT
Copy link
Author

MehmedGIT commented Aug 27, 2024

No, AFAIK GH Actions must be activated for each GH repo/fork individually. So in this case, it would be your fork. Then once I merged here, I have to enable on my fork. And once fix-alpha-shape gets merged upstream, GHA would need to be activated there.

It was probably me who messed up a bit and created the workflow folder instead of workflows ... Maybe it was just enough to rename the folder instead of merging branches to the master. To trigger GH Actions on this PR.

@bertsky
Copy link
Owner

bertsky commented Aug 27, 2024

Ok, so what do we do next? Debugging CircleCI seems tiresome, perhaps we should just deactivate that (keeping the config file). But then we should also add a CD to GHA. (The credentials for ocrd on Dockerhub must be added as a Secret in project settings IIRC.)

@MehmedGIT
Copy link
Author

MehmedGIT commented Aug 27, 2024

Wow. So with that we now know that we will get more problems on Ocrolib starting with Python 3.9

I will see if I can find a fast fix for that. But I will have to modify the ocrolib slightly to make NaN work with higher Python versions.

EDIT: All tests pass now after 224e86f and a397531. Not sure if a397531 was needed since tests pass regardless.

@MehmedGIT
Copy link
Author

But then we should also add a CD to GHA. (The credentials for ocrd on Dockerhub must be added as a Secret in project settings IIRC.)

Yes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants