Commit d231edb

Merge branch 'master' into pr-144
2 parents ecfc989 + 24b7ced commit d231edb

14 files changed: +222 -242 lines changed
.circleci/config.yml

Lines changed: 3 additions & 2 deletions
@@ -6,11 +6,12 @@ jobs:
 
   build-python36:
     docker:
-      - image: ubuntu:18.04
+      - image: ocrd/core
     steps:
       - run: apt-get update && apt-get install -y --no-install-recommends make git curl
       - checkout
-      - run: make deps-ubuntu deps-test deps install repo/assets
+      - run: make deps-ubuntu
+      - run: make install
       - run: make test-cli
       - run: make coverage
       - codecov/upload

.pylintrc

Lines changed: 2 additions & 0 deletions
@@ -6,6 +6,7 @@ ignored-modules=cv2,tesserocr
 disable =
     ungrouped-imports,
     bad-continuation,
+    trailing-whitespace,
     missing-docstring,
     no-self-use,
     superfluous-parens,
@@ -15,6 +16,7 @@ disable =
     too-many-branches,
     too-many-statements,
     too-many-locals,
+    too-many-nested-blocks,
     too-few-public-methods,
     wrong-import-order,
     duplicate-code

CHANGELOG.md

Lines changed: 51 additions & 21 deletions
@@ -5,6 +5,33 @@ Versioned according to [Semantic Versioning](http://semver.org/).
 
 ## Unreleased
 
+## [0.9.5] - 2020-10-02
+
+Fixed:
+
+* logging according to https://github.com/OCR-D/core/pull/599 (again)
+
+## [0.9.4] - 2020-09-24
+
+Fixed:
+
+* recognize: be robust to different input image modes, Pillow#4925
+* logging according to https://github.com/OCR-D/core/pull/599
+
+## [0.9.3] - 2020-09-15
+
+Fixed:
+
+* segmentation: ensure new elements fit into their parent coords
+* segmentation: ensure valid coords
+
+## [0.9.2] - 2020-09-04
+
+Fixed:
+
+* segment-region: just ignore region outside of page frame, #145
+* deskew: add suffix to AlternativeImage file ID, #148
+
 ## [0.9.1] - 2020-08-16
 
 Fixed:
@@ -204,25 +231,28 @@ Changed:
 * Recognition with proper support for textequiv_level, drop `page` level
 
 <!-- link-labels -->
-[0.9.1]: v0.9.1...v0.9.0
-[0.9.0]: v0.9.0...v0.8.5
-[0.8.5]: v0.8.5...v0.8.4
-[0.8.4]: v0.8.4...v0.8.3
-[0.8.3]: v0.8.3...v0.8.2
-[0.8.2]: v0.8.2...v0.8.1
-[0.8.1]: v0.8.1...v0.8.0
-[0.8.0]: v0.8.0...v0.7.0
-[0.7.0]: v0.7.0...v0.6.0
-[0.6.0]: v0.6.0...v0.5.1
-[0.5.1]: v0.5.1...v0.5.0
-[0.5.0]: v0.5.0...v0.4.1
-[0.4.1]: v0.4.1...v0.4.0
-[0.4.0]: v0.4.0...v0.3.0
-[0.3.0]: v0.3.0...v0.2.2
-[0.2.2]: v0.2.2...v0.2.1
-[0.2.1]: v0.2.1...v0.2.0
-[0.2.0]: v0.2.0...v0.1.2
-[0.1.3]: v0.1.3...v0.1.2
-[0.1.2]: v0.1.2...v0.1.1
-[0.1.1]: v0.1.1...v0.1.0
+[0.9.4]: ../../compare/v0.9.3...v0.9.4
+[0.9.3]: ../../compare/v0.9.2...v0.9.3
+[0.9.2]: ../../compare/v0.9.1...v0.9.2
+[0.9.1]: ../../compare/v0.9.0...v0.9.1
+[0.9.0]: ../../compare/v0.8.5...v0.9.0
+[0.8.5]: ../../compare/v0.8.4...v0.8.5
+[0.8.4]: ../../compare/v0.8.3...v0.8.4
+[0.8.3]: ../../compare/v0.8.2...v0.8.3
+[0.8.2]: ../../compare/v0.8.1...v0.8.2
+[0.8.1]: ../../compare/v0.8.0...v0.8.1
+[0.8.0]: ../../compare/v0.7.0...v0.8.0
+[0.7.0]: ../../compare/v0.6.0...v0.7.0
+[0.6.0]: ../../compare/v0.5.1...v0.6.0
+[0.5.1]: ../../compare/v0.5.0...v0.5.1
+[0.5.0]: ../../compare/v0.4.1...v0.5.0
+[0.4.1]: ../../compare/v0.4.0...v0.4.1
+[0.4.0]: ../../compare/v0.3.0...v0.4.0
+[0.3.0]: ../../compare/v0.2.2...v0.3.0
+[0.2.2]: ../../compare/v0.2.1...v0.2.2
+[0.2.1]: ../../compare/v0.2.0...v0.2.1
+[0.2.0]: ../../compare/v0.1.2...v0.2.0
+[0.1.3]: ../../compare/v0.1.2...v0.1.3
+[0.1.2]: ../../compare/v0.1.1...v0.1.2
+[0.1.1]: ../../compare/v0.1.0...v0.1.1
 [0.1.0]: ../../compare/HEAD...v0.1.0

Makefile

Lines changed: 30 additions & 26 deletions
@@ -25,14 +25,15 @@ help:
 	@echo " from Alexander Pozdnyakov which provides 4.1.0."
 	@echo " See https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr"
 	@echo " for details.)"
-	@echo " deps Install python deps via pip"
-	@echo " deps-test Install testing python deps via pip"
-	@echo " install Install"
+	@echo " deps Install Python deps for install via pip"
+	@echo " deps-test Install Python deps for test via pip"
 	@echo " docker Build docker image"
-	@echo " test Run test"
+	@echo " install Install this package"
+	@echo " test Run unit tests"
+	@echo " coverage Run unit tests and determine test coverage"
 	@echo " test-cli Test the command line tools"
-	@echo " repo/assets Clone OCR-D/assets to ./repo/assets"
 	@echo " test/assets Setup test assets"
+	@echo " repo/assets Clone OCR-D/assets to ./repo/assets"
 	@echo " assets-clean Remove symlinks in test/assets"
 	@echo ""
 	@echo " Variables"
@@ -44,7 +45,7 @@ help:
 
 # Dependencies for deployment in an ubuntu/debian linux
 # (lib*-dev merely for building tesserocr with pip)
-# (tesseract-ocr: Ubuntu 18.04 now ships 4.0.0
+# (tesseract-ocr: Ubuntu 18.04 now ships 4.0.0,
 # which is unsupported. Add the tesseract-ocr PPA
 # from Alexander Pozdnyakov which provides 4.1.0.
 # See https://launchpad.net/~alex-p/+archive/ubuntu/tesseract-ocr
@@ -62,32 +63,32 @@ deps-ubuntu:
 		tesseract-ocr-eng \
 		tesseract-ocr
 
-# Install python deps via pip
+# Install Python deps for install via pip
 deps:
 	$(PIP) install -U pip
 	$(PIP) install -r requirements.txt
 
-# Install testing python deps via pip
+# Install Python deps for test via pip
 deps-test:
 	$(PIP) install -U pip
 	$(PIP) install -r requirements_test.txt
 
-# Install
-install:
-	$(PIP) install -U pip
-	$(PIP) install .
-
 # Build docker image
 docker:
 	docker build -t $(DOCKER_TAG) .
 
+# Install this package
+install: deps
+	$(PIP) install -U pip
+	$(PIP) install .
+
 # Run unit tests
-test: test/assets
+test: test/assets deps-test
 	# declare -p HTTP_PROXY
 	$(PYTHON) -m pytest --continue-on-collection-errors test $(PYTEST_ARGS)
 
 # Run unit tests and determine test coverage
-coverage:
+coverage: deps-test
 	coverage erase
 	make test PYTHON="coverage run"
 	coverage report
@@ -96,30 +97,33 @@ coverage:
 # Test the command line tools
 test-cli: test/assets
 	$(PIP) install -e .
-	rm -rfv test-workspace
-	cp -rv test/assets/kant_aufklaerung_1784 test-workspace
-	export LC_ALL=C.UTF-8; cd test-workspace/data && \
-	ocrd-tesserocr-segment-region -l DEBUG -m mets.xml -I OCR-D-IMG -O OCR-D-SEG-BLOCK ; \
-	ocrd-tesserocr-segment-line -l DEBUG -m mets.xml -I OCR-D-SEG-BLOCK -O OCR-D-SEG-LINE ; \
-	ocrd-tesserocr-recognize -l DEBUG -m mets.xml -I OCR-D-SEG-LINE -O OCR-D-TESS-OCR
+	rm -rfv test/workspace
+	cp -rv test/assets/kant_aufklaerung_1784 test/workspace
+	cd test/workspace/data && \
+	ocrd-tesserocr-segment-region -l DEBUG -I OCR-D-IMG -O OCR-D-SEG-REGION ; \
+	ocrd-tesserocr-segment-line -l DEBUG -I OCR-D-SEG-REGION -O OCR-D-SEG-LINE ; \
+	ocrd-tesserocr-recognize -l DEBUG -I OCR-D-SEG-LINE -O OCR-D-TESS-OCR
 
 .PHONY: test test-cli install deps deps-ubuntu deps-test help
 
 #
 # Assets
 #
 
+# Setup test assets (copy repo/assets)
+# FIXME remove/update if already present
+test/assets: repo/assets
+	mkdir -p $@
+	cp -r -t $@ repo/assets/data/*
+
 # Clone OCR-D/assets to ./repo/assets
+# FIXME does not work if already checked out
+# FIXME should be a proper (VCed) submodule
 repo/assets:
 	mkdir -p $(dir $@)
 	git clone https://github.com/OCR-D/assets "$@"
 
 
-# Setup test assets
-test/assets: repo/assets
-	mkdir -p $@
-	cp -r -t $@ repo/assets/data/*
-
 .PHONY: assets-clean
 # Remove symlinks in test/assets
 assets-clean:

ocrd_tesserocr/binarize.py

Lines changed: 3 additions & 17 deletions
@@ -8,15 +8,12 @@
 
 from ocrd_utils import (
     getLogger,
-    concat_padded,
     assert_file_grp_cardinality,
     make_file_id,
     MIMETYPE_PAGE
 )
 from ocrd_modelfactory import page_from_file
 from ocrd_models.ocrd_page import (
-    MetadataItemType,
-    LabelsType, LabelType,
     AlternativeImageType,
     TextRegionType,
     to_xml
@@ -26,7 +23,6 @@
 from .config import TESSDATA_PREFIX, OCRD_TOOL
 
 TOOL = 'ocrd-tesserocr-binarize'
-LOG = getLogger('processor.TesserocrBinarize')
 
 class TesserocrBinarize(Processor):
 
@@ -51,6 +47,7 @@ def process(self):
 
         Produce a new output file by serialising the resulting hierarchy.
         """
+        LOG = getLogger('processor.TesserocrBinarize')
         assert_file_grp_cardinality(self.input_file_grp, 1)
         assert_file_grp_cardinality(self.output_file_grp, 1)
 
@@ -62,21 +59,9 @@ def process(self):
                 page_id = input_file.pageId or input_file.ID
                 LOG.info("INPUT FILE %i / %s", n, page_id)
                 pcgts = page_from_file(self.workspace.download_file(input_file))
+                self.add_metadata(pcgts)
                 page = pcgts.get_Page()
 
-                # add metadata about this operation and its runtime parameters:
-                metadata = pcgts.get_Metadata() # ensured by from_file()
-                metadata.add_MetadataItem(
-                    MetadataItemType(type_="processingStep",
-                                     name=self.ocrd_tool['steps'][0],
-                                     value=TOOL,
-                                     Labels=[LabelsType(
-                                         externalModel="ocrd-tool",
-                                         externalId="parameters",
-                                         Label=[LabelType(type_=name,
-                                                          value=self.parameter[name])
-                                                for name in self.parameter.keys()])]))
-
                 page_image, page_xywh, _ = self.workspace.image_from_page(
                     page, page_id)
                 LOG.info("Binarizing on '%s' level in page '%s'", oplevel, page_id)
@@ -117,6 +102,7 @@ def process(self):
                     content=to_xml(pcgts))
 
     def _process_segment(self, tessapi, ril, segment, image, xywh, where, page_id, file_id):
+        LOG = getLogger('processor.TesserocrBinarize')
         tessapi.SetImage(image)
         image_bin = None
         layout = tessapi.AnalyseLayout()

ocrd_tesserocr/crop.py

Lines changed: 3 additions & 17 deletions
@@ -3,7 +3,7 @@
 
 import tesserocr
 from ocrd_utils import (
-    getLogger, concat_padded,
+    getLogger,
     crop_image,
     coordinates_for_segment,
     coordinates_of_segment,
@@ -18,8 +18,6 @@
 )
 from ocrd_modelfactory import page_from_file
 from ocrd_models.ocrd_page import (
-    MetadataItemType,
-    LabelsType, LabelType,
     CoordsType, AlternativeImageType,
     to_xml
 )
@@ -30,7 +28,6 @@
 from .segment_region import polygon_for_parent
 
 TOOL = 'ocrd-tesserocr-crop'
-LOG = getLogger('processor.TesserocrCrop')
 
 class TesserocrCrop(Processor):
 
@@ -56,6 +53,7 @@ def process(self):
 
         Produce new output files by serialising the resulting hierarchy.
         """
+        LOG = getLogger('processor.TesserocrCrop')
         assert_file_grp_cardinality(self.input_file_grp, 1)
         assert_file_grp_cardinality(self.output_file_grp, 1)
 
@@ -70,21 +68,9 @@ def process(self):
                 page_id = input_file.pageId or input_file.ID
                 LOG.info("INPUT FILE %i / %s", n, page_id)
                 pcgts = page_from_file(self.workspace.download_file(input_file))
+                self.add_metadata(pcgts)
                 page = pcgts.get_Page()
 
-                # add metadata about this operation and its runtime parameters:
-                metadata = pcgts.get_Metadata() # ensured by from_file()
-                metadata.add_MetadataItem(
-                    MetadataItemType(type_="processingStep",
-                                     name=self.ocrd_tool['steps'][0],
-                                     value=TOOL,
-                                     Labels=[LabelsType(
-                                         externalModel="ocrd-tool",
-                                         externalId="parameters",
-                                         Label=[LabelType(type_=name,
-                                                          value=self.parameter[name])
-                                                for name in self.parameter.keys()])]))
-
                 # warn of existing Border:
                 border = page.get_Border()
                 if border:
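
Note on the refactoring in binarize.py and crop.py: both files drop the same copied "add metadata about this operation" block and call `self.add_metadata(pcgts)` instead. The real `add_metadata` is provided by the upstream processor base class and is not part of this diff. The sketch below only illustrates the pattern of moving such a block into a shared base-class method. Every class here (`Label`, `MetadataItem`, `Document`, `BaseProcessor`, `Binarize`, `Crop`) and both `step` names are hypothetical stand-ins, not the OCR-D API.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Label:
    """Hypothetical stand-in for one recorded runtime parameter."""
    type_: str
    value: Any


@dataclass
class MetadataItem:
    """Hypothetical stand-in for one processing-step record."""
    type_: str
    name: str
    value: str
    labels: List[Label] = field(default_factory=list)


@dataclass
class Document:
    """Hypothetical stand-in for the parsed page document (pcgts)."""
    metadata: List[MetadataItem] = field(default_factory=list)


class BaseProcessor:
    """Hypothetical base class that owns the shared bookkeeping."""
    tool = "base-tool"
    step = "unspecified"

    def __init__(self, parameter: Dict[str, Any]):
        self.parameter = parameter

    def add_metadata(self, doc: Document) -> None:
        # Record this processing step together with its runtime parameters.
        labels = [Label(type_=name, value=value)
                  for name, value in self.parameter.items()]
        doc.metadata.append(
            MetadataItem(type_="processingStep", name=self.step,
                         value=self.tool, labels=labels))


class Binarize(BaseProcessor):
    tool = "ocrd-tesserocr-binarize"
    step = "binarization"  # illustrative step name, an assumption

    def process(self, doc: Document) -> Document:
        self.add_metadata(doc)  # one call replaces the copied block
        return doc


class Crop(BaseProcessor):
    tool = "ocrd-tesserocr-crop"
    step = "cropping"  # illustrative step name, an assumption

    def process(self, doc: Document) -> Document:
        self.add_metadata(doc)
        return doc


# Run both steps on one document; each appends exactly one record.
doc = Crop({}).process(Binarize({"operation_level": "region"}).process(Document()))
print([item.value for item in doc.metadata])
```

With the bookkeeping in one place, each processor's `process` no longer needs the `MetadataItemType`, `LabelsType` and `LabelType` imports, which matches the import removals in both diffs.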
