A simple tool to detect if there is a signature in an image or a PDF file.
This is the quickest way to use the tool. The signature-detect package contains the code in src.
pip install signature-detect
This is the recommended way to explore the tool. It provides notebooks to experiment with.
- Install Anaconda.
- Install the dependencies:
conda create --name <env> --file conda.txt
- Image:
python demo.py --file my-image.jpeg
- PDF file:
python demo.py --file my-file.pdf
All the code in src is covered by unit tests.
cd tests
coverage run -m unittest
coverage report -m
We use the following image as an example. The full example is in the demo notebook.
The loader reads the file and creates a mask.
The mask is a NumPy array. The bright parts are set to 255 and the rest to 0, so it contains only these two values.
- low_threshold = (0, 0, 250)
- high_threshold = (255, 255, 255)
They control the creation of the mask and are used in the function cv.inRange.
Here, yellow is 255 and purple is 0.
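A minimal sketch of how such a mask could be built, assuming a three-channel image stored as a NumPy array. It reproduces the per-pixel rule of cv.inRange (255 where every channel lies within the inclusive bounds, 0 elsewhere) with plain NumPy. make_mask is a hypothetical helper used only for illustration, not a function of the package.

```python
import numpy as np


def make_mask(image, low_threshold=(0, 0, 250), high_threshold=(255, 255, 255)):
    """Hypothetical helper: return a 0/255 mask like cv.inRange would."""
    low = np.array(low_threshold)
    high = np.array(high_threshold)
    # A pixel is "in range" only if all three channels are within the bounds.
    in_range = np.all((image >= low) & (image <= high), axis=-1)
    return np.where(in_range, 255, 0).astype(np.uint8)


# Tiny 2x2 example image: only pixels whose third channel is >= 250 are kept.
image = np.array(
    [[[0, 0, 255], [10, 20, 100]],
     [[30, 40, 250], [0, 0, 249]]],
    dtype=np.uint8,
)
mask = make_mask(image)
print(mask.tolist())  # [[255, 0], [255, 0]]
```

With the thresholds above, only the third channel actually restricts the selection, because the first two channels accept the full 0 to 255 range.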
The extractor first generates the regions from the mask.
Then it removes the small and the big regions, because a signature is neither too small nor too big.
The process is as follows:
- Label the image.
  skimage.measure.label labels the connected regions of an integer array. It returns a labeled array in which all pixels of a connected region are assigned the same integer value.
- Calculate the average size of the regions.
  Here, the size means the number of pixels in a region. We accumulate the number of pixels in all the regions as total_pixels. The average size is total_pixels / nb_regions. If the size of a region is smaller than min_area_size, this region is ignored. min_area_size is given by the user.
- Calculate the size of the small outlier.
  small_size_outlier = average * outlier_weight + outlier_bias
  outlier_weight and outlier_bias are given by the user.
- Calculate the size of the big outlier.
  big_size_outlier = small_size_outlier * amplifier
  amplifier is given by the user.
- Remove the small and big outliers.
By default:
- outlier_weight = 3
- outlier_bias = 100
- amplifier = 10 (15 is used in the demo)
- min_area_size = 10
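The threshold arithmetic above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the package's implementation: the region sizes are given as a list of pixel counts (as could be obtained after labeling), size_thresholds and keep_candidates are hypothetical names, and a region is assumed to survive when its size lies strictly between the two outlier sizes.

```python
def size_thresholds(region_sizes, min_area_size=10, outlier_weight=3,
                    outlier_bias=100, amplifier=10):
    """Hypothetical helper: compute (small_size_outlier, big_size_outlier)."""
    # Regions smaller than min_area_size are ignored when averaging.
    counted = [s for s in region_sizes if s >= min_area_size]
    if not counted:
        raise ValueError("no region reaches min_area_size")
    average = sum(counted) / len(counted)  # total_pixels / nb_regions
    small_size_outlier = average * outlier_weight + outlier_bias
    big_size_outlier = small_size_outlier * amplifier
    return small_size_outlier, big_size_outlier


def keep_candidates(region_sizes, small_size_outlier, big_size_outlier):
    """Assumption: keep sizes strictly between the two outlier sizes."""
    return [s for s in region_sizes
            if small_size_outlier < s < big_size_outlier]


sizes = [4, 100, 100, 200, 2000]        # pixel counts of five regions
small, big = size_thresholds(sizes)
print(small, big)                        # 1900.0 19000.0
print(keep_candidates(sizes, small, big))  # [2000]
```

In this example the region of 4 pixels is excluded from the average, the average of the remaining four regions is 600, and only the 2000-pixel region falls between the two outlier sizes.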
The cropper finds the contours of regions in the labeled masks and crops them.
Suppose (h, w) = region.shape.
- min_region_size = 10000
  If h * w < min_region_size, then this region is ignored.
- border_ratio: float
  border = min(h, w) * border_ratio
  The border will be removed if this attribute is not 0.
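A short sketch of the two cropper rules, assuming the region is a 2D NumPy array. Two details are assumptions rather than statements from the text: the border is truncated to a whole number of pixels, and it is trimmed from all four sides. crop_region is a hypothetical name.

```python
import numpy as np


def crop_region(region, border_ratio, min_region_size=10000):
    """Hypothetical helper: drop small regions, then trim the border."""
    h, w = region.shape
    if h * w < min_region_size:
        return None  # the region is ignored
    # Assumption: the border width is truncated to an integer pixel count.
    border = int(min(h, w) * border_ratio)
    if border == 0:
        return region
    # Assumption: the same border is removed on every side.
    return region[border:h - border, border:w - border]


region = np.zeros((100, 200), dtype=np.uint8)
print(crop_region(region, border_ratio=0.25).shape)  # (50, 150)
print(crop_region(region, border_ratio=0).shape)     # (100, 200)
print(crop_region(np.zeros((50, 50)), border_ratio=0.25))  # None
```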
The judger reads the cropped mask and identifies if it's a signature or not.
Suppose (h, w) = cropped_mask.shape.
- size_ratio: [low, high]
  low < max(h, w) / min(h, w) < high.
- max_pixel_ratio: [low, high]
  low < the number of 0s / the number of 255s < high.
  The mask should only have 2 values, 0 and 255.
By default:
- size_ratio = [1, 4]
- max_pixel_ratio = [0.01, 1]
For the example image:
- max(h, w) / min(h, w) = 3.48
- number of 0s / number of 255s = 0.44
So, this image is signed.
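The two judger checks can be sketched as follows, assuming the cropped mask is a 2D NumPy array containing only 0 and 255. The strict inequalities follow the description above; is_signed is a hypothetical name, and treating a mask without any 255 pixel as unsigned is an assumption made to avoid dividing by zero.

```python
import numpy as np


def is_signed(mask, size_ratio=(1, 4), max_pixel_ratio=(0.01, 1)):
    """Hypothetical helper: apply the size and pixel ratio checks."""
    h, w = mask.shape
    shape_ratio = max(h, w) / min(h, w)
    if not (size_ratio[0] < shape_ratio < size_ratio[1]):
        return False
    zeros = np.count_nonzero(mask == 0)
    whites = np.count_nonzero(mask == 255)
    if whites == 0:
        return False  # assumption: no bright pixel means no signature
    pixel_ratio = zeros / whites
    return max_pixel_ratio[0] < pixel_ratio < max_pixel_ratio[1]


# 10 x 30 mask: shape ratio 3.0, 100 zeros and 200 bright pixels (ratio 0.5).
mask = np.full((10, 30), 255, dtype=np.uint8)
mask[:, :10] = 0
print(is_signed(mask))  # True

# Too elongated: shape ratio 5.0 is outside (1, 4).
print(is_signed(np.full((10, 50), 255, dtype=np.uint8)))  # False
```

With the values reported for the example image, 1 < 3.48 < 4 and 0.01 < 0.44 < 1, so both checks pass.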