
Automatic information extraction from identity cards with OCR


ING-2/Tc_ID_Card_OCR


Usage

Arguments

  • --folder_name: path to the folder containing the ID card images
  • --neighbor_box_distance: distance threshold between neighboring text boxes
  • --face_recognition: face detection method (dlib, ssd or haar)
  • --rotation_interval: rotation interval for the ID card, in degrees
  • --ocr_method: OCR engine (EasyOcr or TesseractOcr)
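The arguments above could be declared with Python's standard argparse module roughly as follows. This is a minimal sketch rather than the repository's main.py: the defaults are copied from the example command, and the integer types for the distance and interval are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """CLI mirroring the documented flags (defaults and types are assumptions)."""
    parser = argparse.ArgumentParser(
        description="Extract fields from ID card images with OCR")
    parser.add_argument("--folder_name", default="images",
                        help="folder containing the ID card images")
    parser.add_argument("--neighbor_box_distance", type=int, default=60,
                        help="distance threshold between neighboring text boxes")
    parser.add_argument("--face_recognition", choices=["dlib", "ssd", "haar"],
                        default="ssd", help="face detection method")
    parser.add_argument("--rotation_interval", type=int, default=60,
                        help="rotation interval for the ID card, in degrees")
    parser.add_argument("--ocr_method", choices=["EasyOcr", "TesseractOcr"],
                        default="EasyOcr", help="OCR engine")
    return parser
```

Calling `build_parser().parse_args()` without a list reads `sys.argv`, which is how a script entry point would normally use it; `choices` makes argparse reject misspelled detector or OCR names up front.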

With the Dlib and Haar face detection models, choose a rotation interval of less than 30 degrees; otherwise no face may be detected because the image stays inverted. Create a folder and put the ID card images in it.
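Assuming the rotation search tries angles in steps of --rotation_interval until the face detector fires (the README does not spell this out), the loop can be sketched as below. `rotate` and `detect_face` are hypothetical callbacks standing in for an OpenCV rotation and the chosen Dlib, SSD or Haar detector.

```python
from typing import Callable, Optional, TypeVar

Image = TypeVar("Image")


def candidate_angles(interval: int) -> list[int]:
    """Angles 0, interval, 2*interval, ... strictly below 360 degrees."""
    if not 0 < interval <= 360:
        raise ValueError("interval must be in (0, 360]")
    return list(range(0, 360, interval))


def find_upright_angle(image: Image,
                       rotate: Callable[[Image, int], Image],
                       detect_face: Callable[[Image], bool],
                       interval: int) -> Optional[int]:
    """Return the first tried angle at which a face is detected, or None."""
    for angle in candidate_angles(interval):
        if detect_face(rotate(image, angle)):
            return angle
    return None
```

Under this assumption, a larger interval means fewer detector calls but a coarser search: with a 60 degree step, a card lying at 90 degrees is never tried upright (the closest candidate is 30 degrees off), which a rotation-sensitive detector may not tolerate.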

git clone git@github.com:musimab/Tc_ID_Card_OCR.git
cd Tc_ID_Card_OCR
pip install opencv-python-headless==4.5.3.56
pip install craft-text-detector
pip install easyocr
mkdir images
python3 main.py --folder_name "images" --neighbor_box_distance 60 --face_recognition ssd --ocr_method EasyOcr --rotation_interval 60

By default, the result image and the cropped regions are saved to ./outputs, and the JSON data is saved to ./test.

TODOs

  1. Develop a deep-learning-based identity card recognition model (YOLO, SSD or Faster R-CNN).

Algorithm Pipeline


Input image


Warped image

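The warped image is the card rectified to a front-facing view. Assuming its four corners are already known (how the repository locates them is not described here), the warp is a planar homography. The sketch below solves for it with NumPy, fixing the bottom-right entry to 1, which is the same 8x8 system `cv2.getPerspectiveTransform` solves; in practice one would call that function and then `cv2.warpPerspective` with the desired output size. The function names here are hypothetical.

```python
import numpy as np


def perspective_transform(src: np.ndarray, dst: np.ndarray) -> np.ndarray:
    """3x3 homography H mapping four (x, y) src points onto four dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h0 x + h1 y + h2) / (h6 x + h7 y + 1), rearranged to be linear in h
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.asarray(a, dtype=float), np.asarray(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)


def apply_homography(h: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Map (N, 2) points through H using homogeneous coordinates."""
    pts = np.hstack([points, np.ones((len(points), 1))])
    mapped = pts @ h.T
    return mapped[:, :2] / mapped[:, 2:3]
```

The four corners must be in general position (no three collinear), otherwise the system is singular.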

CRAFT Character Density Map

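CRAFT predicts a per-pixel character score, and text boxes come from connected regions of high score. The sketch below is a simplified version of that post-processing: it thresholds the score map and returns the axis-aligned bounding box of each 4-connected region. The real CRAFT pipeline also uses a link (affinity) score and rotated rectangles; the 0.4 threshold and the function name are assumptions.

```python
from collections import deque

import numpy as np


def heatmap_to_boxes(score: np.ndarray,
                     threshold: float = 0.4) -> list[tuple[int, int, int, int]]:
    """(x_min, y_min, x_max, y_max) of each 4-connected region above threshold."""
    mask = score > threshold
    seen = np.zeros_like(mask, dtype=bool)
    h, w = mask.shape
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if not mask[sy, sx] or seen[sy, sx]:
                continue
            # breadth-first flood fill of one connected region
            queue = deque([(sy, sx)])
            seen[sy, sx] = True
            ys, xs = [], []
            while queue:
                y, x = queue.popleft()
                ys.append(y)
                xs.append(x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes
```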

U-Net output for the character density map

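One plausible reading of the figures is that the U-Net mask marks the regions of the fields of interest, and CRAFT boxes are kept only where they fall on that mask. That matching rule is an assumption, not the documented method. The hypothetical helper below binarizes the mask and keeps a box when at least `min_overlap` of its pixels lie on it; both thresholds are illustrative.

```python
import numpy as np


def match_boxes_to_mask(boxes: list[tuple[int, int, int, int]],
                        mask: np.ndarray,
                        min_overlap: float = 0.5,
                        mask_threshold: float = 0.5) -> list[tuple[int, int, int, int]]:
    """Keep inclusive (x_min, y_min, x_max, y_max) boxes mostly covered by the mask."""
    binary = mask > mask_threshold
    kept = []
    for x0, y0, x1, y1 in boxes:
        region = binary[y0:y1 + 1, x0:x1 + 1]
        # mean of a boolean array = fraction of box pixels on the mask
        if region.size and region.mean() >= min_overlap:
            kept.append((x0, y0, x1, y1))
    return kept
```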

CRAFT output (red boxes) and matched boxes (blue boxes)

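The --neighbor_box_distance flag suggests that nearby boxes are combined into a single field. Assuming it is the maximum horizontal gap, in pixels, between boxes on the same text line (the units and the exact rule are not documented), the merge step could look like the hypothetical `merge_neighbor_boxes` below. It repeats pairwise merges until no two boxes qualify, so chains of close boxes collapse into one.

```python
Box = tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def _same_line(a: Box, b: Box) -> bool:
    """True when the vertical extents of the two boxes overlap."""
    return a[1] <= b[3] and b[1] <= a[3]


def _gap(a: Box, b: Box) -> int:
    """Horizontal distance between two boxes (0 if they overlap horizontally)."""
    return max(0, max(a[0], b[0]) - min(a[2], b[2]))


def merge_neighbor_boxes(boxes: list[Box], max_gap: int) -> list[Box]:
    """Repeatedly merge same-line boxes whose horizontal gap is <= max_gap."""
    result = [tuple(b) for b in boxes]
    merged_any = True
    while merged_any:
        merged_any = False
        for i in range(len(result)):
            for j in range(i + 1, len(result)):
                a, b = result[i], result[j]
                if _same_line(a, b) and _gap(a, b) <= max_gap:
                    # replace box i by the union of i and j, drop j, and rescan
                    result[i] = (min(a[0], b[0]), min(a[1], b[1]),
                                 max(a[2], b[2]), max(a[3], b[3]))
                    del result[j]
                    merged_any = True
                    break
            if merged_any:
                break
    # reading order: top to bottom, then left to right
    return sorted(result, key=lambda b: (b[1], b[0]))
```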

OCR Output

Tc : 12345678909 Surname : MUSTAFA ALİ Name : YILMAZ DateofBirth : 07071999
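The OCR output is a flat "Label : value" string. A hypothetical helper can split such a line into a dictionary whose keys mirror the field names used in the evaluation section (tc, surname, name, dateofbirth) before it is written as JSON. The regular expression and function names are assumptions, not the repository's code; `ensure_ascii=False` keeps Turkish letters such as "İ" readable in the file.

```python
import json
import re

# Field labels as they appear in the OCR output line (assumed fixed set).
LABELS = ("Tc", "Surname", "Name", "DateofBirth")
_LABEL_ALT = "|".join(LABELS)
# "Label : value", where the value runs until the next "Label :" or the end.
FIELD_PATTERN = re.compile(
    rf"\b({_LABEL_ALT})\s*:\s*(.*?)(?=\s+(?:{_LABEL_ALT})\s*:|$)"
)


def parse_ocr_line(line: str) -> dict[str, str]:
    """Turn 'Tc : ... Surname : ...' into {'tc': ..., 'surname': ...}."""
    return {label.lower(): value.strip() for label, value in FIELD_PATTERN.findall(line)}


def to_json(fields: dict[str, str]) -> str:
    # ensure_ascii=False writes non-ASCII letters as-is instead of \u escapes
    return json.dumps(fields, ensure_ascii=False)
```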

OCR Evaluation

The accuracy of the OCR system was evaluated with two criteria: word-level accuracy and character-level accuracy.

The evaluate.py script loads the predicted and actual (ground-truth) values in JSON format and compares them.
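The two criteria can be made concrete as follows. This is a sketch, not evaluate.py itself: word-level accuracy is taken as the fraction of exact matches, and character-level accuracy credits each ground-truth string with its length minus its Levenshtein distance to the prediction. The repository may count characters differently.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def char_accuracy(preds: list[str], truths: list[str]) -> tuple[int, int]:
    """(correct characters, total ground-truth characters)."""
    total = sum(len(t) for t in truths)
    correct = sum(max(0, len(t) - levenshtein(p, t)) for p, t in zip(preds, truths))
    return correct, total


def word_accuracy(preds: list[str], truths: list[str]) -> float:
    """Fraction of fields predicted exactly."""
    return sum(p == t for p, t in zip(preds, truths)) / len(truths)
```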

Character-Level Comparison

  1. tc: 1303 / 1327 => 98.19 %
  2. surname: 805 / 816 => 98.65 %
  3. name: 742 / 746 => 99.46 %
  4. dateofbirth: 976 / 976 => 100.0 %
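The percentages follow directly from the correct/total character counts; reproducing them makes the rounding to two decimals explicit.

```python
# Character-level counts reported above: (correct characters, total characters)
counts = {
    "tc": (1303, 1327),
    "surname": (805, 816),
    "name": (742, 746),
    "dateofbirth": (976, 976),
}

percentages = {field: round(100 * correct / total, 2)
               for field, (correct, total) in counts.items()}

for field, pct in percentages.items():
    print(f"{field}: {pct} %")
```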

Word-Level Comparison

  1. tc: 0.96 (96 %)
  2. surname: 0.91 (91 %)
  3. name: 0.95 (95 %)
  4. date: 1.0 (100 %)

EasyOCR

https://github.com/sarra831/EasyOCR


Languages

  • Python 100.0%