This is an OCR program designed for travel documents. It currently supports 23 document types through pre-defined templates, and you can add templates for other document types.
- Passport
- China ID card
- HK ID card (new format)
- HK ID card (old format)
- Macau ID card (new format)
- Macau ID card (old format)
- Macau ID card - backside with MRZ
- China to HK/Macau Entry Permit card
- China to HK/Macau Entry Permit (Old)
- China to Taiwan Entry Permit card
- HK/Macau to China Entry Permit card
- HK/Macau to China Entry Permit card (Old)
- Taiwan to China Entry Permit card
- Taiwan to China Entry Permit (Old)
- Australia Driver Licence - New South Wales
- Australia Driver Licence - Victoria
- Australia Driver Licence - Australian Capital Territory
- Australia Driver Licence - Queensland
- Australia Driver Licence - Western Australia
- Australia Driver Licence - Northern Territory
- Australia Driver Licence - Tasmania
- Australia Driver Licence - South Australia
- New Zealand Driver Licence

Requirements:
- CentOS / Windows
- Python 3.7+
git clone --recursive https://github.com/wisebobo/doc_ocr_by_template
cd doc_ocr_by_template
pip3 install -r requirements.txt
Go to the project folder and edit settings.py, replacing the APP_ID/APP_KEY values with your own.
Then execute
./startServer.sh
or
python3 startServer.py
- Run Tornado to expose the API service
- On receiving a base64-encoded image, pass it to a pre-trained ResNet50 image classification model to determine the document type
- Once the document type is known, start multiple threads that call the Tencent/Baidu/Face++/Netease/JD OCR APIs to obtain the first-round OCR result
- Match the first-round OCR result against the pre-defined template. Templates are created with [project folder]/templates/template_generator.html. If the template matches, crop the recognition area into a new image (the idea is to remove unnecessary information so that the OCR result is more accurate), then pass the cropped image to the Tencent/Baidu/Face++/Netease/JD OCR APIs again.
- Match the second-round OCR result against the template fields
- Apply the data cleansing logic for the corresponding document type
- Calculate the score
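
The steps above can be illustrated with a minimal sketch. It is not the project's actual code: the provider functions, the template layout (`anchors`, `fields`) and the agreement-based score are all assumptions made for illustration, and the image cropping and second OCR round are left out. It only shows the general idea of calling several OCR providers in parallel threads, checking the result against a template, reading field values from template regions, and scoring how well the providers agree.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher

# Hypothetical stand-ins for the real OCR APIs. Each returns a list of
# (text, (left, top, right, bottom)) tuples for the recognised words.
def fake_ocr_a(image_b64):
    return [("PASSPORT", (40, 20, 200, 50)),
            ("Surname", (40, 80, 120, 100)),
            ("ZHANG", (40, 105, 130, 130))]

def fake_ocr_b(image_b64):
    return [("PASSP0RT", (42, 21, 198, 49)),   # note the OCR error: 0 for O
            ("Surname", (41, 79, 121, 101)),
            ("ZHANG", (40, 104, 131, 131))]

PROVIDERS = {"a": fake_ocr_a, "b": fake_ocr_b}

# Hypothetical template: anchor texts that identify the layout, plus one
# rectangular region per field to extract.
TEMPLATE = {
    "doc_type": "passport",
    "anchors": ["PASSPORT", "Surname"],
    "fields": {"surname": (30, 100, 140, 135)},
}

def run_ocr_round(image_b64, providers):
    """Call every provider in its own thread and collect results by name."""
    with ThreadPoolExecutor(max_workers=max(1, len(providers))) as pool:
        futures = {name: pool.submit(fn, image_b64)
                   for name, fn in providers.items()}
        return {name: fut.result() for name, fut in futures.items()}

def similar(a, b):
    """Case-insensitive similarity ratio between two strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def matches_template(words, template, threshold=0.8):
    """True if every anchor text is fuzzily present in the OCR words."""
    texts = [text for text, _ in words]
    return all(any(similar(anchor, t) >= threshold for t in texts)
               for anchor in template["anchors"])

def inside(box, region):
    """True if the word box lies completely within the field region."""
    left, top, right, bottom = box
    r_left, r_top, r_right, r_bottom = region
    return (left >= r_left and top >= r_top
            and right <= r_right and bottom <= r_bottom)

def extract_fields(words, template):
    """Join the words that fall inside each field region of the template."""
    return {name: " ".join(text for text, box in words if inside(box, region))
            for name, region in template["fields"].items()}

def agreement_score(results, template):
    """Average, over fields, of the share of providers that returned the
    most common value for that field."""
    extracted = [extract_fields(words, template) for words in results.values()]
    shares = []
    for name in template["fields"]:
        counts = Counter(fields[name] for fields in extracted)
        shares.append(counts.most_common(1)[0][1] / len(extracted))
    return sum(shares) / len(shares)

if __name__ == "__main__":
    results = run_ocr_round("<base64 image data>", PROVIDERS)
    for name, words in results.items():
        print(name, matches_template(words, TEMPLATE),
              extract_fields(words, TEMPLATE))
    print("score:", agreement_score(results, TEMPLATE))
```

Fuzzy anchor matching is used here so that a small OCR error in one provider's result (such as `PASSP0RT`) does not prevent the template from matching. In the real pipeline, a successful match would be followed by cropping the image and running the second OCR round on it.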