PyraDox is a simple tool for document digitization: it extracts text and masks personal information with the help of Tesseract-OCR.
- Aadhaar Card is a 12-digit unique identity number that can be obtained voluntarily by residents or passport holders of India, based on their biometric and demographic data. The data is collected by the Unique Identification Authority of India (UIDAI), a statutory authority established in January 2009 by the government of India.
This tool needs the Tesseract OCR engine. Follow the instructions for your platform below.
Install Tesseract using the Windows installer available at :
Tesseract is available directly from many Linux distributions. The package is generally called 'tesseract' or 'tesseract-ocr'; search your distribution's repositories to find it. For example, you can install Tesseract 4.x and its developer tools on Ubuntu 18.x (bionic) by running:
sudo apt install tesseract-ocr
sudo apt install libtesseract-dev
Refer here for more on installation on all other systems.
To install Tesseract run this command:
brew install tesseract
Use the package manager pip to install requirements.
pip install -r requirements.txt
Having hard time with pyt
If pytesseract is unable to find the Tesseract-OCR executable, set its path explicitly (stackoverflow):
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r'C:\Users\USER\AppData\Local\Tesseract-OCR\tesseract.exe'
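As a rough sketch of the step above, the path can be looked up on `PATH` first and hard-coded only as a fallback. The helper name `find_tesseract` is hypothetical and not part of PyraDox or pytesseract; it uses only `shutil.which` from the standard library.

```python
import shutil


def find_tesseract(fallback_path):
    """Return the tesseract executable found on PATH, else the given fallback."""
    # shutil.which returns None when no executable named "tesseract" is on PATH.
    return shutil.which("tesseract") or fallback_path
```

The result could then be assigned to `pytesseract.pytesseract.tesseract_cmd`, with the Windows install path shown above passed as `fallback_path`.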
from Aadhaar import Aadhaar_Card
config = {
    'orient': True,               # correct image orientation (default: True)
    'skew': True,                 # correct image skew (default: True)
    'crop': True,                 # crop the document out of the image (default: True)
    'contrast': True,             # convert to black and white for better OCR (default: True)
    'psm': [3, 4, 6],             # Tesseract page segmentation modes (default: 3, 4, 6)
    'mask_color': (0, 165, 255),  # masking color in BGR format
    'brut_psm': [6],              # keep only one mode for brut masking; 6 is a good start
}
obj = Aadhaar_Card(config)
obj.validate("397788000234")  # binary output: 1 or 0
aadhaar_list = obj.extract("path of input image")  # supported types: png, jpeg, jpg
flag = obj.mask_image("path of input image", "path of output image", aadhaar_list)  # supported types: png, jpeg, jpg
obj.mask_nums("path of input image", "path of output image")  # supported types: png, jpeg, jpg
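For background on what number validation can involve: Aadhaar numbers carry a Verhoeff check digit as their last digit. The sketch below is a standalone implementation of that checksum, not PyraDox's own code; whether `Aadhaar_Card.validate` applies exactly this check is an assumption. The function names and the 12-digit length check in `looks_like_aadhaar` are illustrative.

```python
# Verhoeff checksum tables: D is the multiplication table of the dihedral
# group D5, P is the position-dependent permutation, INV gives inverses in D5.
D = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
    [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
    [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
    [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
    [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
    [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
    [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
    [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
    [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
]
P = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
    [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
    [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
    [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
    [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
    [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
    [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
]
INV = [0, 4, 3, 2, 1, 5, 6, 7, 8, 9]


def verhoeff_check_digit(payload: str) -> int:
    """Compute the check digit to append to a string of digits."""
    c = 0
    # Positions start at 1 because position 0 is reserved for the check digit.
    for i, ch in enumerate(reversed(payload)):
        c = D[c][P[(i + 1) % 8][int(ch)]]
    return INV[c]


def verhoeff_valid(number: str) -> bool:
    """Return True if the digit string (check digit last) passes the checksum."""
    if not (number.isascii() and number.isdigit()):
        return False
    c = 0
    for i, ch in enumerate(reversed(number)):
        c = D[c][P[i % 8][int(ch)]]
    return c == 0


def looks_like_aadhaar(number: str) -> bool:
    """Illustrative check: exactly 12 digits with a valid Verhoeff checksum."""
    return len(number) == 12 and verhoeff_valid(number)
```

A checksum of this kind catches every single-digit typo, which is useful for filtering OCR misreads before masking.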
Find useful request and response examples in api_samples.
defaults_url = 'http://localhost:9001'
headers = {'content-type': 'application/json'}
python app.py
request_json = {"test_number": 397788000234}
response_json = {'validity': 0 } #0|1 -> invalid|valid
request_json = {"doc_b64": base64_encoded_string}
response_json = {'aadhaar_list':['397788000234']} #empty list if unable to find
request_json = {"doc_b64": base64_encoded_string, 'aadhaar': ['397788000234']}
response_json = {'doc_b64_masked':base64_encoded_string, 'is_masked': True} #if is_masked False then doc_b64_masked is None
D. Brut Mask any Readable Number from Aadhaar (works well on low res, bad quality images). url = '/api/brut_mask'
request_json = {"doc_b64": base64_encoded_string}
response_json = {'doc_b64_brut_masked': base64_encoded_string, 'mask_status': 'Done'}
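A minimal client-side sketch of building such a request with the standard library. It assumes the server is running at the default URL above and that `doc_b64` is the base64 encoding of the raw image file bytes; the helper name `build_brut_mask_request` is hypothetical.

```python
import base64
import json
import urllib.request


def build_brut_mask_request(image_bytes, base_url="http://localhost:9001"):
    """Build a POST request for /api/brut_mask carrying a base64-encoded image."""
    # Assumption: the API expects the raw file bytes, base64-encoded as text.
    payload = {"doc_b64": base64.b64encode(image_bytes).decode("ascii")}
    return urllib.request.Request(
        base_url + "/api/brut_mask",
        data=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )
```

With the app running, the request could be sent with `urllib.request.urlopen(req)` and the JSON body parsed to read `mask_status` and `doc_b64_brut_masked`.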
Use case: take an Aadhaar card, extract its Aadhaar number while checking the number's validity, and mask the first 8 digits. If the Aadhaar number is not readable, mask all possible numbers (brut mode).
request_json = {"doc_b64": base64_encoded_string, "brut" : True}
response_json = {'doc_b64_masked':base64_encoded_string, 'is_masked': True,'mode_executed' : "OCR-MASKING", 'aadhaar_list':"All Possible Aadhar Numbers of 12 digits", 'valid_aadhaar_list':['Valid Aadhar Numbers Only']}
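On the client side, a response of this shape could be handled roughly as follows. This is a sketch that assumes `doc_b64_masked` holds the base64-encoded masked image and is None when `is_masked` is False, as in the masking response described earlier; the helper name `decode_masked_document` is hypothetical.

```python
import base64


def decode_masked_document(response_json):
    """Return the masked image as bytes, or None when nothing was masked."""
    if not response_json.get("is_masked"):
        return None
    return base64.b64decode(response_json["doc_b64_masked"])
```

The returned bytes can be written to a file opened in binary mode to obtain the masked image.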
docker build -t pyradox .
docker run -p 9001:9001 pyradox
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
- Finish Dockerfile
- Add Badges
- Add Class Preprocessing
- Sample Website
- Push Docker image to hub
- Add Regex to extract Name, DOB, Gender.
Please make sure to update tests as appropriate.
While working on this project, I came across some good repos on GitHub, which are listed below.
- Aadhar Number Validator and Generator
- Aadhaar-Card-OCR
If anything is unclear or not working, please feel free to file an issue or reach out by email.
If this project was helpful to you, please show some love.