Are you tired of using Tesseract as your OCR engine? This repository provides a hackable PyTorch version of EfficientDet that detects characters (digits, letters, and special symbols in numbers and words) within an image.
CharacterDet comes with several datasets of 512x512 px images, given here: numbers only; numbers with special characters; and numbers, special characters, and letters. Each dataset is created with build_dataset.py or build_dataset_letters.py, where the latter is a more advanced version of the former.
The total number of training and validation images is controlled by the parameter total_ds_pics_train and its validation counterpart. The size of the background canvas is set by canvas_size=512 and can be changed if larger images are needed. Words and numbers are assembled from images of individual characters, which are resized using img_size_x/y. Every character used to create the datasets comes from the popular font Avenir LT Std Semi-Light.
Please note that both the numbers and the letters datasets shift characters horizontally and vertically within their numbers/words, but the character size is fixed in the former and variable in the latter, as shown below.
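To make the composition step described above more concrete, here is a minimal sketch of how characters could be laid out on the canvas with random shifts while recording one bounding box per character. It is not the code from build_dataset.py: the function name layout_word, the max_shift parameter, and the [x, y, w, h] box format are assumptions made for illustration only.

```python
import random


def layout_word(char_sizes, canvas_size=512, max_shift=4, rng=None):
    """Place characters left to right on a square canvas.

    char_sizes: non-empty list of (width, height) per character, in pixels.
    Returns one [x, y, w, h] box per character (top-left origin).
    NOTE: hypothetical helper, not taken from build_dataset.py.
    """
    rng = rng or random.Random()
    n = len(char_sizes)
    # Worst-case extent of the word once every shift is at its maximum.
    worst_w = sum(w for w, _ in char_sizes) + n * max_shift
    worst_h = max(h for _, h in char_sizes) + 2 * max_shift
    if worst_w > canvas_size or worst_h > canvas_size:
        raise ValueError("word does not fit on the canvas")

    # Random start position that keeps the whole word inside the canvas.
    x = rng.randint(0, canvas_size - worst_w)
    y0 = rng.randint(max_shift, canvas_size - worst_h + max_shift)

    boxes = []
    for w, h in char_sizes:
        x += rng.randint(0, max_shift)               # horizontal shift (extra gap)
        y = y0 + rng.randint(-max_shift, max_shift)  # vertical shift
        boxes.append([x, y, w, h])
        x += w                                       # advance past this character
    return boxes


if __name__ == "__main__":
    # Five characters of fixed size, as in the numbers datasets.
    print(layout_word([(20, 32)] * 5, rng=random.Random(0)))
```

Passing a different (width, height) for each character would mimic the variable character size of the letters datasets. The actual pasting of character images onto the canvas is left out here.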
This project uses the standard anchor scales/ratios and the RGB mean and std that EfficientDet uses for COCO, as given in the projects/*.yaml files. These values do not reflect the datasets given here, and faster convergence or a lower loss might be achievable by tuning them. In my experiments (not shown in this repo), changing these values did not give a better result, and for unclear reasons finding the anchor ratios using kmeans-anchors-ratios failed to converge for me.
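If you want to try dataset-specific normalization values, the per-channel mean and std can be estimated in a single streaming pass. The sketch below is an illustration only and not part of this repository: the function name channel_mean_std is hypothetical, and it assumes the values in the YAML files are expressed on a [0, 1] scale.

```python
import math


def channel_mean_std(images):
    """Per-channel mean and std over all pixels, scaled to [0, 1].

    images: iterable of images, each an iterable of (r, g, b) tuples in 0..255.
    NOTE: hypothetical helper; the [0, 1] scaling is an assumption.
    """
    count = 0
    total = [0.0, 0.0, 0.0]     # running sum per channel
    total_sq = [0.0, 0.0, 0.0]  # running sum of squares per channel
    for image in images:
        for pixel in image:
            count += 1
            for c in range(3):
                value = pixel[c] / 255.0
                total[c] += value
                total_sq[c] += value * value
    if count == 0:
        raise ValueError("no pixels given")

    mean = [t / count for t in total]
    # Population variance E[x^2] - E[x]^2, clamped against rounding error.
    std = [math.sqrt(max(q / count - m * m, 0.0)) for q, m in zip(total_sq, mean)]
    return mean, std


if __name__ == "__main__":
    tiny_image = [(0, 0, 0), (255, 255, 255)]
    print(channel_mean_std([tiny_image]))
```

For a full dataset an array library would be much faster; the accumulation of sums and squared sums stays the same.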
I use the COCO-pretrained EfficientDet weights (PyTorch .pth) provided here and fine-tune them on the datasets provided above. Training is a two-step process: in the first step the EfficientNet backbone is frozen and only the head, i.e. the BiFPN layers and the class/box prediction net layers, is trained. In the second step the backbone layers are trainable as well. Provided below are the detailed training parameters for each CharacterDet downloadable from here. All other parameters are chosen according to the EfficientDet paper. Training is conducted on a single A100 40GB GPU.
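The two-step schedule amounts to toggling requires_grad on the backbone parameters between runs. The following is a minimal sketch of that idea, not the logic in train.py: the function set_trainable is hypothetical, and the parameter-name prefix "backbone_net." is an assumption about how the backbone is named inside the model.

```python
def set_trainable(named_params, head_only, backbone_prefix="backbone_net."):
    """Toggle requires_grad for a two-step fine-tuning schedule.

    named_params: iterable of (name, param) pairs, for example the result of
        model.named_parameters() on a PyTorch module.
    head_only: True freezes every parameter whose name starts with
        backbone_prefix; False makes all parameters trainable again.
    Returns the names of the parameters left trainable.
    NOTE: hypothetical helper; backbone_prefix is an assumed naming scheme.
    """
    trainable = []
    for name, param in named_params:
        frozen = head_only and name.startswith(backbone_prefix)
        param.requires_grad = not frozen
        if not frozen:
            trainable.append(name)
    return trainable


# Usage with a PyTorch model (step 1, then step 2):
#   set_trainable(model.named_parameters(), head_only=True)
#   ... train the head ...
#   set_trainable(model.named_parameters(), head_only=False)
#   ... train everything at a lower learning rate ...
```

When the optimizer is built, it is common to pass only the parameters that still have requires_grad set, so frozen weights carry no optimizer state.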
Example
python train.py -c 0 -p numbers --head_only True --lr 5e-3 --batch_size 16 --load_weights weights/efficientdet-d0.pth --num_epochs 20 --save_interval 100
python train.py -c 0 -p numbers --head_only False --lr 1e-3 --batch_size 16 --load_weights last --num_epochs 35 --save_interval 100
TensorBoard training logs are provided here. Please note that the snapshots provided above correspond to the respective step in these logs.
This repository is a fine-tuned adaptation of the wonderful 'Yet another EfficientDet in PyTorch' repository.