An OCR model for Vietnamese handwriting recognition, built with a CNN + LSTM architecture and implemented in the PyTorch deep learning framework.
This model is based on the architecture proposed in this paper: https://arxiv.org/pdf/1507.05717.pdf
- I used a pretrained VGG16 as the CNN backbone and a bidirectional LSTM for the recurrent layers.

I highly recommend using a conda virtual environment.
pip install -r requirements.txt
The dataset was provided by Cinamon AI for Cinamon's AI Challenge.
In this step, we:
- Binarize the image by applying Otsu's thresholding method
- Remove noise
- Smooth boundaries by applying a contour filter
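
As a rough illustration of the first preprocessing step, here is a minimal pure-Python sketch of Otsu's method. It is not the code used in this repo: the function names `otsu_threshold` and `binarize` are made up for this example, and the image is assumed to be a list of rows of 8-bit grayscale values with dark ink on light paper. Noise removal and contour filtering are not shown, since their exact settings are not described here.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for a sequence of 8-bit grayscale values."""
    # Build the 256-bin intensity histogram.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_bg = 0    # number of pixels with value <= t
    sum_bg = 0  # sum of intensities of those pixels
    for t in range(256):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to a constant factor); Otsu maximizes it.
        var_between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image):
    """Map pixels above the Otsu threshold to 255 and the rest to 0."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[255 if p > t else 0 for p in row] for row in image]


# Hypothetical 2x2 image: two dark ink pixels, two light paper pixels.
sample = [[10, 200], [12, 210]]
binary = binarize(sample)
```

In practice an image library would do this much faster; the sketch only shows what the threshold selection computes.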
I divided the training process into 2 phases:
- Phase 1: Train the LSTM only: 40 epochs, frozen VGG, lr = 1e-3
python train.py --epoch [num of epochs] --img_path [path to img directory] --label_path [path to label directory] --lr [learning rate] --batch_size [batchsize] --ft [finetune: true or false] --mode [decode mode: 'greedy' or 'beam']
- Phase 2: Finetune the VGG16 backbone: 30 epochs, unfrozen VGG, lr = 1e-4
python finetune.py --epoch [num of epochs] --img_path [path to img directory] --label_path [path to label directory] --lr [learning rate] --batch_size [batchsize] --ft [finetune: true or false] --mode [decode mode: 'greedy' or 'beam']
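
The two-phase schedule above can be sketched in PyTorch roughly as follows. This is an illustration under assumptions, not the contents of `train.py` or `finetune.py`: `TinyCRNN` is a small stand-in model (a real run would use the pretrained VGG16 backbone), the helper names are hypothetical, and Adam is an assumed optimizer since the optimizer is not stated here.

```python
import torch
from torch import nn


class TinyCRNN(nn.Module):
    """Stand-in for the CNN + BiLSTM model (hypothetical, much smaller than VGG16)."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU())
        self.rnn = nn.LSTM(8, 16, bidirectional=True)
        self.head = nn.Linear(32, num_classes)


def set_backbone_trainable(model, trainable):
    # Freezing means the backbone weights receive no gradient updates.
    for p in model.backbone.parameters():
        p.requires_grad = trainable


def make_optimizer(model, lr):
    # Only hand trainable parameters to the optimizer (Adam is an assumption).
    params = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.Adam(params, lr=lr)


model = TinyCRNN()

# Phase 1: backbone frozen, train the recurrent layers and head with lr = 1e-3.
set_backbone_trainable(model, False)
opt_phase1 = make_optimizer(model, lr=1e-3)

# Phase 2: backbone unfrozen, finetune everything with lr = 1e-4.
set_backbone_trainable(model, True)
opt_phase2 = make_optimizer(model, lr=1e-4)
```

Rebuilding the optimizer for phase 2 matters here: the phase 1 optimizer never saw the backbone parameters, so reusing it would leave the backbone untrained even after unfreezing.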
I used CTC as the loss function. There are two strategies for the decoding task: greedy decoding or beam search decoding.
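
To make the greedy strategy concrete, here is a minimal pure-Python sketch of CTC greedy decoding: take the most likely class at each time step, collapse consecutive repeats, then drop blanks. The function names are hypothetical and this is not the repo's decoder. The blank index is assumed to be 0, which matches the default of PyTorch's `nn.CTCLoss`.

```python
def argmax_path(scores):
    """Pick the highest-scoring class index at every time step."""
    return [max(range(len(step)), key=step.__getitem__) for step in scores]


def ctc_greedy_decode(path, blank=0):
    """Collapse repeated indices, then remove blanks (assumed index 0)."""
    decoded = []
    prev = None
    for idx in path:
        # Keep an index only if it differs from the previous step and is not blank.
        if idx != prev and idx != blank:
            decoded.append(idx)
        prev = idx
    return decoded


# Hypothetical per-step scores over 3 classes (0 = blank) for 5 time steps.
scores = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.6, 0.3],
    [0.1, 0.2, 0.7],
]
labels = ctc_greedy_decode(argmax_path(scores))
```

The blank between the two runs of class 1 is what lets the decoder emit the same character twice in a row. Beam search instead keeps several candidate sequences per step and is not shown here.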