This repository contains a unified PyTorch implementation of 1-D and 2-D seq2seq attention networks for OCR (simplified Chinese, traditional Chinese, and English). It can recognize both horizontal and vertical text, using either a beam search or a greedy decoding strategy.
The network was trained on samples generated by https://github.com/yangsuhui/text_data_generator (also modified by me). This data-generation repository can produce variable-length, multi-language, horizontal, and vertical text images with random backgrounds.
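As a rough illustration of the two decoding strategies, the sketch below runs beam search over precomputed per-step token distributions. This is a simplification: in the actual decoder each step's distribution depends on the previously emitted token, and the function names and the `eos` convention here are hypothetical, not taken from this repository.

```python
import math

def beam_search(step_logprobs, beam_width=3, eos=0):
    """Minimal beam search sketch.

    step_logprobs: list of dicts {token: log_prob}, one per decoding
    step (a hypothetical stand-in for real decoder outputs).
    Returns the highest-scoring token sequence.
    """
    beams = [([], 0.0)]  # (token sequence, cumulative log-prob)
    for dist in step_logprobs:
        candidates = []
        for seq, score in beams:
            if seq and seq[-1] == eos:  # finished beams carry over unchanged
                candidates.append((seq, score))
                continue
            for tok, lp in dist.items():
                candidates.append((seq + [tok], score + lp))
        # keep only the beam_width highest-scoring hypotheses
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_width]
    return beams[0][0]

steps = [{1: math.log(0.9), 2: math.log(0.1)},
         {0: math.log(1.0)}]
best = beam_search(steps)  # greedy search is the beam_width=1 special case
```

Greedy search corresponds to `beam_width=1`; larger beams trade decoding speed for the chance to recover from a locally suboptimal first character.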
pytorch 1.1.0
cd Attention_ocr_recognition
pip install -r requirements.txt
1-D mode
python train.py --trainlist ./data/ch_train.txt --vallist ./data/ch_test.txt --imgH 32 --imgW 280 --experiment ./expr/attention1dcnn --niter 40 --saveInterval 10 --mode 1D --use_beam_search
2-D mode
python train.py --trainlist ./data/train_v.txt --vallist ./data/test_v.txt --imgH 420 --imgW 420
1-D mode: a small dataset from the Synthetic Chinese String Dataset, with about 270,000+ images for training and 20,000 images for testing. With the image data downloaded from Baidu for the 1-D datasets above, the model can achieve 0.98 accuracy.
2-D mode: the dataset is generated with the repository https://github.com/yangsuhui/text_data_generator, with about 100,000 training images and 5,000 test images; the model can achieve 0.997 accuracy on vertical text.
The train.txt and test.txt files use the following format:
path/to/image_name.jpg label
path/AttentionData/images/00000554.png 一份「設計構想」──用的就是
path/AttentionData/images/00001027.png 一排椅子上的那兩個中學生,他就
path/AttentionData/images/00000319.png 來,太激動了!多年來
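Since labels may themselves contain spaces or punctuation, a loader should split each annotation line on the first space only. A minimal sketch (the helper name is my own, not from this repository):

```python
def parse_annotation_line(line):
    """Split 'path/to/image.png label text' on the FIRST space only,
    so labels that contain spaces survive intact."""
    path, _, label = line.rstrip("\n").partition(" ")
    return path, label

# example lines in the train.txt / test.txt format
lines = ["path/AttentionData/images/00000554.png 一份「設計構想」──用的就是"]
samples = [parse_annotation_line(l) for l in lines]
```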
The following command runs the demo in 2-D mode; 1-D mode can use the same command with the corresponding hyperparameters (e.g. imgH, imgW) changed.
python demo.py --encoder ./expr/attention2dcnn_v_res18/encoder_160.pth --decoder ./expr/attention2dcnn_v_res18/decoder_160.pth --imgH 420 --imgW 420 --img_path ./test_img/2dimg/00004747.png --use_beam_search
Download the 2-D model (Baidu, password: q8kp), put it in the folder ./expr/attention2dcnn_v_res18/, and run the command above; a GIF visualizing the attention results will be saved in the vis folder.
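For reference, an attention GIF of this kind can be assembled from the per-character attention maps roughly as below. This is a minimal sketch with Pillow and NumPy, assuming each map is a 2-D array in [0, 1]; the function name and array shapes are illustrative, not this repository's actual visualization code.

```python
import numpy as np
from PIL import Image

def attention_gif(attn_maps, out_path="attention_demo.gif"):
    """Render one grayscale frame per decoded character and save
    them as an animated GIF (200 ms per frame, looping forever)."""
    frames = [Image.fromarray((np.asarray(m) * 255).astype("uint8"))
              for m in attn_maps]
    frames[0].save(out_path, save_all=True,
                   append_images=frames[1:], duration=200, loop=0)

# demo with random 8x8 "attention maps", one per decoded character
demo_maps = [np.random.rand(8, 8) for _ in range(3)]
attention_gif(demo_maps)
```

In practice the maps would be upsampled to the input-image size and overlaid on the source image before saving.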