English | 简体中文

Introduction

Converting PaddleOCR to PyTorch.

This repository aims to

  • learn PaddleOCR
  • use Paddle-trained models in PyTorch
  • provide a guideline for Paddle2PyTorch conversion
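As a rough illustration of what a Paddle2PyTorch guideline has to cover, the sketch below converts a toy parameter dictionary. It relies on two conventions: Paddle stores BatchNorm statistics under `_mean` and `_variance` where PyTorch uses `running_mean` and `running_var`, and Paddle keeps fully connected weights as `[in, out]` where PyTorch expects `[out, in]`. The function names, the parameter keys, and the use of nested lists instead of tensors are all assumptions made for this example; it is not the converter used in this repository.

```python
# Minimal sketch of Paddle -> PyTorch parameter conversion.
# All names here (convert_key, convert_state_dict, the example keys) are
# hypothetical and chosen for illustration only.

def convert_key(paddle_key):
    """Rename BatchNorm statistics to the PyTorch naming convention."""
    if paddle_key.endswith("._mean"):
        return paddle_key[: -len("._mean")] + ".running_mean"
    if paddle_key.endswith("._variance"):
        return paddle_key[: -len("._variance")] + ".running_var"
    return paddle_key


def transpose_2d(matrix):
    """Transpose a 2-D nested list: [in, out] -> [out, in]."""
    return [list(row) for row in zip(*matrix)]


def convert_state_dict(paddle_state, linear_weight_keys):
    """Build a PyTorch-style dict; transpose only the listed linear weights."""
    torch_state = {}
    for key, value in paddle_state.items():
        if key in linear_weight_keys:
            value = transpose_2d(value)
        torch_state[convert_key(key)] = value
    return torch_state


# Toy parameters standing in for real tensors.
paddle_state = {
    "backbone.bn._mean": [0.0, 0.0],
    "backbone.bn._variance": [1.0, 1.0],
    "head.fc.weight": [[1, 2, 3], [4, 5, 6]],  # layout [in=2, out=3]
}
converted = convert_state_dict(paddle_state, {"head.fc.weight"})
```

With real checkpoints the same two steps would apply to arrays rather than lists, and the set of keys to transpose would have to be derived from the model definition.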

Notice

PytorchOCR models are converted from PaddleOCR v2.0 and later.

Recent updates

  • 2025.05.25 PP-OCRv5: High-Accuracy Text Recognition Model for All Scenarios - Instant Text from Images/PDFs.
    1. 🌐 Single-model support for five text types - Seamlessly process Simplified Chinese, Traditional Chinese, Simplified Chinese Pinyin, English and Japanese within a single model.
    2. ✍️ Improved handwriting recognition: Significantly better at complex cursive scripts and non-standard handwriting.
    3. 🎯 13-point accuracy gain over PP-OCRv4, achieving state-of-the-art performance across a variety of real-world scenarios.
  • 2024.02.20 PP-OCRv4, with mobile and server versions
    • PP-OCRv4-mobile: At comparable speed, performance on Chinese scenes improves by 4.5% over PP-OCRv3, performance on English scenes improves by 10%, and the average recognition accuracy of the 80-language multilingual models increases by more than 8%.
    • PP-OCRv4-server: The most accurate OCR model at the time of release. Detection accuracy increases by 4.9% on Chinese and English scenes, and recognition accuracy increases by 2%.
  • 2023.04.16 Handwritten Mathematical Expression Recognition CAN
  • 2023.04.07 Image Super-Resolution Text Telescope
  • 2022.10.17 Text Recognition: ViTSTR
  • 2022.10.07 Text Detection: DB++
  • 2022.07.24 Text Detection: FCENet
  • 2022.07.16 Text Recognition: SVTR
  • 2022.06.19 Text Recognition: SAR
  • 2022.05.29 PP-OCRv3: At comparable speed, performance on Chinese scenes improves by a further 5% over PP-OCRv2, performance on English scenes improves by 11%, and the average recognition accuracy of the 80-language multilingual models improves by more than 5%.
  • 2022.05.14 PP-OCRv3 text detection model
  • 2022.04.17 Text Recognition: NRTR
  • 2022.03.20 Text Detection: PSENet
  • 2021.09.11 PP-OCRv2. On CPU, the inference speed of PP-OCRv2 is 220% higher than that of PP-OCR server. The F-score of PP-OCRv2 is 7% higher than that of PP-OCR mobile.
  • 2021.06.01 update SRN
  • 2021.04.25 update AAAI 2021 end-to-end algorithm PGNet
  • 2021.04.24 update RARE
  • 2021.04.12 update STARNET
  • 2021.04.08 update DB, SAST, EAST, ROSETTA, CRNN
  • 2021.04.03 update 25+ multilingual recognition models (models list), including English, Chinese, German, French, Japanese, Spanish, Portuguese, Russian, Arabic and so on. Models for more languages will continue to be added (Develop Plan).
  • 2021.01.10 upload Chinese and English general OCR models.

Features

  • PTOCR series of high-quality pre-trained models, comparable in quality to commercial products
    • Ultra lightweight PP-OCR series models: detection + direction classifier + recognition
    • Ultra lightweight ptocr_mobile series models
    • General ptocr_server series models
    • Support Chinese, English, and digit recognition, vertical text recognition, and long text recognition
    • Support multi-language recognition: Korean, Japanese, German, French, etc.

Model List (updating)

PyTorch models in BaiduPan: https://pan.baidu.com/s/1r1DELT8BlgxeOP2RqREJEg code: 6clx

PaddleOCR models in BaiduPan: https://pan.baidu.com/s/1getAprT2l_JqwhjwML0g9g code: lmv7

If you want to get more models, including multilingual models, please refer to PTOCR series.

Tutorials

TODO

PP-OCR Pipeline

[1] PP-OCR is a practical ultra-lightweight OCR system. It consists of three main parts: DB text detection, detection box correction, and CRNN text recognition. The system applies 19 effective strategies across 8 aspects, including backbone network selection and adjustment, prediction head design, data augmentation, learning rate scheduling, regularization parameter selection, use of pre-trained models, and automatic model pruning and quantization, to optimize and slim down the model of each module (as shown in the green box above). The result is an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English and digit OCR model. For more details, please refer to the PP-OCR technical article (https://arxiv.org/abs/2009.09941).
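The three-stage flow described above can be sketched as plain function composition. This is only a structural illustration: `run_pipeline`, `OcrResult`, and every stage callable are hypothetical names, the correction stage is assumed to be a direction check followed by a 180-degree rotation, and the toy stages operate on strings rather than images. None of it reflects this repository's actual API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Simplified axis-aligned box (x1, y1, x2, y2); an assumption for this sketch.
Box = Tuple[int, int, int, int]


@dataclass
class OcrResult:
    box: Box
    text: str
    score: float


def run_pipeline(
    image,
    detect: Callable,          # stage 1: image -> list of boxes
    crop: Callable,            # helper: (image, box) -> patch
    needs_rotation: Callable,  # stage 2: patch -> bool
    rotate_180: Callable,      # stage 2: patch -> corrected patch
    recognize: Callable,       # stage 3: patch -> (text, score)
) -> List[OcrResult]:
    """Detect text regions, correct their direction, then recognize each one."""
    results = []
    for box in detect(image):
        patch = crop(image, box)
        if needs_rotation(patch):
            patch = rotate_180(patch)
        text, score = recognize(patch)
        results.append(OcrResult(box, text, score))
    return results


# Toy stand-ins: the "image" is a list of text rows, and a leading "~"
# marks a row that is upside down (stored reversed).
image = ["hello", "~dlrow"]
results = run_pipeline(
    image,
    detect=lambda img: [(0, i, len(row), i + 1) for i, row in enumerate(img)],
    crop=lambda img, box: img[box[1]][box[0]:box[2]],
    needs_rotation=lambda patch: patch.startswith("~"),
    rotate_180=lambda patch: patch[1:][::-1],
    recognize=lambda patch: (patch, 1.0),
)
texts = [r.text for r in results]
```

Keeping the stages as separate callables mirrors the modular design in the description, where each stage's model can be swapped or slimmed down independently.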

[2] Building on PP-OCR, PP-OCRv2 is further optimized in five aspects. The detection model adopts the CML (Collaborative Mutual Learning) knowledge distillation strategy and the CopyPaste data augmentation strategy. The recognition model adopts the lightweight LCNet backbone, the U-DML knowledge distillation strategy, and an enhanced CTC loss function (as shown in the red box above), which further improves inference speed and prediction quality. For more details, please refer to the technical report of PP-OCRv2.

Visualization

  • Chinese OCR model
  • English OCR model
  • Multilingual OCR model

References