PaddleOCR2Pytorch


Introduction

Converting PaddleOCR to PyTorch.

This repository aims to:

- learn PaddleOCR
- use models trained in Paddle within PyTorch
- provide a guideline for converting Paddle models to PyTorch (Paddle2PyTorch)
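Much of a Paddle-to-PyTorch conversion comes down to renaming parameters and fixing tensor layouts. The sketch below is a minimal, hypothetical illustration, not this repository's converter. The function names are invented, and tensors are plain nested Python lists so it runs without either framework. It assumes that every 2-D `weight` comes from a Linear layer: Paddle stores Linear weights as [in_features, out_features], while `torch.nn.Linear` expects [out_features, in_features]. It also assumes that BatchNorm statistics use Paddle's `_mean`/`_variance` names, which PyTorch calls `running_mean`/`running_var`.

```python
# Hypothetical sketch: adapt a Paddle-style state dict to PyTorch naming/layout.
# Tensors are nested Python lists here so the example runs without Paddle or PyTorch.

# Assumed BatchNorm buffer renames (Paddle name -> PyTorch name).
BN_KEY_MAP = {"_mean": "running_mean", "_variance": "running_var"}


def ndim(value):
    """Nesting depth of a list-based tensor (0 for a scalar)."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        value = value[0] if value else None
    return depth


def transpose(matrix):
    """Transpose a 2-D list of lists."""
    return [list(row) for row in zip(*matrix)]


def paddle_to_torch_state_dict(paddle_sd):
    """Rename BatchNorm buffers and transpose 2-D weights (assumed to be Linear)."""
    torch_sd = {}
    for key, value in paddle_sd.items():
        prefix, _, leaf = key.rpartition(".")
        new_leaf = BN_KEY_MAP.get(leaf, leaf)
        new_key = f"{prefix}.{new_leaf}" if prefix else new_leaf
        # Assumption: every 2-D "weight" is a Linear layer stored as [in, out] in Paddle.
        if leaf == "weight" and ndim(value) == 2:
            value = transpose(value)
        torch_sd[new_key] = value
    return torch_sd


# Usage with a toy state dict (hypothetical layer names).
paddle_sd = {
    "head.fc.weight": [[1, 2, 3], [4, 5, 6]],  # Paddle Linear: [in=2, out=3]
    "head.fc.bias": [0, 0, 0],
    "backbone.bn1._mean": [0.1, 0.2],
    "backbone.bn1._variance": [1.0, 1.0],
    "backbone.conv1.weight": [[[[1.0]]]],  # 4-D conv weight: same layout, unchanged
}
converted = paddle_to_torch_state_dict(paddle_sd)
print(converted["head.fc.weight"])  # [[1, 4], [2, 5], [3, 6]]
```

In a real conversion, the values would be arrays loaded with `paddle.load` and wrapped with `torch.tensor` before calling `load_state_dict`. The 2-D heuristic would also wrongly transpose layers such as embeddings, whose weight layout is the same in both frameworks. Checking layer types or names is therefore safer.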

Notice

PytorchOCR models are converted from PaddleOCR v2.0 or later.

Recent updates

2025.05.25 PP-OCRv5: High-Accuracy Text Recognition Model for All Scenarios - Instant Text from Images/PDFs.
1. 🌐 Single-model support for five text types: seamlessly process Simplified Chinese, Traditional Chinese, Simplified Chinese Pinyin, English, and Japanese within a single model.
2. ✍️ Improved handwriting recognition: Significantly better at complex cursive scripts and non-standard handwriting.
3. 🎯 13-point accuracy gain over PP-OCRv4, achieving state-of-the-art performance across a variety of real-world scenarios.
2024.02.20 PP-OCRv4, with mobile and server versions
- PP-OCRv4-mobile: at comparable speed, accuracy improves over PP-OCRv3 by 4.5% on Chinese scenes and by 10% on English scenes, and the average recognition accuracy of the 80-language multilingual models increases by more than 8%.
- PP-OCRv4-server: the most accurate OCR model released at the time; detection accuracy increases by 4.9% on Chinese and English scenes, and recognition accuracy increases by 2%.
2023.04.16 Handwritten Mathematical Expression Recognition: CAN
2023.04.07 Image Super-Resolution: Text Telescope
2022.10.17 Text Recognition: ViTSTR
2022.10.07 Text Detection: DB++
2022.07.24 Text Detection: FCENet
2022.07.16 Text Recognition: SVTR
2022.06.19 Text Recognition: SAR
2022.05.29 PP-OCRv3: at comparable speed, accuracy improves over PP-OCRv2 by a further 5% on Chinese scenes and by 11% on English scenes, and the average recognition accuracy of the 80-language multilingual models improves by more than 5%.
2022.05.14 PP-OCRv3 text detection model
2022.04.17 Text Recognition: NRTR
2022.03.20 Text Detection: PSENet
2021.09.11 PP-OCRv2: on CPU, its inference speed is 220% higher than that of PP-OCR server, and its F-score is 7% higher than that of PP-OCR mobile.
2021.06.01 Added SRN
2021.04.25 Added PGNet, an end-to-end algorithm from AAAI 2021
2021.04.24 Added RARE
2021.04.12 Added STAR-Net
2021.04.08 Added DB, SAST, EAST, ROSETTA, CRNN
2021.04.03 Added 25+ multilingual recognition models (see the models list), including English, Chinese, German, French, Japanese, Spanish, Portuguese, Russian, and Arabic. Models for more languages will continue to be added (see the Develop Plan).
2021.01.10 Uploaded general Chinese and English OCR models.

Features

PTOCR series of high-quality pre-trained models, with accuracy comparable to commercial solutions
- Ultra-lightweight PP-OCR series models: detection + direction classifier + recognition
- Ultra-lightweight ptocr_mobile series models
- General ptocr_server series models
- Support for Chinese, English, and digit recognition, vertical text recognition, and long text recognition
- Support for multilingual recognition: Korean, Japanese, German, French, etc.

Model List (updating)

PyTorch models on BaiduPan: https://pan.baidu.com/s/1r1DELT8BlgxeOP2RqREJEg (extraction code: 6clx)

PaddleOCR models on BaiduPan: https://pan.baidu.com/s/1getAprT2l_JqwhjwML0g9g (extraction code: lmv7)

For more models, including multilingual models, please refer to the PTOCR series.

Tutorials

Installation
Inference
PP-OCR Pipeline
Visualization
Reference documents
FAQ
References

TODO

PP-OCRv5: Document Image Orientation Classification Module (PP-LCNet_x1_0_doc_ori), Text Image Rectification Module (UVDoc), Text Line Orientation Classification Module (PP-LCNet_x0_25_textline_ori)
General Document-Parsing Solution PP-StructureV3: Delivers high-precision parsing of multi-layout, multi-scene PDFs, outperforming many open- and closed-source solutions on public benchmarks.
Intelligent Document-Understanding Solution PP-ChatOCRv4: Natively powered by the WenXin large model 4.5T, achieving 15 percentage points higher accuracy than its predecessor.
Add implementations of cutting-edge algorithms: DRRG (text detection) and RFL (text recognition)
Text Recognition: ABINet, VisionLAN, SPIN, RobustScanner
Table Recognition: TableMaster
PP-StructureV2: fully upgraded functionality and performance, adapted to Chinese scenes, with new support for layout recovery and a one-line command to convert PDF to Word
Layout Analysis optimization: model storage reduced by 95% while speed increased 11-fold, with an average CPU time cost of only 41 ms
Table Recognition optimization: three optimization strategies improve model accuracy by 6% at comparable time cost
Key Information Extraction optimization: a vision-independent model structure increases semantic entity recognition accuracy by 2.8% and relation extraction accuracy by 9.1%
Text recognition algorithm: SEED
Key information extraction algorithm: SDMGR
3 DocVQA algorithms: LayoutLM, LayoutLMv2, LayoutXLM
PP-Structure, a new structured document analysis toolkit supporting layout analysis and table recognition (one-click export of table images to Excel files)

PP-OCR Pipeline

[1] PP-OCR is a practical ultra-lightweight OCR system. It is mainly composed of three parts: DB text detection, detection box rectification, and CRNN text recognition. The system adopts 19 effective strategies from 8 aspects, including backbone network selection and adjustment, prediction head design, data augmentation, learning rate scheduling, regularization parameter selection, use of pre-trained models, and automatic model pruning and quantization, to optimize and slim down the model of each module (as shown in the green box above). The final results are an ultra-lightweight Chinese and English OCR model with an overall size of 3.5M and a 2.8M English and digit OCR model. For more details, please refer to the PP-OCR technical article (https://arxiv.org/abs/2009.09941).
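The CRNN recognizer in this pipeline is trained with CTC loss, so its per-timestep predictions must be decoded into text. A common approach is greedy (best-path) decoding: take the most likely class at each timestep, merge consecutive repeats, then remove blanks. The following is a standalone sketch, not this repository's post-processing code. It assumes a toy character set (`CHARSET`, hypothetical) and a blank at index 0.

```python
# Hypothetical sketch of CTC greedy (best-path) decoding for CRNN-style output.
# Assumptions: index 0 is the CTC blank; CHARSET is a toy dictionary, not PaddleOCR's.
CHARSET = ["<blank>", "a", "b", "c"]


def ctc_greedy_decode(score_seq, charset=CHARSET, blank=0):
    """Decode per-timestep class scores into a string."""
    # 1) Pick the highest-scoring class at every timestep.
    best_path = [max(range(len(scores)), key=scores.__getitem__) for scores in score_seq]
    chars = []
    prev = None
    for idx in best_path:
        # 2) Collapse consecutive repeats and 3) drop blanks.
        if idx != prev and idx != blank:
            chars.append(charset[idx])
        prev = idx
    return "".join(chars)


# Seven timesteps whose argmax path is a, a, blank, a, b, b, c.
scores = [
    [0.1, 0.8, 0.05, 0.05],
    [0.1, 0.7, 0.1, 0.1],
    [0.9, 0.05, 0.03, 0.02],
    [0.2, 0.6, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.1, 0.1, 0.6, 0.2],
    [0.05, 0.05, 0.1, 0.8],
]
print(ctc_greedy_decode(scores))  # aabc
```

The blank between the two runs of "a" is what lets a genuinely doubled letter survive the repeat-collapsing step.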

[2] Building on PP-OCR, PP-OCRv2 is further optimized in five aspects. The detection model adopts the CML (Collaborative Mutual Learning) knowledge distillation strategy and the CopyPaste data augmentation strategy. The recognition model adopts the lightweight LCNet backbone network, the U-DML knowledge distillation strategy, and an enhanced CTC loss (as shown in the red box above), which further improve inference speed and prediction accuracy. For more details, please refer to the technical report of PP-OCRv2.
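Both CML and U-DML build on mutual learning, where peer networks are trained together and each is also pushed toward the other's predictions. As a rough, hypothetical illustration of that idea only, the sketch below computes a symmetric KL-divergence term between two peers' softmax outputs at a single position. The function names are invented. The full strategies in the PP-OCRv2 report combine such a term with the ordinary task losses and with other components that are not shown here.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def mutual_learning_loss(logits_a, logits_b):
    """Hypothetical symmetric KL between two peer models' predictions."""
    p_a = softmax(logits_a)
    p_b = softmax(logits_b)
    return 0.5 * (kl_divergence(p_a, p_b) + kl_divergence(p_b, p_a))


same = mutual_learning_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
diff = mutual_learning_loss([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
print(same, diff > 0)  # 0.0 True
```

Averaging both KL directions keeps the term symmetric, so neither peer is treated as a fixed teacher.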

Visualization

Chinese OCR model

English OCR model

Multilingual OCR model

References

https://github.com/PaddlePaddle/PaddleOCR
https://github.com/WenmuZhou/PytorchOCR
Paddle
PyTorch
https://github.com/frotms/image_classification_pytorch
https://github.com/PaddlePaddle/PaddleOCR/blob/release/2.7/doc/doc_en/models_list_en.md
