1 parent aeeaa44, commit 7ce6b87. Showing 3 changed files with 166 additions and 2 deletions.
# Examples

## Recognizing a PDF file and returning Markdown

For PDF files, use the `.recognize_pdf()` function to recognize the whole file or only the specified pages and write the result as a Markdown file. For example, for the following PDF file ([examples/test-doc.pdf](examples/test-doc.pdf)):

Call it as follows:

```python
from pix2text import Pix2Text

img_fp = './examples/test-doc.pdf'
p2t = Pix2Text()
doc = p2t.recognize_pdf(img_fp, page_numbers=[0, 1])  # restrict recognition to the specified pages
doc.to_markdown('output-md')  # the exported Markdown is saved in the output-md directory
```

> To export to formats other than Markdown, such as Word, HTML, or PDF, we recommend converting the Markdown result with [Pandoc](https://pandoc.org).
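
The Pandoc conversion mentioned above can be scripted. The following is a minimal sketch, not part of Pix2Text: the helper names `pandoc_command` and `convert_markdown` are hypothetical, and the file name `output.md` inside `output-md` is an assumption, so adjust it to whatever the export actually produced. Pandoc infers the target format from the output file extension.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(md_path, out_path):
    """Build the Pandoc invocation; the format is inferred from out_path's extension."""
    return ["pandoc", str(md_path), "-o", str(out_path)]


def convert_markdown(md_path, out_path):
    """Run Pandoc when it is installed and the input exists; return True on success."""
    if shutil.which("pandoc") is None or not Path(md_path).is_file():
        return False
    subprocess.run(pandoc_command(md_path, out_path), check=True)
    return True


# Assumption: the exported Markdown file is named output.md.
md_file = Path("output-md") / "output.md"
if convert_markdown(md_file, md_file.with_suffix(".docx")):
    print("Converted", md_file)
else:
    print("Skipped: Pandoc or", md_file, "not found")
```

Replacing `.docx` with `.html` gives HTML output in the same way; PDF output additionally requires a PDF engine that Pandoc can call.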

## Recognizing paragraph images with both formulas and text

Paragraph images that contain both formulas and text do not need the layout analysis model.
Use the `.recognize_text_formula()` function to recognize the text and mathematical formulas in the image. For example, for the following image ([examples/en1.jpg](examples/en1.jpg)):

<div align="center">
<img src="./examples/en1.jpg" alt="English mixed image" width="600px"/>
</div>

Call it as follows:

```python
from pix2text import Pix2Text

img_fp = './examples/en1.jpg'
p2t = Pix2Text()
outs = p2t.recognize_text_formula(img_fp, resized_shape=608, return_text=True)
print(outs)
```

The result `outs` is a `dict`: the key `position` holds the box position, `type` holds the category, and `text` holds the recognized content. See the [API description](#接口说明) for details.
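
As a rough illustration of working with results shaped like this, the sketch below groups recognized text by category. It assumes the boxes arrive as a list of dicts with the `position`, `type`, and `text` keys described above. The sample values, the `'text'` and `'formula'` category names, and the flat position layout are invented for the example and are not actual Pix2Text output; the helper `group_text_by_type` is hypothetical.

```python
from collections import defaultdict


def group_text_by_type(boxes):
    """Collect the `text` of every box under its `type`, preserving input order."""
    grouped = defaultdict(list)
    for box in boxes:
        grouped[box["type"]].append(box["text"])
    return dict(grouped)


# Made-up sample data (assumption): real positions and category names may differ.
sample = [
    {"position": [0, 0, 100, 20], "type": "text", "text": "Let"},
    {"position": [110, 0, 160, 20], "type": "formula", "text": "x^2"},
    {"position": [0, 30, 100, 50], "type": "text", "text": "be given."},
]
print(group_text_by_type(sample))
```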

## Recognizing pure formula images

For images that contain only mathematical formulas, the `.recognize_formula()` function converts the formula into a LaTeX expression. For example, for the following image ([examples/math-formula-42.png](examples/math-formula-42.png)):

<div align="center">
<img src="./examples/math-formula-42.png" alt="Pure Math Formula image" width="300px"/>
</div>

Call it as follows:

```python
from pix2text import Pix2Text

img_fp = './examples/math-formula-42.png'
p2t = Pix2Text()
outs = p2t.recognize_formula(img_fp)
print(outs)
```

The result is a string containing the corresponding LaTeX expression. See the [usage notes](usage.md) for details.
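
Because the result is a plain LaTeX string, it can be embedded directly in a Markdown document. The sketch below wraps an expression in `$$` delimiters, which many Markdown renderers with math support treat as display math. The helper `to_display_math` is hypothetical, and the sample expression is a placeholder rather than actual model output.

```python
def to_display_math(latex):
    """Wrap a LaTeX expression in $$ ... $$ so math-aware Markdown renders it as a block."""
    return "$$\n" + latex.strip() + "\n$$"


sample_latex = r"\frac{a}{b} = c"  # placeholder (assumption), not real recognition output
print(to_display_math(sample_latex))
```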

## Recognizing pure text images

For images that contain only text and no mathematical formulas, the `.recognize_text()` function recognizes the text in the image. In this mode Pix2Text behaves like a general-purpose text OCR engine. For example, for the following image ([examples/general.jpg](examples/general.jpg)):

<div align="center">
<img src="./examples/general.jpg" alt="Pure Text image" width="400px"/>
</div>

Call it as follows:

```python
from pix2text import Pix2Text

img_fp = './examples/general.jpg'
p2t = Pix2Text()
outs = p2t.recognize_text(img_fp)
print(outs)
```

The result is a string containing the recognized text. See the [API description](#接口说明) for details.
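
The call above handles a single image. A minimal sketch for running the same recognizer over a folder follows; the helper `recognize_folder` is hypothetical and takes the recognizer as a parameter, so it relies only on the `recognize_text(img_fp)` call shown above. The `*.jpg` default pattern is an assumption.

```python
from pathlib import Path


def recognize_folder(recognize, folder, pattern="*.jpg"):
    """Apply `recognize` to each matching file in `folder`, keyed by file name (sorted)."""
    return {p.name: recognize(str(p)) for p in sorted(Path(folder).glob(pattern))}


# Usage (requires pix2text and its models to be installed):
# from pix2text import Pix2Text
# p2t = Pix2Text()
# results = recognize_folder(p2t.recognize_text, "./examples")
# for name, text in results.items():
#     print(name, text, sep="\n")
```

Creating the `Pix2Text` instance once and reusing it avoids reloading the models for every image.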

## Different languages

### English

**Recognition result**:

![Pix2Text recognizing English](figs/output-en.jpg)

**Recognition command**:

```bash
p2t predict -l en -a mfd -t yolov7 --analyzer-model-fp ~/.cnstd/1.2/analysis/mfd-yolov7-epoch224-20230613.pt --formula-ocr-config '{"model_name":"mfr-pro","model_backend":"onnx"}' --resized-shape 768 --save-analysis-res out_tmp.jpg --text-ocr-config '{"rec_model_name": "doc-densenet_lite_666-gru_large"}' --auto-line-break -i examples/en1.jpg
```

> Note ⚠️: The command above uses the paid models. The free models can be used as follows, although the results are slightly worse:
>
> ```bash
> p2t predict -l en -a mfd -t yolov7_tiny --resized-shape 768 --save-analysis-res out_tmp.jpg --auto-line-break -i examples/en1.jpg
> ```

### Simplified Chinese

**Recognition result**:

![Pix2Text recognizing Simplified Chinese](figs/output-ch_sim.jpg)

**Recognition command**:

```bash
p2t predict -l en,ch_sim -a mfd -t yolov7 --analyzer-model-fp ~/.cnstd/1.2/analysis/mfd-yolov7-epoch224-20230613.pt --formula-ocr-config '{"model_name":"mfr-pro","model_backend":"onnx"}' --resized-shape 768 --save-analysis-res out_tmp.jpg --text-ocr-config '{"rec_model_name": "doc-densenet_lite_666-gru_large"}' --auto-line-break -i examples/mixed.jpg
```

> Note ⚠️: The command above uses the paid models. The free models can be used as follows, although the results are slightly worse:
>
> ```bash
> p2t predict -l en,ch_sim -a mfd -t yolov7_tiny --resized-shape 768 --save-analysis-res out_tmp.jpg --auto-line-break -i examples/mixed.jpg
> ```

### Traditional Chinese

**Recognition result**:

![Pix2Text recognizing Traditional Chinese](figs/output-ch_tra.jpg)

**Recognition command**:

```bash
p2t predict -l en,ch_tra -a mfd -t yolov7 --analyzer-model-fp ~/.cnstd/1.2/analysis/mfd-yolov7-epoch224-20230613.pt --formula-ocr-config '{"model_name":"mfr-pro","model_backend":"onnx"}' --resized-shape 768 --save-analysis-res out_tmp.jpg --auto-line-break -i examples/ch_tra.jpg
```

> Note ⚠️: The command above uses the paid models. The free models can be used as follows, although the results are slightly worse:
>
> ```bash
> p2t predict -l en,ch_tra -a mfd -t yolov7_tiny --resized-shape 768 --save-analysis-res out_tmp.jpg --auto-line-break -i examples/ch_tra.jpg
> ```

### Vietnamese

**Recognition result**:

![Pix2Text recognizing Vietnamese](figs/output-vietnamese.jpg)

**Recognition command**:

```bash
p2t predict -l en,vi -a mfd -t yolov7 --analyzer-model-fp ~/.cnstd/1.2/analysis/mfd-yolov7-epoch224-20230613.pt --formula-ocr-config '{"model_name":"mfr-pro","model_backend":"onnx"}' --resized-shape 768 --save-analysis-res out_tmp.jpg --no-auto-line-break -i examples/vietnamese.jpg
```

> Note ⚠️: The command above uses the paid models. The free models can be used as follows, although the results are slightly worse:
>
> ```bash
> p2t predict -l en,vi -a mfd -t yolov7_tiny --resized-shape 768 --save-analysis-res out_tmp.jpg --no-auto-line-break -i examples/vietnamese.jpg
> ```
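
The free-model commands in this section differ only in the language list, the input image, and the line-break flag. The sketch below assembles them from Python using only the flags that appear in those commands; the helper `p2t_predict_args` and the `EXAMPLES` table are hypothetical conveniences, not part of the `p2t` tool.

```python
import shlex


def p2t_predict_args(languages, image, auto_line_break=True):
    """Build the argument list for the free-model `p2t predict` command shown above."""
    return [
        "p2t", "predict",
        "-l", ",".join(languages),
        "-a", "mfd",
        "-t", "yolov7_tiny",
        "--resized-shape", "768",
        "--save-analysis-res", "out_tmp.jpg",
        "--auto-line-break" if auto_line_break else "--no-auto-line-break",
        "-i", image,
    ]


# Language list, input image, and line-break setting taken from the commands above.
EXAMPLES = [
    (["en"], "examples/en1.jpg", True),
    (["en", "ch_sim"], "examples/mixed.jpg", True),
    (["en", "ch_tra"], "examples/ch_tra.jpg", True),
    (["en", "vi"], "examples/vietnamese.jpg", False),
]

for languages, image, auto_break in EXAMPLES:
    print(shlex.join(p2t_predict_args(languages, image, auto_break)))
```

Passing the list to `subprocess.run(..., check=True)` would execute a command, provided `p2t` is installed and on the `PATH`.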