Question about the Chinese-English formula recognition capability in the 5.2 update #48

ManiiXu · 2024-05-06T04:03:34Z

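I'd like to ask what feature or recognition capability "[2024-05-02] Support mixed Chinese English formula recognition." refers to, specifically. In my tests, a formula that contains Chinese characters is not recognized correctly, and the HF model's tokenizer_config.json and vocab.json have not been updated with Chinese tokens either.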
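The vocabulary check described above can be sketched roughly as follows. This is only an illustration: `contains_cjk`, `cjk_tokens`, and `sample_vocab` are hypothetical names, and it assumes the vocabulary is a flat token-to-id mapping whose tokens are stored as plain Unicode strings. A byte-level BPE vocabulary stores byte-mapped symbols instead, so this scan would not apply to it directly.

```python
def contains_cjk(token: str) -> bool:
    """Return True if any character lies in the CJK Unified Ideographs block."""
    return any("\u4e00" <= ch <= "\u9fff" for ch in token)


def cjk_tokens(vocab: dict) -> list:
    """Collect every token in a token-to-id mapping that contains a CJK character."""
    return [tok for tok in vocab if contains_cjk(tok)]


# Hypothetical stand-in for the contents of a vocab.json file.
# With a real file, something like json.load(open("vocab.json", encoding="utf-8"))
# would produce the same kind of dict.
sample_vocab = {"\\frac": 0, "x": 1, "^": 2, "{": 3, "}": 4}

print(cjk_tokens(sample_vocab))
```

An empty result only shows that the formula model's own vocabulary has no Chinese tokens; it says nothing about whether a separate text OCR step handles them.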

OleehyO · 2024-05-06T04:52:56Z

Maintainer

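Specifically, it means that Chinese and English text can be recognized together with formulas in the same image (you need to turn on the mixed recognition option in the web demo's sidebar), for example this image:

Mixed recognition runs text OCR plus formula OCR, so the formula recognition model's tokenizer does not need any Chinese tokens.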
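To make the split concrete, here is a minimal sketch of a text OCR + formula OCR pipeline. It is not the project's actual implementation: `Region`, `recognize_mixed`, the two OCR callables, and the simple top-to-bottom, left-to-right ordering are all assumptions made for illustration, and the region detection step is taken as given.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), hypothetical layout


@dataclass
class Region:
    box: Box
    kind: str  # "text" or "formula", as labeled by some detector


def recognize_mixed(
    regions: List[Region],
    text_ocr: Callable[[Box], str],
    formula_ocr: Callable[[Box], str],
) -> str:
    """Route each region to the matching recognizer and join the results."""
    # Naive reading order: sort by top edge, then by left edge.
    ordered = sorted(regions, key=lambda r: (r.box[1], r.box[0]))
    parts = []
    for region in ordered:
        if region.kind == "formula":
            # Only formula regions reach the formula model, so its
            # tokenizer never has to produce Chinese characters.
            parts.append("$" + formula_ocr(region.box) + "$")
        else:
            parts.append(text_ocr(region.box))
    return "".join(parts)


# Stub recognizers standing in for real models (hypothetical data).
fake_text = {(0, 0, 110, 20): "勾股定理：", (210, 0, 300, 20): " holds"}
fake_formula = {(120, 0, 200, 20): "a^2+b^2=c^2"}

regions = [
    Region((120, 0, 200, 20), "formula"),
    Region((0, 0, 110, 20), "text"),
    Region((210, 0, 300, 20), "text"),
]

result = recognize_mixed(regions, fake_text.__getitem__, fake_formula.__getitem__)
print(result)
```

Because the Chinese text comes only from the text recognizer, the formula model's vocabulary can stay limited to LaTeX tokens.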

1 reply

OleehyO Jun 7, 2024
Maintainer

To avoid confusion, the new version has renamed Chinese-English formula recognition to paragraph recognition.

OleehyO · 2024-05-06T04:54:19Z

Maintainer

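Inference becomes very slow when mixed recognition is enabled, and the results in the current version are not very good yet. This should improve considerably once the next formula recognition model has finished training (around early June).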
