OCR-free Document Understanding with Donut Transformer

Overview

This project addresses a real-world scenario: automating the processing of client information and documents for mortgage approval. It aims to optimize document classification, error detection, data extraction, and validation.

Problem

The goal is to make mortgage approval more efficient by automating the identification and processing of the documents that clients submit, both in physical offices and through digital channels. This should reduce response times, improve the customer experience, and minimize operational costs and risks.

AI Team KPIs:

Precision: ≥ 98% (at least 98% of documents are processed correctly across all stages)
Automation: ≥ 60% (at least 60% of documents are processed automatically, without human intervention)
SLA: < 2 hours
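The precision and automation KPIs pull against each other: any rule that sends uncertain documents to human review raises precision on the automatically processed documents while lowering automation. The sketch below shows one plausible way to measure both, under assumptions the project does not state: each prediction carries a confidence score, documents at or above a threshold are processed automatically, and precision is computed over the automated subset only. The `Prediction` class, `evaluate_kpis`, `meets_targets`, and any document labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Prediction:
    """One classified document (hypothetical record format)."""
    label: str         # class predicted by the model
    truth: str         # ground-truth class
    confidence: float  # model confidence in [0, 1]


def evaluate_kpis(preds: Sequence[Prediction], threshold: float) -> Tuple[float, float]:
    """Return (precision, automation) under an ASSUMED confidence-threshold policy.

    Assumption: documents with confidence >= threshold are processed
    automatically; the rest go to human review. Precision is measured
    only on the automatically processed documents.
    """
    if not preds:
        raise ValueError("no predictions to evaluate")
    automated = [p for p in preds if p.confidence >= threshold]
    automation = len(automated) / len(preds)
    if not automated:
        return 1.0, 0.0  # nothing automated: precision is vacuously perfect
    correct = sum(p.label == p.truth for p in automated)
    return correct / len(automated), automation


def meets_targets(precision: float, automation: float,
                  min_precision: float = 0.98, min_automation: float = 0.60) -> bool:
    """Check the precision (>= 98%) and automation (>= 60%) KPI targets."""
    return precision >= min_precision and automation >= min_automation
```

Raising the threshold trades automation for precision; under this assumption it would be tuned on a validation set so that both targets hold at once.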

Objectives

Reduction in response times and improvement in customer experience.
Research in the field of Visual Document Understanding (VDU).
End-to-end automation with high precision and efficiency.
Selection and adaptation of suitable AI models.
Evaluation of results and project viability.

Methodology

The Donut architecture was adopted to address document classification and data extraction.

This model was proposed in "OCR-free Document Understanding Transformer" by Geewook Kim, Teakgyu Hong, Moonbin Yim, Jeongyeon Nam, Jinyoung Park, Jinyeong Yim, Wonseok Hwang, Sangdoo Yun, Dongyoon Han, and Seunghyun Park. Donut consists of an image Transformer encoder and an autoregressive text Transformer decoder, which together perform document understanding tasks such as document image classification, form understanding, and visual question answering. Because it reads the document image directly, it requires no separate OCR step.
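Because Donut's decoder generates text, both classification and extraction can be framed as emitting a structured token sequence. The following minimal sketch, loosely modeled on the JSON-to-token conversion used in Donut fine-tuning examples, serializes a target dictionary into that form. The function name, the `categorical_keys` parameter, and the field and class names are hypothetical; in a real setup the new tags would also have to be added to the tokenizer vocabulary before fine-tuning.

```python
from typing import Any, FrozenSet


def json_to_tokens(obj: Any, categorical_keys: FrozenSet[str] = frozenset({"class"})) -> str:
    """Serialize a (possibly nested) dict into a Donut-style target sequence.

    Hypothetical helper: dict keys become <s_key>...</s_key> tags, values of
    categorical keys become single <value/> tokens, and list items are
    joined with a <sep/> token.
    """
    if isinstance(obj, dict):
        parts = []
        for key, value in obj.items():
            if key in categorical_keys and isinstance(value, str):
                inner = f"<{value}/>"  # class label as one categorical token
            else:
                inner = json_to_tokens(value, categorical_keys)
            parts.append(f"<s_{key}>{inner}</s_{key}>")
        return "".join(parts)
    if isinstance(obj, list):
        return "<sep/>".join(json_to_tokens(v, categorical_keys) for v in obj)
    return str(obj)  # leaf value: plain text
```

For example, a classification target `{"class": "payslip"}` becomes `<s_class><payslip/></s_class>`, so the classifier is trained simply to generate that short sequence.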

The Donut model was fine-tuned for the document classification task. For data extraction, inference was run directly with a pre-trained Donut model, without additional fine-tuning.
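At inference time, the decoder's output sequence has to be converted back into structured fields. Hugging Face Transformers provides `DonutProcessor.token2json` for this; the function below is only a simplified, dependency-free sketch of the same idea, assuming outputs of the form `<s_key>value</s_key>` with optional nesting and `<value/>` categorical tokens. The function name and example fields are hypothetical.

```python
import re

# Opening field tag in a Donut-style output sequence, e.g. "<s_name>".
FIELD_OPEN = re.compile(r"<s_(\w+)>")
# Categorical value token, e.g. "<payslip/>".
CATEGORY = re.compile(r"<(\w+)/>")


def parse_donut_sequence(seq: str) -> dict:
    """Simplified sketch of turning a Donut-style token sequence into a dict.

    Handles <s_key>value</s_key> fields, nesting, and <value/> categorical
    tokens. Repeated keys overwrite earlier ones, and a field nested inside
    a field of the same name is not supported.
    """
    result = {}
    pos = 0
    while True:
        m = FIELD_OPEN.search(seq, pos)
        if m is None:
            break
        key = m.group(1)
        close_tag = f"</s_{key}>"
        end = seq.find(close_tag, m.end())
        if end == -1:
            # Unclosed tag (e.g. a task prompt such as "<s_rvlcdip>"): skip it.
            pos = m.end()
            continue
        inner = seq[m.end():end]
        if FIELD_OPEN.search(inner):
            result[key] = parse_donut_sequence(inner)  # nested fields
        else:
            value = inner.strip()
            cat = CATEGORY.fullmatch(value)
            result[key] = cat.group(1) if cat else value
        pos = end + len(close_tag)
    return result
```

For instance, a classification output such as `<s_rvlcdip><s_class><payslip/></s_class></s>` yields `{"class": "payslip"}`; the unclosed task-prompt tag and the end-of-sequence token are ignored.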

Results

The OCR-free Donut model proved efficient and precise for document understanding, avoiding limitations of traditional OCR-based pipelines such as the dependence on a separate OCR engine and the propagation of OCR errors into later processing stages.

Metrics obtained (both tasks meet the precision and automation KPI targets):

Document classification: 98.63% precision and 62.43% automation.
Data extraction: 100.00% precision and 95.24% automation.

Note: These results were obtained on experimental datasets and may change on datasets that better represent real business conditions. Even so, they are promising and suggest that a practical application is viable.

For more information about the project, see Project_Workflow.pdf and Project_Report.pdf.

Repository contents

datasets/
notebooks/
LICENSE
Project_Report.pdf
Project_Workflow.pdf
README.md

Repository: javier-marti-isasi/OCR-free-Document-Understanding-with-Donut-Transformer
