`Doc2Graph`

This library is the implementation of the paper Doc2Graph: a Task Agnostic Document Understanding Framework based on Graph Neural Networks, accepted at TiE @ ECCV 2022.

The model and pipeline aim to be task-agnostic in the domain of Document Understanding. This is an ongoing project. These are the steps already achieved and the ones we would like to implement in the future:

Build a model based on GNNs to spot key-value relationships on forms
Publish the preliminary results and the code
Extend the framework to other document-related tasks
- Business documents Layout Analysis
- Table Detection
Let the user train Doc2Graph over private / other datasets using our dataloader
Publish Doc2Graph to PyPI for easy installation
Retrain Doc2Graph in a new and extended version

Quick Start

Get up and running with Doc2Graph in minutes:

# Clone and install
git clone https://github.com/andreagemelli/doc2graph.git
cd doc2graph
uv sync
uv pip install -e .

# Initialize the project (downloads datasets)
uv run python -m doc2graph.main --init

# Run inference on a document
uv run python -m doc2graph.main -addG -addT -addE -addV --weights e2e-funsd-best.pt --inference --docs /path/to/your/image.png

Check out the tutorial notebook for a complete walkthrough!

News!

🔥 Added inference method: you can now use Doc2Graph directly on your documents by simply passing a path to them!
This call outputs an image with the connected entities and a JSON / dictionary with all the useful information you need! 🤗

uv run python -m doc2graph.main -addG -addT -addE -addV --weights e2e-funsd-best.pt --inference --docs /path/to/your/image.png

🔥 Added tutorial folder: get to know how to use Doc2Graph from the tutorial notebooks!

Environment Setup

Prerequisites

Python 3.8-3.11 (recommended: 3.10)
uv package manager

Installation

Install uv (if not already installed):

pip install uv

Clone and setup the project:

git clone https://github.com/andreagemelli/doc2graph.git
cd doc2graph

Install the package in development mode:

uv sync
uv pip install -e .

This will:

Install all dependencies with compatible versions
Install the doc2graph package in development mode
Set up the virtual environment with Python 3.10

Additional Setup (Optional)

For GPU acceleration, you may need to install CUDA-specific versions of PyTorch and DGL:

# For CUDA 11.8 (adjust version as needed)
uv add torch==1.13.1+cu118 torchvision==0.14.1+cu118 --index-url https://download.pytorch.org/whl/cu118
uv add dgl-cu118 --index-url https://data.dgl.ai/wheels/repo.html

Note: For other operating systems or CUDA versions, refer to the PyTorch and DGL documentation.

Finally, create the project folder structure and download data:

python -m doc2graph.main --init

The script will download and set up:

FUNSD¹, together with the 'adjusted_annotations' for FUNSD provided by the work of Vu et al.²
The YOLO detection bounding boxes described in the paper. If you want to use YOLOv5-small to detect entities yourself, see the script in notebooks/YOLO.ipynb and refer to the YOLOv5 GitHub repository for installation. Clone that repository into doc2graph/models/yolov5.
Pau Riba's³ dataset with our train / test split.

Checkpoints

You can download our model checkpoints here. Place them into doc2graph/models/checkpoints.

Training

To train our Doc2Graph model (using CPU) use:

python -m doc2graph.main [SETTINGS]

To test a trained Doc2Graph model instead (using GPU), pass one or more weight files:

python -m doc2graph.main [SETTINGS] --gpu 0 --test --weights *.pt

The project can be customized either by editing the configs/base.yaml file directly or by passing the flags below when calling doc2graph.main.

Features

--add-geom: bool (to add positional features to graph nodes)
--add-embs: bool (to add textual features to graph nodes)
--add-hist: bool (to add visual features to graph nodes)
--add-visual: bool (to add visual features to graph nodes)
--add-eweights: bool (to add polar relative coordinates between nodes to graph edges)
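
To make the idea behind --add-eweights more concrete, here is a minimal sketch of polar relative coordinates between two nodes: the distance and angle between the centers of their bounding boxes, with the angle discretized into bins. This is not taken from the Doc2Graph code. The function name polar_edge_features and the (x0, y0, x1, y1) box layout are assumptions made for the example, and the default of 8 bins simply mirrors the documented default of --num-polar-bins.

```python
import math


def polar_edge_features(box_a, box_b, num_bins=8):
    """Return (distance, angle_bin) from the center of box_a to the center of box_b.

    Boxes are (x0, y0, x1, y1) tuples; this layout is an assumption of the sketch.
    """
    # Centers of the two boxes
    ax, ay = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    bx, by = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    dx, dy = bx - ax, by - ay

    # Polar coordinates of b relative to a
    distance = math.hypot(dx, dy)
    angle = math.atan2(dy, dx) % (2 * math.pi)  # mapped into [0, 2*pi)

    # Discretize the angle into num_bins equal sectors
    bin_width = 2 * math.pi / num_bins
    angle_bin = int(angle // bin_width) % num_bins
    return distance, angle_bin


if __name__ == "__main__":
    key_box = (0, 0, 2, 2)      # hypothetical "key" entity
    value_box = (8, 1, 10, 5)   # hypothetical "value" entity
    print(polar_edge_features(key_box, value_box))
```

In a real pipeline the bin index would typically be turned into a one-hot vector and the distance normalized before being attached to an edge. Those steps are left out here to keep the sketch short.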

Data

--src-data: string [FUNSD, PAU or CUSTOM] (CUSTOM still under dev)
--src-type: string [img, pdf] (if src_data is CUSTOM, still under dev)

Graphs

--edge-type: string [fully, knn] (to change the kind of connectivity)
--node-granularity: string [gt, yolo, ocr] (choose the granularity of the nodes to use: gt (if given), ocr (words) or yolo (entities))
--num-polar-bins: int [Default 8] (number of bins into which the space is discretized for edge polar features. It must be a power of 2)
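
The two connectivity options of --edge-type can be pictured with the following sketch, which builds directed edge lists over node centers either between every pair of nodes ("fully") or from each node to its k nearest neighbours ("knn"). It is an illustration only, not the Doc2Graph implementation: the function names, the use of Euclidean distance between centers, and the parameter k are assumptions.

```python
import math


def fully_connected_edges(centers):
    """Directed edges between every ordered pair of distinct nodes."""
    n = len(centers)
    return [(i, j) for i in range(n) for j in range(n) if i != j]


def knn_edges(centers, k):
    """Directed edges from each node to its k nearest neighbours (Euclidean)."""
    edges = []
    for i, (xi, yi) in enumerate(centers):
        # Rank all other nodes by distance to node i
        others = sorted(
            (j for j in range(len(centers)) if j != i),
            key=lambda j: math.hypot(centers[j][0] - xi, centers[j][1] - yi),
        )
        edges.extend((i, j) for j in others[:k])
    return edges


if __name__ == "__main__":
    # Hypothetical node centers: two pairs of nearby entities
    centers = [(0, 0), (1, 0), (5, 0), (6, 0)]
    print(len(fully_connected_edges(centers)))
    print(knn_edges(centers, k=1))
```

A fully connected graph grows quadratically with the number of nodes, which is one reason a k-nearest-neighbour graph can be preferable on documents with many nodes.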

Inference (only for KiE)

--inference: bool (run inference on given document/s path/s)
--docs: list (absolute path(s) to your document(s))

Edit configs/train.yaml directly to change the training settings, or pass the flags below to doc2graph.main. To create your own model (with different hyperparameters), copy one of the configs/models/*.yaml files.

Training/Testing

--model: string [e2e, edge, gcn] (which model to use, which yaml file to load)
--gpu: int [Default -1] (which GPU to use. Set -1 to use CPU)
--test: true / false (skip training if true)
--weights: string(s) (relative path(s) to the weight file(s), if testing)

Testing

You can use our pretrained models over the test sets of FUNSD¹ and Pau Riba's³ datasets.

On FUNSD we were able to perform both Semantic Entity Labeling and Entity Linking:

E2E-FUNSD-GT:

python -m doc2graph.main -addG -addT -addE -addV --gpu 0 --test --weights e2e-funsd-best.pt

E2E-FUNSD-YOLO:

python -m doc2graph.main -addG -addT -addE -addV --gpu 0 --test --weights e2e-funsd-best.pt --node-granularity yolo

On Pau Riba's dataset we were able to perform both Layout Analysis and Table Detection:

E2E-PAU:

python -m doc2graph.main -addG -addT -addE -addV --gpu 0 --test --weights e2e-pau-best.pt --src-data PAU --edge-type knn

Cite this project

If you want to use our code in your project(s), please cite us:

@InProceedings{10.1007/978-3-031-25069-9_22,
author="Gemelli, Andrea
and Biswas, Sanket
and Civitelli, Enrico
and Llad{\'o}s, Josep
and Marinai, Simone",
editor="Karlinsky, Leonid
and Michaeli, Tomer
and Nishino, Ko",
title="Doc2Graph: A Task Agnostic Document Understanding Framework Based on Graph Neural Networks",
booktitle="Computer Vision -- ECCV 2022 Workshops",
year="2023",
publisher="Springer Nature Switzerland",
address="Cham",
pages="329--344",
abstract="Geometric Deep Learning has recently attracted significant interest in a wide range of machine learning fields, including document analysis. The application of Graph Neural Networks (GNNs) has become crucial in various document-related tasks since they can unravel important structural patterns, fundamental in key information extraction processes. Previous works in the literature propose task-driven models and do not take into account the full power of graphs. We propose Doc2Graph, a task-agnostic document understanding framework based on a GNN model, to solve different tasks given different types of documents. We evaluated our approach on two challenging datasets for key information extraction in form understanding, invoice layout analysis and table detection. Our code is freely accessible on https://github.com/andreagemelli/doc2graph.",
isbn="978-3-031-25069-9"
}

⚠️ Security Notice

DGL Vulnerability: This project uses dgl>=1.1.3,<2.0.0 which has a known security vulnerability related to RPC pickle deserialization (GHSA-3x5x-fw77-g54c).

Workaround: When running DGL distributed training and inference (DistDGL), ensure you do not assign public IPs to any instance in the cluster.
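
As a quick sanity check for the workaround above, the following sketch flags any address in a list of cluster hosts that is not private, loopback, or link-local. It relies only on Python's standard ipaddress module. The function name public_addresses and the example addresses are hypothetical, and the check is no substitute for reviewing the actual network configuration of the cluster.

```python
import ipaddress


def public_addresses(hosts):
    """Return the entries of `hosts` that are not private, loopback, or link-local."""
    flagged = []
    for host in hosts:
        addr = ipaddress.ip_address(host)  # raises ValueError on malformed input
        if not (addr.is_private or addr.is_loopback or addr.is_link_local):
            flagged.append(host)
    return flagged


if __name__ == "__main__":
    # Hypothetical cluster addresses used only for illustration
    cluster = ["10.0.0.5", "192.168.1.20", "8.8.8.8", "127.0.0.1"]
    print(public_addresses(cluster))
```

An empty result means every listed address falls in a non-public range. Any address returned should be investigated before running DistDGL on that instance.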

Resolution: This vulnerability will be addressed in future versions of the project when a compatible DGL version with the security fix becomes available.

For more details, see the DGL security advisory.

¹ G. Jaume et al., FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents, ICDARW 2019
² Hieu M. Vu et al., Revising FUNSD Dataset for Key-Value Detection in Document Images, arXiv preprint 2020
³ P. Riba et al., Table Detection in Invoice Documents by Graph Neural Networks, ICDAR 2019
