huridocs/pdf-reading-order

PDF Reading Order

This tool returns the reading order of the contents of a PDF document.

Quick Start

Create venv:

make install_venv

Get the reading order of a PDF:

source venv/bin/activate
python src/predict.py /path/to/pdf

Train a new model

Get the labeled data tool from the GitHub repository:

https://github.com/huridocs/pdf-labeled-data

Set the following paths in src/config.py:

LABELED_DATA_ROOT_PATH = /path/to/pdf-labeled-data/project
TRAINED_MODEL_PATH = /path/to/save/trained/model
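Before training, it can help to confirm that both paths point somewhere usable. The sketch below is only an illustration: `check_config_paths` is a hypothetical helper, not part of the repository. It assumes that LABELED_DATA_ROOT_PATH is an existing directory and that TRAINED_MODEL_PATH is a file whose parent directory should already exist. The real scripts may handle either case differently.

```python
from pathlib import Path


def check_config_paths(labeled_data_root, trained_model_path):
    """Return a list of problems; an empty list means both paths look usable.

    Hypothetical helper: the repository may validate or create these paths itself.
    """
    problems = []
    # Assumption: the labeled data root is a directory that already exists.
    if not Path(labeled_data_root).is_dir():
        problems.append(f"LABELED_DATA_ROOT_PATH is not a directory: {labeled_data_root}")
    # Assumption: the model is saved to a file, so its parent directory should exist.
    if not Path(trained_model_path).parent.is_dir():
        problems.append(f"Parent directory of TRAINED_MODEL_PATH does not exist: {trained_model_path}")
    return problems
```

Calling it with the two values from src/config.py and printing the result gives a quick sanity check before starting a long training run.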

Create venv:

make install_venv

Train a new model:

source venv/bin/activate
python src/create_candidate_finder_model.py
python src/create_reading_order_model.py
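The two training commands can also be driven from a single Python function, which is convenient when retraining often. This is a minimal sketch, not part of the repository: `train_all` and `TRAINING_SCRIPTS` are hypothetical names. It assumes it is run from the repository root with the venv active, and that the scripts should run in the order listed above.

```python
import subprocess
import sys

# Script paths as listed in this README, relative to the repository root.
TRAINING_SCRIPTS = [
    "src/create_candidate_finder_model.py",
    "src/create_reading_order_model.py",
]


def train_all(run=subprocess.run):
    """Run each training script in the listed order, stopping at the first failure."""
    for script in TRAINING_SCRIPTS:
        # check=True raises CalledProcessError if a script exits with a non-zero status,
        # so the second script is never started after a failed first one.
        run([sys.executable, script], check=True)
```

Calling `train_all()` runs both scripts with the same interpreter that runs the helper. The `run` parameter exists only so the function can be exercised without launching real processes.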

Use a custom model

python src/predict.py /path/to/pdf --model-path /path/to/model

Process figures and tables

python src/predict.py /path/to/pdf --extract-figures-and-tables
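To run the predictor over many PDFs, the command line can be assembled programmatically. This is a rough sketch under stated assumptions: `build_predict_command` and `predict_folder` are hypothetical helpers, not part of the repository. Only the `--model-path` and `--extract-figures-and-tables` flags shown in this README are used, and combining the two in one call is an assumption. The sketch does not parse the script's output, since its format is not described here.

```python
import subprocess
import sys
from pathlib import Path


def build_predict_command(pdf_path, model_path=None, extract_figures_and_tables=False):
    """Build the argument list for src/predict.py using only flags shown in the README."""
    command = [sys.executable, "src/predict.py", str(pdf_path)]
    if model_path is not None:
        command += ["--model-path", str(model_path)]
    if extract_figures_and_tables:
        # Assumption: this flag may be combined with --model-path.
        command.append("--extract-figures-and-tables")
    return command


def predict_folder(folder, **options):
    """Run predict.py once per PDF in `folder`, in sorted filename order."""
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        # check=True stops the batch at the first PDF that fails.
        subprocess.run(build_predict_command(pdf_path, **options), check=True)
```

For example, `predict_folder("/path/to/pdfs", model_path="/path/to/model")` would process every PDF in the folder with a custom model, assuming it is run from the repository root with the venv active.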
