subimg_augmentation

This repository contains code for extracting polygonal segmentation data from ALTO XML files to use in subimage augmentation, as presented in "Evaluating Augmented Training Data for Complex Document Layouts: the Case of Arabic Scientific Manuscripts" (DH2024). The code is available both as a Python script (extract-regions.py) and a Jupyter notebook.
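
As an illustration of the extraction step, here is a minimal sketch of reading region polygons from an ALTO file using only the Python standard library. It is not the contents of extract-regions.py. The function names (`parse_points`, `extract_region_polygons`), the sample document, and the choice to read only `TextBlock` elements are assumptions made for this example. It also assumes the polygon is stored in a `Shape/Polygon` child with a `POINTS` attribute, and it accepts both space-separated and comma-separated coordinates because exporters differ on this.

```python
import re
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace so the code works across ALTO schema versions."""
    return tag.rsplit("}", 1)[-1]


def parse_points(points):
    """Turn a POINTS string into a list of (x, y) tuples.

    Accepts 'x1 y1 x2 y2 ...' as well as 'x1,y1 x2,y2 ...'.
    """
    values = [float(v) for v in re.split(r"[\s,]+", points.strip()) if v]
    if len(values) % 2 != 0:
        raise ValueError("POINTS must contain an even number of coordinates")
    return list(zip(values[0::2], values[1::2]))


def extract_region_polygons(alto_xml):
    """Return one dict per TextBlock that has a Shape/Polygon child.

    Hypothetical helper: the keys 'id', 'tagrefs', and 'polygon' are chosen
    for this sketch and are not taken from the repository's script.
    """
    root = ET.fromstring(alto_xml)
    regions = []
    for block in root.iter():
        if local_name(block.tag) != "TextBlock":
            continue
        # Only look at the block's own Shape, not the shapes of nested lines.
        for shape in block:
            if local_name(shape.tag) != "Shape":
                continue
            for polygon in shape:
                if local_name(polygon.tag) == "Polygon" and polygon.get("POINTS"):
                    regions.append({
                        "id": block.get("ID"),
                        "tagrefs": block.get("TAGREFS"),
                        "polygon": parse_points(polygon.get("POINTS")),
                    })
    return regions


# Made-up sample document, used only to exercise the functions above.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page><PrintSpace>
    <TextBlock ID="r1" TAGREFS="BT1">
      <Shape><Polygon POINTS="10 10 110 10 110 60 10 60"/></Shape>
    </TextBlock>
  </PrintSpace></Page></Layout>
</alto>"""

regions = extract_region_polygons(SAMPLE)
```

Each returned polygon can then be passed to whatever image library is used to mask and crop the corresponding area of the page image.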

How the extracted regions are used to create artificial images is left to the user. A sample workflow that combines selected regions according to the SegmOnto ontology will soon be uploaded to this repository.
