subimg_augmentation

This repository contains code for extracting polygonal segmentation data from ALTO XML files to use in subimage augmentation, as presented in "Evaluating Augmented Training Data for Complex Document Layouts: the Case of Arabic Scientific Manuscripts" (DH2024). The code is available both as a Python script (extract-regions.py) and a Jupyter notebook.
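
As a rough illustration of the extraction step, the sketch below reads region polygons from an ALTO document using only the Python standard library. It is not the repository's `extract-regions.py`. The function name `extract_polygons`, the embedded sample document, and the assumption that each region is a `TextBlock` with a direct `Shape`/`Polygon` child whose `POINTS` attribute holds alternating x and y values are all illustrative.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical sample document, included only to make the sketch runnable.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v4#">
  <Layout><Page><PrintSpace>
    <TextBlock ID="block_1" TAGREFS="BT1">
      <Shape><Polygon POINTS="10 10 200 10 200 80 10 80"/></Shape>
      <TextLine ID="line_1">
        <Shape><Polygon POINTS="12 12 190 12 190 40 12 40"/></Shape>
      </TextLine>
    </TextBlock>
  </PrintSpace></Page></Layout>
</alto>"""


def local_name(tag):
    """Strip the XML namespace so the code works across ALTO versions."""
    return tag.rsplit("}", 1)[-1]


def extract_polygons(alto_xml):
    """Return one dict per TextBlock region: its ID, TAGREFS, and polygon points."""
    root = ET.fromstring(alto_xml)
    regions = []
    for block in root.iter():
        if local_name(block.tag) != "TextBlock":
            continue
        # Only look at Shape elements directly under the block,
        # so line-level polygons are not mistaken for region outlines.
        for shape in block:
            if local_name(shape.tag) != "Shape":
                continue
            for poly in shape:
                if local_name(poly.tag) != "Polygon":
                    continue
                # Assumption: POINTS lists x and y values separated by
                # spaces and/or commas.
                raw = poly.get("POINTS", "").strip()
                nums = [int(float(n)) for n in re.split(r"[\s,]+", raw) if n]
                points = list(zip(nums[0::2], nums[1::2]))
                regions.append({
                    "id": block.get("ID"),
                    "tagrefs": block.get("TAGREFS"),
                    "points": points,
                })
    return regions


for region in extract_polygons(SAMPLE):
    print(region["id"], region["tagrefs"], region["points"])
```

The resulting point lists could then be passed to an image library to mask or crop each region from the page image; that step is left out here because it depends on the chosen augmentation workflow.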

How the extracted regions are combined into artificial images is left to the user. A sample workflow that combines selected regions according to the SegmOnto ontology will be uploaded to this repository soon.