🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based
Updated Oct 13, 2023 - HTML
📜 Dehyphenation of broken text (mainly German), e.g., text extracted from a PDF
📑 Python package that uses language models to reconstruct the original continuous text from PDFs