Description
To support multimodal Knowledge Graph (KG) construction, we need to extend the current GROBID-based extraction pipeline to also incorporate image-derived features.
Important: GROBID does not provide native capabilities for image understanding or feature extraction. It focuses on structured text extraction only. Therefore, an additional OCR / vision processing layer is required to handle figures, diagrams, and embedded images.
This update will enable documents to be represented with both:
- Textual features (from GROBID)
- Visual features (from OCR / vision models)
Proposed Approach
Integrate an image understanding component alongside GROBID, potentially using PaddleOCR (text + layout + visual features). This could enable:
- Extraction of text from images (OCR)
- Layout-aware features (tables, figures, regions)
- Visual embeddings for figures/diagrams
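As a rough sketch of what the vision layer's output could look like next to GROBID's text output: the container and function names below are illustrative assumptions, not an existing API, and the actual PaddleOCR call is stubbed out (the real entry point would be `paddleocr.PaddleOCR(...).ocr(image_path)`).

```python
from dataclasses import dataclass, field

# Hypothetical container for what the OCR/vision layer would emit per image;
# field names are assumptions for illustration, not a defined schema.
@dataclass
class ImageFeatures:
    source: str                                     # path of the figure/diagram image
    ocr_text: list = field(default_factory=list)    # text lines recovered by OCR
    regions: list = field(default_factory=list)     # layout regions (tables, figures)
    embedding: list = field(default_factory=list)   # visual embedding vector

def extract_image_features(image_path):
    """Sketch of the OCR/vision step. With PaddleOCR installed this would
    wrap something like:
        from paddleocr import PaddleOCR
        ocr = PaddleOCR(lang="en")
        result = ocr.ocr(image_path)
    Here the call is stubbed so only the output shape is shown."""
    return ImageFeatures(
        source=image_path,
        ocr_text=["(stubbed OCR line)"],
        regions=[{"type": "figure", "bbox": [0, 0, 100, 100]}],
    )

feats = extract_image_features("paper_fig1.png")
print(feats.source)         # paper_fig1.png
print(len(feats.regions))   # 1
```

The point of a typed container like this is that downstream KG code can consume textual and visual features uniformly, regardless of which vision backend produced them.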
Goals
- Enable downstream consumers (the KG pipeline) to reference both:
  - Textual nodes
  - Visual/multimodal nodes
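To make the goal concrete, here is a minimal sketch (function and field names are hypothetical, not part of any existing pipeline) of how a KG consumer might receive textual and visual nodes side by side, each linked back to the same document id:

```python
# Illustrative sketch: emit one KG node per GROBID text section and one per
# image-derived feature set, all tied to a shared document id.
def build_nodes(doc_id, grobid_sections, image_features):
    nodes = []
    for i, text in enumerate(grobid_sections):
        nodes.append({"id": f"{doc_id}:text:{i}", "kind": "text", "content": text})
    for i, feats in enumerate(image_features):
        nodes.append({"id": f"{doc_id}:visual:{i}", "kind": "visual", "content": feats})
    return nodes

nodes = build_nodes(
    "paper42",
    grobid_sections=["Abstract ...", "Methods ..."],
    image_features=[{"source": "fig1.png", "ocr_text": ["flow chart"]}],
)
print([n["kind"] for n in nodes])  # ['text', 'text', 'visual']
```

Keeping both node kinds in one flat list with a shared id scheme means the KG construction step can treat them uniformly while still distinguishing modality via the `kind` field.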