Description
To support multimodal Knowledge Graph (KG) construction, we need to extend the current GROBID-based extraction pipeline to also incorporate image-derived features.
Important: GROBID does not provide native capabilities for image understanding or feature extraction. It focuses on structured text extraction only. Therefore, an additional OCR / vision processing layer is required to handle figures, diagrams, and embedded images.
This update will enable documents to be represented with both:
- Textual features (from GROBID)
- Visual features (from OCR / vision models)
Proposed Approach
Integrate an image understanding component alongside GROBID, potentially using PaddleOCR (text + layout + visual features). This could enable:
- Extraction of text from images (OCR)
- Layout-aware features (tables, figures, regions)
- Visual embeddings for figures/diagrams
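As a rough sketch of what the vision layer's output could look like next to GROBID's text output: the container and function names below are illustrative assumptions, not an existing API, and the actual PaddleOCR call is stubbed out (the real entry point would be `paddleocr.PaddleOCR(...).ocr(image_path)`).

```python
from dataclasses import dataclass, field

# Hypothetical container for what the OCR/vision layer would emit per image;
# field names are assumptions for illustration, not a defined schema.
@dataclass
class ImageFeatures:
    source: str                                     # path of the figure/diagram image
    ocr_text: list = field(default_factory=list)    # text lines recovered by OCR
    regions: list = field(default_factory=list)     # layout regions (tables, figures)
    embedding: list = field(default_factory=list)   # visual embedding vector

def extract_image_features(image_path):
    """Sketch of the OCR/vision step. With PaddleOCR installed this would
    wrap something like:
        from paddleocr import PaddleOCR
        ocr = PaddleOCR(lang="en")
        result = ocr.ocr(image_path)
    Here the call is stubbed so only the output shape is shown."""
    return ImageFeatures(
        source=image_path,
        ocr_text=["(stubbed OCR line)"],
        regions=[{"type": "figure", "bbox": [0, 0, 100, 100]}],
    )

feats = extract_image_features("paper_fig1.png")
print(feats.source)         # paper_fig1.png
print(len(feats.regions))   # 1
```

The point of a typed container like this is that downstream KG code can consume textual and visual features uniformly, regardless of which vision backend produced them.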
Goals
- Enable downstream consumers (the KG pipeline) to reference both:
  - Textual nodes
  - Visual/multimodal nodes
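To make the goal concrete, here is a minimal sketch (function and field names are hypothetical, not part of any existing pipeline) of how a KG consumer might receive textual and visual nodes side by side, each linked back to the same document id:

```python
# Illustrative sketch: emit one KG node per GROBID text section and one per
# image-derived feature set, all tied to a shared document id.
def build_nodes(doc_id, grobid_sections, image_features):
    nodes = []
    for i, text in enumerate(grobid_sections):
        nodes.append({"id": f"{doc_id}:text:{i}", "kind": "text", "content": text})
    for i, feats in enumerate(image_features):
        nodes.append({"id": f"{doc_id}:visual:{i}", "kind": "visual", "content": feats})
    return nodes

nodes = build_nodes(
    "paper42",
    grobid_sections=["Abstract ...", "Methods ..."],
    image_features=[{"source": "fig1.png", "ocr_text": ["flow chart"]}],
)
print([n["kind"] for n in nodes])  # ['text', 'text', 'visual']
```

Keeping both node kinds in one flat list with a shared id scheme means the KG construction step can treat them uniformly while still distinguishing modality via the `kind` field.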