
Enhance GROBID Output to Include Image Feature Vectors for Multimodal KG Construction #3

@tekrajchhetri

Description

To support multimodal Knowledge Graph (KG) construction, we need to extend the current GROBID-based extraction pipeline to also incorporate image-derived features.

Important: GROBID has no native image understanding or visual feature extraction. It focuses on structured text extraction: it can locate figures and tables and return their labels, captions, and (optionally) page coordinates, but it does not analyze the image content itself. Therefore, an additional OCR / vision processing layer is required to handle figures, diagrams, and embedded images.
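One natural hand-off point between GROBID and the vision layer is the figure coordinates. The sketch below assumes the fulltext was processed with GROBID's `teiCoordinates` option enabled for figures (e.g. `teiCoordinates=figure`), so that each `<figure>` element carries a `coords` attribute. It also assumes the coordinate format `page,x,y,width,height`, with several boxes joined by `;`. `FigureRegion` and `figure_regions` are hypothetical names, not part of GROBID. The output tells an OCR / vision step which page regions to crop (cropping itself is not shown).

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

TEI = "{http://www.tei-c.org/ns/1.0}"                # TEI namespace used by GROBID output
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"  # the xml:id attribute

@dataclass
class FigureRegion:
    fig_id: str     # GROBID xml:id, e.g. "fig_0" or "tab_0"
    kind: str       # "figure" unless GROBID marked it type="table"
    label: str
    caption: str
    page: int
    x: float
    y: float
    width: float
    height: float

def figure_regions(tei_xml: str) -> list[FigureRegion]:
    """Collect one region per bounding box of every <figure> that carries coords."""
    root = ET.fromstring(tei_xml)
    regions: list[FigureRegion] = []
    for fig in root.iter(f"{TEI}figure"):
        coords = fig.get("coords")
        if not coords:
            continue  # no coordinates were returned for this figure
        label = (fig.findtext(f"{TEI}label") or "").strip()
        desc = fig.find(f"{TEI}figDesc")
        # Flatten any nested markup and collapse whitespace in the caption.
        caption = " ".join("".join(desc.itertext()).split()) if desc is not None else ""
        # Assumed format: "page,x,y,width,height", several boxes joined by ";".
        for box in coords.split(";"):
            if not box.strip():
                continue
            page, x, y, w, h = box.split(",")
            regions.append(FigureRegion(
                fig_id=fig.get(XML_ID, ""),
                kind=fig.get("type", "figure"),
                label=label,
                caption=caption,
                page=int(page),
                x=float(x), y=float(y), width=float(w), height=float(h),
            ))
    return regions
```

Figures that span several boxes (e.g. across a page break) yield one region per box, so each crop can be processed independently and later grouped by `fig_id`.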

This update will enable documents to be represented with both:

  • Textual features (from GROBID)
  • Visual features (from OCR / vision models)

Proposed Approach

Integrate an image understanding component alongside GROBID, potentially using:

This could enable:

  • Extraction of text from images (OCR)
  • Layout-aware features (tables, figures, regions)
  • Visual embeddings for figures/diagrams
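The three capabilities above could be collected into one record per image region. This is a minimal sketch under stated assumptions: `VisualFeatures`, `extract_visual_features`, and `fuse` are hypothetical names, and `ocr_fn` / `embed_fn` are placeholders for whichever OCR engine and vision encoder are eventually chosen (the issue does not name them). Fusing text and visual vectors by concatenating their L2-normalized forms is just one simple option (early fusion), not a decision made in this issue.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Sequence

# Page number plus box (x, y, width, height), e.g. taken from GROBID figure coordinates.
BBox = tuple[int, float, float, float, float]

@dataclass
class VisualFeatures:
    region_id: str          # e.g. a GROBID figure id such as "fig_0"
    kind: str               # layout class: "figure", "table", ...
    bbox: BBox              # where the region sits on the page
    ocr_text: str           # text recognised inside the image
    embedding: list[float] = field(default_factory=list)  # unit-length visual embedding

def l2_normalize(vec: Sequence[float]) -> list[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)

def extract_visual_features(
    region_id: str,
    kind: str,
    bbox: BBox,
    image: object,
    ocr_fn: Callable[[object], str],                 # placeholder for the chosen OCR engine
    embed_fn: Callable[[object], Sequence[float]],   # placeholder for the chosen vision encoder
) -> VisualFeatures:
    """Run the (hypothetical) OCR and embedding hooks on one cropped region."""
    return VisualFeatures(
        region_id=region_id,
        kind=kind,
        bbox=bbox,
        ocr_text=ocr_fn(image).strip(),
        embedding=l2_normalize(embed_fn(image)),
    )

def fuse(text_embedding: Sequence[float], visual_embedding: Sequence[float]) -> list[float]:
    """Simple early fusion: concatenate the two unit-length vectors."""
    return l2_normalize(text_embedding) + l2_normalize(visual_embedding)

# Usage with stub hooks standing in for real OCR / vision models.
feats = extract_visual_features(
    "fig_0", "figure", (2, 72.0, 90.5, 300.0, 200.0), b"<png bytes>",
    ocr_fn=lambda img: "  Accuracy vs. epochs ",
    embed_fn=lambda img: [3.0, 4.0],
)
print(feats.ocr_text)    # Accuracy vs. epochs
print(feats.embedding)   # [0.6, 0.8]
```

Normalizing before concatenation keeps one modality from dominating similarity scores just because its encoder produces larger-magnitude vectors.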

Goals

  • Enable downstream consumers (e.g., the KG pipeline) to reference both:
    • Textual nodes
    • Visual/multimodal nodes
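A sketch of how the KG pipeline might link the two node types as subject-predicate-object triples. The `ex:` predicates are hypothetical placeholders, not a schema defined in this issue, and the paragraph-to-figure links are assumed to come from GROBID's in-text figure references (e.g. `<ref type="figure" target="#fig_0">`). `TextNode`, `VisualNode`, and `build_triples` are illustrative names.

```python
from dataclasses import dataclass

Triple = tuple[str, str, str]

@dataclass
class TextNode:
    node_id: str
    text: str
    figure_refs: list[str]  # ids of figures cited in this paragraph (assumed from GROBID refs)

@dataclass
class VisualNode:
    node_id: str
    kind: str       # "figure", "table", ...
    caption: str    # from GROBID
    ocr_text: str   # from the OCR / vision layer

def build_triples(doc_id: str, text_nodes: list[TextNode],
                  visual_nodes: list[VisualNode]) -> list[Triple]:
    """Emit triples linking a document to its textual and visual nodes."""
    triples: list[Triple] = []
    visual_ids = {v.node_id for v in visual_nodes}
    for v in visual_nodes:
        triples.append((doc_id, "ex:hasVisualNode", v.node_id))
        triples.append((v.node_id, "ex:kind", v.kind))
        if v.caption:
            triples.append((v.node_id, "ex:caption", v.caption))
        if v.ocr_text:
            triples.append((v.node_id, "ex:ocrText", v.ocr_text))
    for t in text_nodes:
        triples.append((doc_id, "ex:hasTextNode", t.node_id))
        for ref in t.figure_refs:
            if ref in visual_ids:  # skip references to figures that were not extracted
                triples.append((t.node_id, "ex:mentions", ref))
    return triples
```

Dropping dangling references keeps every `ex:mentions` edge pointing at a node that actually exists in the graph.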
