
GENKGB-530 #464

Open
AmirLayegh wants to merge 4 commits into main from amir/schema-extraction-properties


@AmirLayegh
Contributor

Description

Updated the schema extraction prompt in SchemaExtractionTemplate to improve property extraction for node types. The changes ensure that:

  • Node types have at least one property whenever possible (node types with no properties should be rare)
  • Properties are extracted from the input text rather than inferred
  • If no properties can be identified for a node type, the node type is still included, with additional_properties: true

This addresses the issue where many node types were generated without properties, making schemas less useful for downstream processing.
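As a rough illustration of the property rules listed above, here is a minimal Python sketch of a post-extraction check. It is not part of this PR or of SchemaExtractionTemplate; the function name check_node_type_properties and the dict shape (label, properties, additional_properties) are hypothetical assumptions chosen for the example.

```python
from typing import Any


def check_node_type_properties(node_types: list[dict[str, Any]]) -> list[str]:
    """Return warnings for node types that have no extracted properties.

    Hypothetical schema shape (an assumption, not the library's API): each
    node type is a dict with a "label", a "properties" list, and an optional
    "additional_properties" flag.
    """
    warnings: list[str] = []
    for node_type in node_types:
        label = node_type.get("label", "<unnamed>")
        properties = node_type.get("properties") or []
        if properties:
            continue  # at least one property: nothing to report
        if node_type.get("additional_properties") is True:
            # Allowed fallback, but it should be rare.
            warnings.append(f"{label}: no properties extracted (additional_properties: true)")
        else:
            # Neither properties nor the fallback flag: the rules above are violated.
            warnings.append(f"{label}: no properties and additional_properties is not true")
    return warnings


node_types = [
    {"label": "Patient", "properties": [{"name": "name", "type": "STRING"}]},
    {"label": "Encounter", "properties": [], "additional_properties": True},
    {"label": "Order", "properties": []},
]
for warning in check_node_type_properties(node_types):
    print(warning)
```

Here Patient passes, Encounter is reported as an allowed but rare fallback, and Order is reported as a violation.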

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Documentation update
  • Project configuration change

Complexity

Complexity: Low

How Has This Been Tested?

  • Unit tests
  • E2E tests
  • Manual tests

Testing Details:

  • Tested in a GCP environment with the full evaluation pipeline, using Gemini models on medical report documents
  • Verified that all generated node types now have properties
  • Compared results against ground truth schema

Checklist

The following requirements should have been met (depending on the changes in the branch):

  • Documentation has been updated
  • Unit tests have been updated
  • E2E tests have been updated
  • Examples have been updated
  • New files have copyright header
  • CLA (https://neo4j.com/developer/cla/) has been signed
  • CHANGELOG.md updated if appropriate

AmirLayegh requested a review from a team as a code owner on January 22, 2026 12:46
AmirLayegh force-pushed the amir/schema-extraction-properties branch from edf8bbb to 00be32d on January 22, 2026 12:54
9.4 Properties that are supplementary information (phone numbers, descriptions, metadata) are typically optional.
9.5 When uncertain, default to "required": false.
9.6 If a property has a UNIQUENESS constraint, it MUST be marked as "required": true.
7. Also model intermediate events or actions (e.g., transactions, encounters, orders, events, reports) as separate node types when they are mentioned.
Contributor

Nice! Have you noticed an improvement in event extraction with this additional instruction?

Contributor Author

I’d say it helped in some cases, but we’re still missing several event-like entities (e.g., encounters, orders). It seems the instruction nudges the model, but coverage is still limited.

Contributor

Why "as separate node types"?

Contributor Author

The intent is to emphasize that events should be modeled as node types, not as relationships. I've made this more explicit and removed the "as separate node types" wording.
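The required-flag rules in the prompt excerpt quoted in this thread (9.5: default to "required": false; 9.6: a property with a UNIQUENESS constraint MUST be required) can be checked mechanically after extraction. The following is a minimal Python sketch, not code from this PR; the function name check_required_flags and the dict shapes for properties and constraints are hypothetical assumptions.

```python
from typing import Any


def check_required_flags(
    node_types: list[dict[str, Any]],
    constraints: list[dict[str, str]],
) -> list[str]:
    """Flag properties that break rule 9.6 (UNIQUENESS implies required).

    Hypothetical shapes (assumptions, not the library's API): properties carry
    an optional "required" flag, and each constraint is a dict with "type",
    "node_type" and "property_name".
    """
    # Collect (label, property name) pairs that have a uniqueness constraint.
    unique_keys = {
        (c["node_type"], c["property_name"])
        for c in constraints
        if c.get("type") == "UNIQUENESS"
    }
    errors: list[str] = []
    for node_type in node_types:
        label = node_type["label"]
        for prop in node_type.get("properties", []):
            # A missing flag is treated as optional, following rule 9.5.
            required = prop.get("required", False)
            if (label, prop["name"]) in unique_keys and not required:
                errors.append(f"{label}.{prop['name']} is UNIQUE but not required")
    return errors


node_types = [
    {
        "label": "Patient",
        "properties": [
            {"name": "patient_id", "type": "STRING"},  # no "required" key, so optional
            {"name": "phone", "type": "STRING", "required": False},
        ],
    },
]
constraints = [
    {"type": "UNIQUENESS", "node_type": "Patient", "property_name": "patient_id"},
]
print(check_required_flags(node_types, constraints))
```

Rule 9.4 (supplementary information such as phone numbers is typically optional) is a judgment call for the model, so this sketch does not try to check it.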

Contributor

@NathalieCharbel left a comment

LGTM :)


Labels

None yet

Projects

None yet

3 participants