Kastor - Shape-based relation extraction framework

Kastor is a modular framework for extracting RDF triples from unstructured text using shape-aware SLMs (Small Language Models). By combining SHACL shape definitions, a distilled knowledge graph, and active fine-tuning, Kastor builds lightweight, task-specific extractors. It is suited to applications in the Semantic Web, knowledge graph construction, and structured data mining.

🚀 Quick Start

1. Clone and Setup

git clone https://github.com/datalogism/Kastor.git
cd Kastor
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

📁 Project Overview

Kastor/
├── corese/           # Corese RDF engine and knowledge base loader
├── kstor/            # Knowledge distillation and SHACL-based filtering
├── slm/              # Fine-tuning material
├── shapes/           # SHACL templates used for extraction
├── XP_results/       # Experimental outputs
├── doc/              # Documentation
├── img/              # Illustrations
└── README.md         # This file

🧠 How It Works

1. Knowledge Base Initialization: Initialize your knowledge base (KB) with DBpedia data.
2. Shape Definition: Describe your desired RDF structure in a SHACL shape file.
3. Knowledge Distillation: Filter and align text and RDF from the knowledge base using the SHACL shape.
4. Data Augmentation: Augment your knowledge base so that rare properties are sufficiently represented.
5. SLM Training: Fine-tune a small language model on the distilled and augmented data to learn a text-to-RDF extractor.
6. Light Active Learning: Use your trained models to help create a gold dataset.
7. Testing & Inference: Use the trained model to extract RDF triples from new text.
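
To make the distillation step more concrete, here is a minimal sketch of shape-guided filtering in plain Python. It is not Kastor's actual implementation: the `distill` function, the dictionary-based shape, and the `ex:` toy data are all assumptions made for illustration. A real SHACL shape is an RDF graph with richer constraints than a set of property names.

```python
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"

# Hypothetical, simplified stand-in for a SHACL node shape:
# a target class plus the set of properties the shape describes.
PERSON_SHAPE = {
    "target_class": "dbo:Person",
    "properties": {"dbo:birthDate", "dbo:birthPlace", "rdfs:label"},
}


def distill(triples: Iterable[Triple], shape: dict) -> Dict[str, List[Triple]]:
    """For each entity of the target class, keep only shape-listed properties."""
    triples = list(triples)
    # Entities explicitly typed with the shape's target class.
    targets: Set[str] = {
        s for s, p, o in triples if p == RDF_TYPE and o == shape["target_class"]
    }
    kept: Dict[str, List[Triple]] = {}
    for s, p, o in triples:
        if s in targets and p in shape["properties"]:
            kept.setdefault(s, []).append((s, p, o))
    return kept


# Toy knowledge base (made-up data, for illustration only).
kb = [
    ("ex:alice", "rdf:type", "dbo:Person"),
    ("ex:alice", "dbo:birthDate", "1990-01-01"),
    ("ex:alice", "dbo:wikiPageID", "12345"),   # not in the shape: dropped
    ("ex:springfield", "rdf:type", "dbo:City"),
    ("ex:springfield", "rdfs:label", "Springfield"),  # wrong class: dropped
]

distilled = distill(kb, PERSON_SHAPE)
print(distilled)
```

In this sketch, only triples whose subject matches the target class and whose property appears in the shape survive, which is the general idea behind using a shape to reduce a knowledge base to task-relevant facts.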

🛠 Requirements

Python >= 3.8
PyTorch
HuggingFace Transformers
RDFLib
Java 11+ (for Corese)

Install the Python dependencies with pip install -r requirements.txt
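
Before running the pipeline, it may help to confirm that the interpreter and Java runtime meet the versions listed above. The script below is a small sketch and not part of Kastor: `parse_java_major` and `check_environment` are hypothetical helper names, and the parsing assumes the usual `java -version` output format (for example `openjdk version "11.0.2"`).

```python
import re
import shutil
import subprocess
import sys
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the major Java version from `java -version` output."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Legacy numbering scheme: "1.8.0_281" means Java 8.
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def check_environment() -> bool:
    """Report whether Python >= 3.8 and Java 11+ are available."""
    python_ok = sys.version_info >= (3, 8)
    print(f"Python {sys.version_info.major}.{sys.version_info.minor}: "
          f"{'ok' if python_ok else 'need 3.8+'}")

    java = shutil.which("java")
    if java is None:
        print("java not found on PATH")
        return False

    # `java -version` usually writes its report to stderr.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    major = parse_java_major(result.stderr or result.stdout)
    java_ok = major is not None and major >= 11
    print(f"Java major version {major}: {'ok' if java_ok else 'need 11+'}")
    return python_ok and java_ok


if __name__ == "__main__":
    check_environment()
```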

✅ Best Practices

Use concise, complete SHACL definitions to improve distillation quality.
Visualize RDF outputs to validate structure.
Use active training for iterative improvement.
Pre-filter the knowledge base to reduce noise.

📜 License

Kastor is released under the MIT License.

📬 Questions or Issues?

Open a GitHub issue or contact the maintainers via https://github.com/datalogism/Kastor

📝 Related publications

1- Kastor: Fine-Tuned Small Language Models for Shape-Based Active Relation Extraction [PUBLISHED]

🎉 Accepted at the Research Track of ESWC 2025

If you use the code or cite our work, please use the following reference:

@inproceedings{DBLP:conf/esws/RingwaldGFMA25,
  author       = {C{\'{e}}lian Ringwald and
                  Fabien Gandon and
                  Catherine Faron and
                  Franck Michel and
                  Hanna Abi Akl},
  editor       = {Edward Curry and
                  Maribel Acosta and
                  Mar{\'{\i}}a Poveda{-}Villal{\'{o}}n and
                  Marieke van Erp and
                  Adegboyega K. Ojo and
                  Katja Hose and
                  Cogan Shimizu and
                  Pasquale Lisena},
  title        = {Kastor: Fine-Tuned Small Language Models for Shape-Based Active Relation
                  Extraction},
  booktitle    = {The Semantic Web - 22nd European Semantic Web Conference, {ESWC} 2025,
                  Portoroz, Slovenia, June 1-5, 2025, Proceedings, Part {I}},
  series       = {Lecture Notes in Computer Science},
  volume       = {15718},
  pages        = {94--115},
  publisher    = {Springer},
  year         = {2025},
  url          = {https://doi.org/10.1007/978-3-031-94575-5\_6},
  doi          = {10.1007/978-3-031-94575-5\_6},
  timestamp    = {Tue, 10 Jun 2025 17:38:39 +0200},
  biburl       = {https://dblp.org/rec/conf/esws/RingwaldGFMA25.bib},
  bibsource    = {dblp computer science bibliography, https://dblp.org}
}

Associated material:

The resulting extractor can be tested using this notebook

2- Overcoming the Generalization Limits of SLM Finetuning for Shape-Based Extraction of Datatype and Object Properties [UNDER-REVIEW]
