GitHub - mohammadzainabbas/SDM-Lab-3: Semantic Data Management - Knowledge Graph 📈


SDM - Lab 3 @ UPC 👨🏻‍💻

Table of contents

1. Introduction
   1.1. GraphDB
   1.2. Apache Jena
   1.3. Ontology
        1.3.1 TBOX
        1.3.2 ABOX
2. Dataset
3. Preprocess
4. Generate TBOX
5. Generate ABOX

1. Introduction

Data drives the world.

1.1. GraphDB

GraphDB is a graph database compliant with the RDF and SPARQL specifications. It supports open APIs based on the RDF4J (formerly Sesame) project and enables fast publishing of linked data on the web. The GraphDB Workbench is used for searching, exploring and managing GraphDB semantic repositories.

You can find the documentation here

1.2. Apache Jena

Apache Jena is an open source Java framework for building Semantic Web and Linked Data applications. It can be used to create ontologies, query RDF data, and validate constraints (via SHACL).

For this lab, we used the Apache Jena API to create the TBOX and ABOX (and the links between them) for our publications data.

You can find the documentation for Apache Jena API here

1.3. Ontology

We have mentioned the TBOX and the ABOX, but what are they?

1.3.1 TBOX

The TBOX can be thought of as the metadata (schema) for our knowledge graph (or semantic web data/linked data). It defines which atomic concepts (classes) exist and how they are linked to each other (properties).
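To make the idea concrete, here is a minimal sketch of a TBOX written as plain triples, using only the Python standard library. It is not the code used in this repository (which builds the TBOX with the Apache Jena API). The namespace `http://example.org/publications#` and the names `Paper`, `Author` and `writtenBy` are hypothetical and chosen only for illustration.

```python
# Standard RDF and RDFS namespaces.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
# Hypothetical namespace for this illustration (not taken from the repository).
EX = "http://example.org/publications#"


def tbox_triples():
    """Return schema-level triples: two classes and one property linking them."""
    return [
        (EX + "Author", RDF + "type", RDFS + "Class"),
        (EX + "Paper", RDF + "type", RDFS + "Class"),
        (EX + "writtenBy", RDF + "type", RDF + "Property"),
        (EX + "writtenBy", RDFS + "domain", EX + "Paper"),
        (EX + "writtenBy", RDFS + "range", EX + "Author"),
    ]


def to_ntriples(triples):
    """Serialize IRI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(tbox_triples()))
```

Note that every statement here describes a concept or a property, never an individual paper or author.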

1.3.2 ABOX

The ABOX is the data instance layer. You create instances as triples in (subject, predicate, object) form. Each triple states which atomic concept an instance belongs to, or how it is linked to another instance.

So, when you have a TBOX and an ABOX on top of a knowledge graph, you have an ontology, which opens up many ways to query the data.
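As an illustration of the instance layer, the following sketch writes a few ABOX statements in N-Triples syntax (the `.nt` format) using only the Python standard library. It is not the repository's Jena-based generator. The namespace and the identifiers `paper_1`, `author_1`, `writtenBy` and `title` are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical namespace for this illustration (not taken from the repository).
EX = "http://example.org/publications#"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Quote a plain string literal, escaping backslashes, quotes and newlines."""
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def abox_lines():
    """Return instance-level statements about one paper and one author."""
    paper = iri(EX + "paper_1")
    author = iri(EX + "author_1")
    return [
        # Which atomic concept each instance belongs to.
        f"{paper} {iri(RDF_TYPE)} {iri(EX + 'Paper')} .",
        f"{author} {iri(RDF_TYPE)} {iri(EX + 'Author')} .",
        # How one instance is linked to another.
        f"{paper} {iri(EX + 'writtenBy')} {author} .",
        # A data value attached to an instance.
        f"{paper} {iri(EX + 'title')} {literal('A sample title')} .",
    ]


if __name__ == "__main__":
    print("\n".join(abox_lines()))
```

The first two statements assign instances to classes, and the third links two instances through a property, which are the two roles of the ABOX described above.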

2. Dataset

We used the "BYU Engineering Publications in Scopus 2017-21" dataset, available on Kaggle. You can find it here

Note: We renamed the file to publications.csv for ease of use.

3. Preprocess

To create a correct ontology (TBOX and ABOX), you may need to preprocess your data first. We wrote a Python script that produces the preprocessed data. Run the following to get the instances_data.csv file:

git clone https://github.com/mohammadzainabbas/SDM-Lab-3.git
cd SDM-Lab-3/
python scripts/preprocess_publication_data.py
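The kind of cleaning such a step typically performs can be sketched as follows. This is not the contents of scripts/preprocess_publication_data.py. It is a generic standard-library example, and the column names `Title` and `Authors` are assumptions made for the demo rather than the dataset's documented schema.

```python
import csv
import io


def preprocess(source, target, required=("Title",)):
    """Copy CSV rows from source to target, trimming whitespace and skipping
    rows with an empty required column. Returns the number of rows written."""
    reader = csv.DictReader(source)
    if not reader.fieldnames:
        return 0  # empty input: nothing to write
    writer = csv.DictWriter(target, fieldnames=reader.fieldnames)
    writer.writeheader()
    written = 0
    for row in reader:
        # Missing cells come back as None, so normalize them to "".
        cleaned = {key: (row.get(key) or "").strip() for key in reader.fieldnames}
        if all(cleaned.get(column) for column in required):
            writer.writerow(cleaned)
            written += 1
    return written


if __name__ == "__main__":
    # Hypothetical sample data; the second row has no title and is dropped.
    raw = io.StringIO("Title,Authors\n  Paper A ,Doe J.\n,Nobody\n")
    out = io.StringIO()
    print(preprocess(raw, out))
    print(out.getvalue())
```

Working on file-like objects keeps the function easy to try on in-memory samples before pointing it at real files opened with `open(path, newline="")`.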

4. Generate TBOX

Run the following command to generate and save the TBOX:

sh scripts/build_n_run.sh tbox

5. Generate ABOX

Run the following command to generate and save the ABOX:

sh scripts/build_n_run.sh abox

After running the commands above, you should have these files under the data directory:

data
├── publications.owl
├── publications_data.nt
└── raw
    ├── instances_data.csv
    └── publications.csv

Now you can load publications.owl and publications_data.nt into GraphDB and start querying the data.
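Once the files are loaded, queries can be run from the Workbench or over HTTP. The sketch below builds a SPARQL query request against a GraphDB repository through its RDF4J-style endpoint, using only the Python standard library. It assumes GraphDB is running locally on its default port 7200 and that the repository id is `publications`. Both are assumptions, so adjust them to your setup.

```python
import json
import urllib.parse
import urllib.request

# A generic query that works on any repository: count all triples.
COUNT_TRIPLES = "SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }"


def build_query_request(query, base_url="http://localhost:7200", repository="publications"):
    """Build a GET request for a SPARQL query against a repository endpoint.

    base_url and repository are assumed values, not taken from this project.
    """
    params = urllib.parse.urlencode({"query": query})
    url = f"{base_url}/repositories/{repository}?{params}"
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(query, **kwargs):
    """Send the query and return the result bindings (needs a running server)."""
    with urllib.request.urlopen(build_query_request(query, **kwargs)) as response:
        return json.load(response)["results"]["bindings"]


if __name__ == "__main__":
    # Only prints the request URL; call run_query() once GraphDB is running.
    print(build_query_request(COUNT_TRIPLES).full_url)
```

Separating request construction from sending makes it possible to inspect the URL without a server; `run_query(COUNT_TRIPLES)` performs the actual call.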
