
mohammadzainabbas/SDM-Lab-3


SDM - Lab 3 @ UPC 👨🏻‍💻





Data drives the world.

GraphDB is a graph database compliant with the RDF and SPARQL specifications. It supports open APIs based on the RDF4J (formerly Sesame) project and enables fast publishing of linked data on the web. The Workbench is used for searching, exploring and managing GraphDB semantic repositories.

You can find the documentation here

Apache Jena is an open-source Java framework for building Semantic Web and Linked Data applications. It is quite powerful: it can be used to create ontologies, query data, and add constraints (via SHACL) in the Semantic Web world.

For this lab, we used the Apache Jena API to create the TBOX and ABOX (and the links between them) for our publications data.

You can find the documentation for the Apache Jena API here

We have mentioned the TBOX and ABOX, but what are they?

The TBOX can be thought of as the metadata of our knowledge graph (or semantic web/linked data). It tells you which atomic concepts (classes) exist and how they are linked to each other (properties).

The ABOX is the instance-data layer. You create instances as triples (subject, predicate, object). In other words, you state which data instance belongs to which atomic concept, and how it is linked to other data instances.
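To make the distinction concrete, here is a minimal, stdlib-only Python sketch (not the Jena code used in this lab) that models a small TBOX and ABOX as sets of IRI triples and serializes the ABOX as N-Triples, the format suggested by the .nt extension of publications_data.nt. The http://example.org/publications# namespace and the Author, Paper, writes, author_1 and paper_1 names are hypothetical and do not come from the repository's actual ontology.

```python
# Stdlib-only sketch: a TBOX and an ABOX as sets of (subject, predicate, object).
# The EX namespace and all class/property/instance names are hypothetical.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"
OWL_OBJECT_PROPERTY = "http://www.w3.org/2002/07/owl#ObjectProperty"
EX = "http://example.org/publications#"  # hypothetical namespace

# TBOX: which atomic concepts (classes) exist and how properties link them.
tbox = {
    (EX + "Author", RDF_TYPE, OWL_CLASS),
    (EX + "Paper", RDF_TYPE, OWL_CLASS),
    (EX + "writes", RDF_TYPE, OWL_OBJECT_PROPERTY),
    (EX + "writes", RDFS + "domain", EX + "Author"),
    (EX + "writes", RDFS + "range", EX + "Paper"),
}

# ABOX: instances, the concept each one belongs to, and links between them.
abox = {
    (EX + "author_1", RDF_TYPE, EX + "Author"),
    (EX + "paper_1", RDF_TYPE, EX + "Paper"),
    (EX + "author_1", EX + "writes", EX + "paper_1"),
}


def to_ntriples(triples):
    """Serialize IRI-only triples as sorted N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(triples))


print(to_ntriples(abox))
```

Running it prints the three ABOX triples, one per line, each terminated by " .". In the actual pipeline Jena produces these files; the sketch only mirrors the TBOX/ABOX split.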

So, once you have both a TBOX and an ABOX on top of a knowledge graph, you essentially have an ontology, which unlocks many possibilities for querying the data.
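Querying such data boils down to matching triple patterns. The stdlib-only Python sketch below evaluates a SPARQL-like basic graph pattern (variables start with ?) by joining matches one pattern at a time. It is a toy illustration under our own assumptions, not how GraphDB or Jena actually evaluate queries, and all ex: terms are hypothetical.

```python
# Toy evaluator for SPARQL-like basic graph patterns over a set of triples.
# All ex: terms are hypothetical; real queries run in GraphDB or Jena.
RDF_TYPE = "rdf:type"

graph = {
    ("ex:author_1", RDF_TYPE, "ex:Author"),
    ("ex:author_2", RDF_TYPE, "ex:Author"),
    ("ex:paper_1", RDF_TYPE, "ex:Paper"),
    ("ex:paper_2", RDF_TYPE, "ex:Paper"),
    ("ex:author_1", "ex:writes", "ex:paper_1"),
    ("ex:author_1", "ex:writes", "ex:paper_2"),
    ("ex:author_2", "ex:writes", "ex:paper_2"),
}


def is_var(term):
    """Terms starting with '?' are query variables."""
    return term.startswith("?")


def match(pattern, triple, binding):
    """Return `binding` extended so that `pattern` matches `triple`, or None."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind an unbound variable, or check it agrees with its existing binding.
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new


def query(patterns, triples):
    """Return every binding that satisfies all patterns (a nested-loop join)."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in triples:
                result = match(pattern, triple, binding)
                if result is not None:
                    extended.append(result)
        bindings = extended
    return bindings


# Papers co-written by ex:author_1 and ex:author_2, roughly:
# SELECT ?paper WHERE { ex:author_1 ex:writes ?paper . ex:author_2 ex:writes ?paper }
rows = query(
    [("ex:author_1", "ex:writes", "?paper"), ("ex:author_2", "ex:writes", "?paper")],
    graph,
)
print(rows)  # [{'?paper': 'ex:paper_2'}]
```

A shared variable (?paper) across patterns is what turns independent lookups into a join; SPARQL engines optimize this heavily, whereas the sketch simply loops.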


We used the BYU Engineering Publications in Scopus 2017-21 dataset, available on Kaggle. You can find it here

Note: We renamed the file to publications.csv for ease of use.


In order to create a correct ontology (TBOX and ABOX), you may need to preprocess your data first. We wrote a Python script that produces the preprocessed data. Run the following to generate the instances_data.csv file:

git clone https://github.com/mohammadzainabbas/SDM-Lab-3.git
cd SDM-Lab-3/
python scripts/preprocess_publication_data.py

Run the following command to generate and save the TBOX:

sh scripts/build_n_run.sh tbox

Run the following command to generate and save the ABOX:

sh scripts/build_n_run.sh abox

After running the commands above, you should have the following files under the data directory:

data
├── publications.owl
├── publications_data.nt
└── raw
    ├── instances_data.csv
    └── publications.csv
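If you want to check this programmatically, the Python sketch below (our own illustration, not a script shipped with the repository) reports which files from the layout above are missing. It assumes it is run from the SDM-Lab-3/ root so that data/ resolves correctly.

```python
from pathlib import Path

# Expected outputs, taken from the directory layout above (relative to data/).
EXPECTED_FILES = [
    "publications.owl",
    "publications_data.nt",
    "raw/instances_data.csv",
    "raw/publications.csv",
]


def missing_outputs(data_dir="data"):
    """Return the expected files that do not exist under `data_dir`."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_outputs()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All expected files are present.")
```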

Now you can load publications.owl and publications_data.nt into GraphDB and start querying the data.
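Besides the Workbench UI, GraphDB exposes each repository's SPARQL endpoint over HTTP following the RDF4J REST protocol. The stdlib-only Python sketch below assumes GraphDB is running locally on its default port 7200 and that the data was loaded into a repository with the hypothetical ID publications; the generic SELECT query is only a placeholder.

```python
import json
import urllib.parse
import urllib.request

# Assumptions: local GraphDB on the default port, hypothetical repository ID.
GRAPHDB_URL = "http://localhost:7200"
REPOSITORY_ID = "publications"  # hypothetical; use your repository's ID

QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""


def build_request(query, base_url=GRAPHDB_URL, repo=REPOSITORY_ID):
    """Build a GET request to the repository's SPARQL endpoint."""
    params = urllib.parse.urlencode({"query": query})
    url = f"{base_url}/repositories/{urllib.parse.quote(repo)}?{params}"
    # Ask for the standard SPARQL 1.1 JSON results format.
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(query):
    """Send the query and return the result rows from the JSON response."""
    with urllib.request.urlopen(build_request(query)) as response:
        results = json.load(response)
    return results["results"]["bindings"]
```

With GraphDB running and the files loaded, run_query(QUERY) returns a list of rows, each mapping a variable name (s, p, o) to a dict with type and value keys.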