Data drives the world.
GraphDB is a graph database compliant with the RDF and SPARQL specifications. It supports open APIs based on the RDF4J (formerly Sesame) project and enables fast publishing of linked data on the web. The Workbench is used for searching, exploring, and managing GraphDB semantic repositories.
You can find the documentation here
Apache Jena is an open source Java framework for building Semantic Web and Linked Data applications. It can be used to create ontologies, query RDF data, and add constraints (via SHACL).
For this lab, we used the Apache Jena API to create the TBOX and ABOX (and the links between them) for our publications data.
You can find the documentation for Apache Jena API here
We have mentioned TBOX and ABOX, but what are they?
TBOX can be thought of as the metadata for our knowledge graph (or semantic web data/linked data). It tells you which atomic concepts (Classes) exist and how they are linked to each other (Properties).
ABOX is the data instance layer. You create instances as triples in (subject, predicate, object) format. Each triple states either which atomic concept an instance belongs to or how it is linked to another instance.
So, once you have your TBOX and ABOX together on top of a knowledge graph, you basically have an Ontology, which you can then query (for example with SPARQL).
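To make the TBOX/ABOX split concrete, here is a minimal sketch in plain Python (the lab itself builds these with the Apache Jena API in Java). The namespace, the Author and Paper classes, and the writes property are hypothetical names chosen for illustration, not necessarily the ones used in this repository.

```python
# Hypothetical namespace and names, used only for illustration.
EX = "http://example.org/publications#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDF_PROPERTY = "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property"
RDFS_CLASS = "http://www.w3.org/2000/01/rdf-schema#Class"
RDFS_DOMAIN = "http://www.w3.org/2000/01/rdf-schema#domain"
RDFS_RANGE = "http://www.w3.org/2000/01/rdf-schema#range"

# TBOX: the concepts (classes) and how they relate (properties).
tbox = [
    (EX + "Author", RDF_TYPE, RDFS_CLASS),
    (EX + "Paper", RDF_TYPE, RDFS_CLASS),
    (EX + "writes", RDF_TYPE, RDF_PROPERTY),
    (EX + "writes", RDFS_DOMAIN, EX + "Author"),
    (EX + "writes", RDFS_RANGE, EX + "Paper"),
]

# ABOX: instances, their class membership, and links between instances.
abox = [
    (EX + "author_1", RDF_TYPE, EX + "Author"),
    (EX + "paper_1", RDF_TYPE, EX + "Paper"),
    (EX + "author_1", EX + "writes", EX + "paper_1"),
]


def declared_classes(tbox_triples):
    """Return the set of class IRIs declared in the TBOX."""
    return {s for s, p, o in tbox_triples if p == RDF_TYPE and o == RDFS_CLASS}


def undeclared_types(abox_triples, tbox_triples):
    """Return ABOX type assertions that point at a class missing from the TBOX."""
    classes = declared_classes(tbox_triples)
    return [t for t in abox_triples if t[1] == RDF_TYPE and t[2] not in classes]


def to_ntriples(triples):
    """Serialize IRI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(abox))
```

The undeclared_types helper shows why the two layers belong together: every instance in the ABOX should be typed with a concept that the TBOX actually defines.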
We used the BYU Engineering Publications in Scopus 2017-21 dataset, available on Kaggle. You can find it here
Note: We renamed the file to publications.csv for ease of use.
To create the correct topology (TBOX and ABOX), you may need to pre-process your data first. We wrote a Python script that produces the preprocessed data. Run the following to get the instances_data.csv file:
git clone https://github.com/mohammadzainabbas/SDM-Lab-3.git
cd SDM-Lab-3/
python scripts/preprocess_publication_data.py
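As an illustration of the kind of clean-up such a pre-processing step might do, the sketch below splits a multi-valued author cell into one record per author, so that each author can later become its own ABOX instance. This is not the repository's script: the column names (title, authors, year), the ";" separator, and the sample rows are all assumptions made for the example.

```python
import csv
import io

# Hypothetical raw rows; the real publications.csv may use different columns.
RAW = """title,authors,year
Graph Stores in Practice,"Doe J.; Roe R.",2019
Untitled draft,,2020
"""


def explode_authors(rows, sep=";"):
    """Yield one record per (paper, author) pair, skipping rows with no authors."""
    for row in rows:
        authors = [a.strip() for a in row["authors"].split(sep) if a.strip()]
        for author in authors:
            yield {
                "title": row["title"].strip(),
                "author": author,
                "year": int(row["year"]),
            }


# Read the in-memory CSV and flatten it to one row per author.
records = list(explode_authors(csv.DictReader(io.StringIO(RAW))))

if __name__ == "__main__":
    for record in records:
        print(record)
```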
Run the following command to generate and save the TBOX:
sh scripts/build_n_run.sh tbox
Run the following command to generate and save the ABOX:
sh scripts/build_n_run.sh abox
After running the commands above, you should have these files under the data directory:
data
├── publications.owl
├── publications_data.nt
└── raw
    ├── instances_data.csv
    └── publications.csv
Now you can load publications.owl and publications_data.nt into GraphDB and start querying the data.
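Besides the Workbench, queries can also be sent over HTTP, since GraphDB exposes each repository as a SPARQL endpoint through its RDF4J-based REST API. The sketch below uses only the Python standard library. It assumes GraphDB is running locally on its default port (7200) and that the two files were loaded into a repository with the hypothetical id "publications"; adjust both to your setup.

```python
import json
import urllib.parse
import urllib.request

# Assumptions: local GraphDB on the default port, hypothetical repository id.
GRAPHDB_URL = "http://localhost:7200"
REPOSITORY = "publications"

# Count instances per class, as a quick check that the ABOX was loaded.
QUERY = """
SELECT ?class (COUNT(?s) AS ?instances)
WHERE { ?s a ?class }
GROUP BY ?class
ORDER BY DESC(?instances)
LIMIT 10
"""


def build_request(query, base_url=GRAPHDB_URL, repository=REPOSITORY):
    """Build a SPARQL Protocol GET request that asks for JSON results."""
    params = urllib.parse.urlencode({"query": query})
    url = f"{base_url}/repositories/{repository}?{params}"
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


def run_query(query):
    """Send the query and return the result bindings (needs a running GraphDB)."""
    with urllib.request.urlopen(build_request(query)) as response:
        return json.load(response)["results"]["bindings"]
```

With GraphDB running, run_query(QUERY) returns a list of bindings; for this query each one carries a "class" and an "instances" entry whose "value" fields hold the class IRI and its instance count.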