This repository contains the code used and presented in the paper "Ontology Completion with Graph-based Machine Learning: A Comprehensive Evaluation", which can be found here.
To use the code, first download it to your computer. This can be done by running the following command.
git clone git@github.com:smeznar/anomaly-detection-in-ontologies.git
After this, you need to set up the environment. To run the code, we suggest using the Docker image provided in this repository. You can build the image with the command
sudo docker build -t link-analysis .
from the root folder. Note that the Docker version uses CPU-only PyTorch.
The environment can also be set up manually. We suggest using Python 3.6, as some dependencies need this version to work optimally. Using pip, the dependencies can be installed by running the following commands from the root folder:
pip install -r requirements.txt
pip install torch==1.10.0+cpu -f https://download.pytorch.org/whl/cpu/torch_stable.html
pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.10.0+cpu.html
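After a manual install, it can help to confirm that the PyTorch-related packages are importable before running anything else. The following is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and the list only covers the packages named in the pip commands above (not the contents of requirements.txt).

```python
import importlib.util

# Import names of the packages installed by the pip commands above.
REQUIRED = [
    "torch",
    "torch_scatter",
    "torch_sparse",
    "torch_cluster",
    "torch_spline_conv",
    "torch_geometric",
]


def missing_modules(names=REQUIRED):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All listed modules were found.")
```

The check only looks the modules up and does not import them, so it runs quickly even when the packages are large.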
Your environment should now be ready. Follow the instructions in the sections below to transform ontologies into a graph (the format used throughout all other parts of the repository), run link prediction, create recommendations for missing and redundant edges, and create explanations of the predicted recommendations.
To use the code in other sections, we first need to transform the ontology into a graph. Our code can read two different formats (JSON and txt). The JSON format is structured as:
{"graphs": {
"nodes":[{
"id": "string" ,
"type": "string",
"lbl": "string"}],
"edges":[{
"sub": "string",
"pred": "string",
"obj": "string"}]}}
where nodes is a list of nodes with id "id", type "type", and label "lbl", and edges is a list of triplets where "sub" is the subject, "pred" the predicate (relation), and "obj" the object.
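To make the JSON layout concrete, here is a minimal sketch of reading it with Python's standard library. It is not the repository's loader: the function name `parse_json_graph` and the toy node ids and relation in `example` are hypothetical and only illustrate the structure described above.

```python
import json


def parse_json_graph(text):
    """Return (labels, triples) from a JSON string in the layout shown above."""
    graph = json.loads(text)["graphs"]
    # Map each node id to its label; "lbl" is read defensively in case it is absent.
    labels = {node["id"]: node.get("lbl") for node in graph["nodes"]}
    # Each edge becomes a (subject, predicate, object) triple.
    triples = [(edge["sub"], edge["pred"], edge["obj"]) for edge in graph["edges"]]
    return labels, triples


# Toy input with made-up ids, used only to illustrate the structure.
example = """
{"graphs": {
  "nodes": [{"id": "A", "type": "CLASS", "lbl": "first"},
            {"id": "B", "type": "CLASS", "lbl": "second"}],
  "edges": [{"sub": "A", "pred": "is_a", "obj": "B"}]}}
"""

labels, triples = parse_json_graph(example)
print(labels)
print(triples)
```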
The txt file is formatted as:
subject\t object\t predicate
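The txt layout can be read in a few lines as well. This is a sketch under the assumption that each non-empty line holds exactly three tab-separated fields in the order given above (subject, object, predicate); the name `parse_txt_graph` is hypothetical and not taken from the repository.

```python
def parse_txt_graph(lines):
    """Turn tab-separated lines into (subject, predicate, object) triples."""
    triples = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Column order in the file is subject, object, predicate.
        subject, obj, predicate = (field.strip() for field in line.split("\t"))
        triples.append((subject, predicate, obj))
    return triples


# Toy input with made-up identifiers.
print(parse_txt_graph(["A\tB\tis_a\n", "\n", "B\tC\tpart_of\n"]))
```

Note that the predicate comes last in the file but is placed in the middle of the returned triple, which matches the "sub", "pred", "obj" naming of the JSON format.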
Some examples of transformed ontologies (the ones used in the paper) can be found inside the data directory together with their origin and sources. If you want to test our approach on your own ontology, you can transform an .owl file into a JSON file by using the script src/conversion.py as
python src/conversion.py --filename {filename} --out {out}
or
sudo docker run -v $(pwd):/app --rm link-analysis src/conversion.py --filename {filename} --out {out}
if you are using Docker. The argument {filename} is a placeholder for the path to the .owl file, while {out} is the output path where the .json file will be stored.
A knowledge graph can also be used with this approach, if transformed into a suitable format.
An overview of the link prediction methodology is presented in the image below.
Using this benchmark you should get the following results:
TBA: image with results
The link prediction benchmark (5-fold cross-validation) can be run with the following command from the src directory:
python link_prediction.py --method {method} --dataset {dataset} --format {format} --out {out}
where {method} is the baseline used, {dataset} is the path to the dataset, {format} is the format type of the dataset, and {out} is the path where the results will be stored. We suggest the path results/{filename}.txt for the {out} argument, especially when the Docker image is used.
If you are using the Docker image, the command should have the following form:
sudo docker run -v $(pwd):/app --rm link-analysis src/link_prediction.py --method {method} --dataset {dataset} --format {format} --out {out}
Note that -v $(pwd):/app in this command makes the repository folder (with all the code and data) visible to the Docker container. Example Docker commands can be found in the src/benchmark.sh file.
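For readers unfamiliar with cross-validation on graphs, the sketch below shows one simple way to split an edge list into five train/test folds. It is an illustration only: the function name `k_fold_edge_splits` is hypothetical, and the benchmark script may split edges differently (for example, by also sampling negative edges).

```python
import random


def k_fold_edge_splits(edges, k=5, seed=0):
    """Yield (train_edges, test_edges) pairs, one per fold."""
    edges = list(edges)
    random.Random(seed).shuffle(edges)  # seeded shuffle for reproducible folds
    # Deal the shuffled edges into k folds of near-equal size.
    folds = [edges[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [edge for j, fold in enumerate(folds) if j != i for edge in fold]
        yield train, test


# Toy path graph with ten edges.
toy_edges = [(i, i + 1) for i in range(10)]
for train, test in k_fold_edge_splits(toy_edges):
    print(len(train), len(test))
```

Each edge appears in exactly one test fold, so every edge is held out once across the five runs.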
By default, the following settings can be used:
- {method}: Adamic, Jaccard, Preferential, SNoRe, node2vec, Spectral, TransE, RotatE, GAT, GIN, GCN, GAE, metapath2vec
- {dataset}: ../data/{d}.json, where {d} is one of anatomy, emotions, marine, scto, ehdaa, foodon, go, or ../data/LKN.txt
- {format}: json or txt, depending on the {dataset} file
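Three of the listed methods (Jaccard, Adamic, Preferential) are classical neighborhood-based link prediction scores. The sketch below shows their textbook definitions on a toy undirected graph. It is not the code from src/models.py; all function names and the toy edges are hypothetical.

```python
import math
from collections import defaultdict


def build_neighbors(edges):
    """Build an undirected adjacency map: node -> set of neighbors."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    return neighbors


def jaccard(neighbors, u, v):
    """Shared neighbors divided by all neighbors of u and v."""
    union = neighbors[u] | neighbors[v]
    return len(neighbors[u] & neighbors[v]) / len(union) if union else 0.0


def adamic_adar(neighbors, u, v):
    """Sum of 1 / log(degree) over the shared neighbors of u and v."""
    shared = neighbors[u] & neighbors[v]
    # Degree-1 nodes are skipped to avoid dividing by log(1) = 0.
    return sum(1.0 / math.log(len(neighbors[w])) for w in shared if len(neighbors[w]) > 1)


def preferential_attachment(neighbors, u, v):
    """Product of the degrees of u and v."""
    return len(neighbors[u]) * len(neighbors[v])


toy = build_neighbors([("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")])
print(jaccard(toy, "a", "d"))                  # 0.5
print(preferential_attachment(toy, "a", "d"))  # 2
```

In the toy graph, "a" and "d" share the single neighbor "c" out of the two distinct neighbors "b" and "c", which gives a Jaccard score of 0.5.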
You can also add your own method by adding it to the src/models.py file. An example of this can be found in the examples/new_method.py file. After doing this, you must import the created class inside the src/link_prediction.py file as
from models import MyMethod
and add it to the methods dictionary on line 15 of the src/link_prediction.py file (in the same way as the other methods).
The overview of our approach for creating recommendations of missing and redundant edges is shown in the figure below.
An example of recommendation generation is shown in the examples/recommendation_generation.py script. The example generates the top 20 recommendations for missing and redundant edges connected to node http://purl.obolibrary.org/obo/GO_0008150 in the Gene Ontology. The recommendations are generated with SNoRe.
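One way to read recommendations off link prediction scores is sketched below, under a stated assumption: absent edges with the highest scores are treated as candidates for missing edges, and existing edges with the lowest scores as candidates for redundant ones. This is an illustration, not the logic of examples/recommendation_generation.py; the function `recommend`, the `score` callable, and the toy data are hypothetical.

```python
def recommend(node, nodes, edges, score, top_k=20):
    """Return (missing, redundant) candidate neighbors for one node.

    Assumption: `score(u, v)` is any link prediction score where a
    higher value means the edge (u, v) is more plausible.
    """
    linked = {v for u, v in edges if u == node} | {u for u, v in edges if v == node}
    unlinked = [n for n in nodes if n != node and n not in linked]
    # Missing: absent edges the model finds most plausible.
    missing = sorted(unlinked, key=lambda n: (-score(node, n), n))[:top_k]
    # Redundant: existing edges the model finds least plausible.
    redundant = sorted(linked, key=lambda n: (score(node, n), n))[:top_k]
    return missing, redundant


# Toy scores with made-up values, standing in for a trained model.
toy_scores = {("a", "b"): 0.9, ("a", "c"): 0.1, ("a", "d"): 0.8, ("a", "e"): 0.2}
missing, redundant = recommend(
    "a",
    nodes=["a", "b", "c", "d", "e"],
    edges=[("a", "b"), ("a", "c")],
    score=lambda u, v: toy_scores[(u, v)],
    top_k=1,
)
print(missing, redundant)  # ['d'] ['c']
```

Ties are broken by node name so that the ranking is deterministic.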
TBA script for generating recommendations automatically
The overview of our approach for evaluating recommendations by using multiple versions of an ontology can be seen in the image below. TBA
TBA
An example of a global explanation can be seen in the image below.
A script for creating such an explanation can be found in src/global_explanation.py.
An example of a local explanation can be seen in the image below.
A script for creating such an explanation can be found in src/local_explanation.py.
Code for transforming ontologies to json files was derived from KRR-Oxford's OWL2Vec-Star repository (version 0.2.0, last accessed: 11/2021).
To contribute, simply open an issue or a pull request!
The paper and the corresponding code were created by Sebastian Mežnar, Matej Bevec, Nada Lavrač, and Blaž Škrlj.
See LICENSE.md for more details.
Please cite as:
@misc{meznar2021link,
title={Link Analysis meets Ontologies: Are Embeddings the Answer?},
author={Sebastian Mežnar and Matej Bevec and Nada Lavrač and Blaž Škrlj},
year={2021},
eprint={2111.11710},
archivePrefix={arXiv},
primaryClass={cs.LG}
}