OWL 2 EL <-> Neo4J Mapping "Direct existentials"

This is a preliminary draft of our Neo4J to OWL 2 mapping. The goal is to be able to import a well-defined subset of OWL 2 EL ontologies into Neo4J, and export them again, in such a way that entailments and annotations (though not the syntactic structure) are preserved in the ontology after the round-trip. The main differences between this mapping and other mappings (see References below) are

  • its treatment of blank nodes in existential restrictions. Rather than creating blank nodes, we create direct edges between entities, labelled with the property of the existential restriction. This makes querying the graph more intuitive.
  • its use of qualified safe labels for typing relations (which makes for easier querying)
  • taking advantage of OWL implications (rather than relying on pure RDF syntax). Rather than merely mapping an asserted axiom into Neo4J, we are interested in the following implied relationships:
    • Class-Class. For two class names A and B, an annotation property P, and an object property name R, we consider three types of relationships
      • SubClassOf restrictions of the form A SubClassOf: B
      • Existential restrictions of the form A SubClassOf: R some B
      • Annotation assertions of the form A Annotations: P B
    • Individual-Individual. For two individuals i and j, an annotation property P and an object property R, we consider
      • Object property assertions of the form i Facts: R j
      • Annotation assertions of the form i Annotations: P j
    • Class-Individual. For a class A, an individual i, an annotation property P and an object property name R, we consider three types of relationships
      • Class assertions of the form i Types: A
      • Existential restrictions of the form i Types: R some A
      • Annotation assertions of the form A Annotations: P i
    • Individual-Class. For a class A, an individual i, an annotation property P and an object property name R, we consider
      • Existential restrictions of the form A SubClassOf: R value i
      • Annotation assertions of the form i Annotations: P A

The mapping most similar to ours is the one used by the Monarch Initiative's SciGraph. The main differences are:

  • In SciGraph, IRIs are first class citizens everywhere, while we prioritise safe labels to make query construction easier. This is especially important for edge types: Instead of MATCH p=()-[r:http://purl.obolibrary.org/obo/BFO_0000050]->() RETURN p LIMIT 25, we prefer to say MATCH p=()-[r:part_of_obo]->() RETURN p LIMIT 25
  • In SciGraph, anonymous class patterns are kept alongside so-called "convenience" edges. The latter correspond to the way we treat edges in general.

Some idiosyncrasies of our approach are:

  • To be able to roundtrip, we create disconnected nodes in the Neo4J graph representing OWL properties so that we can represent metadata (such as labels or other annotations) pertaining to them.
  • We introduce a number of properties based on the notion of label for easier yet unambiguous querying, which are materialised on all nodes. Qualified safe labels in particular are used to type relationships. Their use is predicated on the assumption that, for any given namespace, labels are unique. This should be tested prior to loading. Given an entity e (Example: http://purl.obolibrary.org/obo/BFO_0000050),
    • ns corresponds to the namespace of the entity in question (Example: http://purl.obolibrary.org/obo/BFO_).
    • short_form corresponds to the remainder (or fragment) of the IRI of e. (Example: "BFO_0000050")
    • label (Example: "part of") corresponds to either
      • the first rdfs:label annotation encountered or, if there is no rdfs:label annotation,
      • the short_form, or, if there is no short_form,
      • the whole IRI.
    • safe label (sl) is the label with every non-alphanumeric character replaced by _. Leading and trailing underscores are removed, and any sequence of underscores is replaced by a single underscore (Example: "part_of").
    • curie is the valid curie for an entity, i.e. the namespace and the short_form, separated by ":" (Example: "obo:BFO_0000050").
    • qualified safe label (qsl) is the safe label of an entity and its namespace, separated by _ (Example: "part_of_bfo").
  • An example use of a QSL in Cypher is (n)-[:part_of_bfo]-(x)
  • Individuals are currently only typed with their most direct type
  • We only support datatypes that are supported by both neo4j and OWL (other OWL2 datatypes will be cast to Neo4j:String, which means their typing information is lost in a round-trip):
| Neo4J | OWL 2 | Comment |
| --- | --- | --- |
| Integer | xsd:integer | |
| String | xsd:string | |
| Boolean | xsd:boolean | |
| Float | xsd:float | |
| list | xsd:string | Follow JSON standard for representing list as string? |
  • For properties and axioms, all annotations are treated as if they pointed to literals (i.e. they won't be connected to other entities)
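
The label-derived properties described above can be sketched in a few lines of Python. This is a minimal illustration of the rules as worded in this document, not the plugin's actual code: the function names are hypothetical, "alphanumeric" is assumed to mean ASCII letters and digits, and the namespace prefix used for the qualified safe label is assumed to be supplied by the caller.

```python
import re


def choose_label(rdfs_labels, short_form, iri):
    """Pick the label: first rdfs:label, else the short_form, else the whole IRI."""
    if rdfs_labels:
        return rdfs_labels[0]
    if short_form:
        return short_form
    return iri


def safe_label(label):
    """Replace non-alphanumerics by '_', collapse runs of '_', trim '_' at both ends."""
    sl = re.sub(r"[^A-Za-z0-9]", "_", label)  # assumption: ASCII alphanumerics only
    sl = re.sub(r"_+", "_", sl)
    return sl.strip("_")


def qualified_safe_label(label, namespace_prefix):
    """Safe label and namespace prefix, separated by '_'."""
    return f"{safe_label(label)}_{namespace_prefix}"


def curie(namespace_prefix, short_form):
    """Namespace prefix and short_form, separated by ':'."""
    return f"{namespace_prefix}:{short_form}"
```

For the running example, safe_label("part of") gives "part_of" and qualified_safe_label("part of", "bfo") gives "part_of_bfo".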

For readability, we omit the neo4j2owl namespaces in the OWL 2 EL Axiom column:

Prefix: n2o: http://neo4j2owl.org/mapping#
Prefix: : http://neo4j2owl.org/mapping#

Entities

All entities in the ontology, i.e. classes, individuals, object properties, data properties, and annotation properties are represented as nodes in the graph. Relationship nodes are only added to hold metadata about relations, and are disconnected from the rest of the graph. The iri, sl and short_form attributes on Neo4J nodes are the only three attributes that are not mapped into corresponding rdf statements.

| Concept | OWL 2 EL Axiom | Neo4J Graph Pattern | Comment |
| --- | --- | --- | --- |
| Class declaration | Class: A | (:Class {iri: 'http://neo4j2owl.org/mapping#A', short_form:'A', label:'L(A)'}) | |
| Individual declaration | Individual: i | (:Individual {iri: 'http://neo4j2owl.org/mapping#i', label:'L(i)'}) | |
| Annotation property declaration | AnnotationProperty: R | (:AnnotationProperty {iri: 'http://neo4j2owl.org/mapping#R', short_form:'R', sl:'SL(R)', label:'L(R)'}) | |
| Object property declaration | ObjectProperty: R | (:ObjectProperty {iri: 'http://neo4j2owl.org/mapping#R', short_form:'R', sl:'SL(R)', label:'L(R)'}) | |
| Data property declaration | DataProperty: R | (:DataProperty {iri: 'http://neo4j2owl.org/mapping#R', short_form:'R', sl:'SL(R)', label:'L(R)'}) | |

Class-Class relationships

| Concept | OWL 2 EL Axiom | Neo4J Graph Pattern | Comment |
| --- | --- | --- | --- |
| SubClassOf | Class: A SubClassOf: B | (:Class {.. short_form:'A'..})-[r:SubClassOf]-(:Class {.. short_form:'B'..}) | |
| Annotations on classes to other classes | Class: A Annotations: R B | (:Class {.. short_form:'A'..})-[r:SL(R)]-(:Class {.. short_form:'B'..}) | |
| Simple existential "class" restrictions on classes | Class: A SubClassOf: R some B | (:Class {.. short_form:'A'..})-[r:SL(R)]-(:Class {.. short_form:'B'..}) | |

Class-Individual relationships

| Concept | OWL 2 EL Axiom | Neo4J Graph Pattern | Comment |
| --- | --- | --- | --- |
| Annotations on classes to individuals | Class: A Annotations: R i | (:Class {.. short_form:'A'..})-[r:SL(R)]-(:Individual {.. short_form:'i'..}) | |
| Simple existential "individual" restrictions on classes | Class: A SubClassOf: R value i | (:Class {.. short_form:'A'..})-[r:SL(R)]-(:Individual {.. short_form:'i'..}) | |

Individual-Individual relationships

| Concept | OWL 2 EL Axiom | Neo4J Graph Pattern | Comment |
| --- | --- | --- | --- |
| Object Property Assertion | Individual: i Facts: R j | (:Individual {.. short_form:'i'..})-[r:SL(R)]-(:Individual {.. short_form:'j'..}) | |
| Annotations on individuals to other individuals | Individual: i Annotations: R j | (:Individual {.. short_form:'i'..})-[r:SL(R)]-(:Individual {.. short_form:'j'..}) | |

Individual-Class relationships

| Concept | OWL 2 EL Axiom | Neo4J Graph Pattern | Comment |
| --- | --- | --- | --- |
| Class Assertion | Individual: i Types: A | (:Individual {.. short_form:'i'..})-[r:Types]-(:Class {.. short_form:'A'..}) | |
| Simple existential restriction on assertion | Individual: i Types: R some A | (:Individual {.. short_form:'i'..})-[r:SL(R)]-(:Class {.. short_form:'A'..}) | |
| Annotations on individuals to classes | Individual: i Annotations: R A | (:Individual {.. short_form:'i'..})-[r:SL(R)]-(:Class {.. short_form:'A'..}) | |

Entity-literal relationships

For reasons of feasibility, data property assertions or restrictions will be incomplete in almost any implementation. In our reference implementation, we only consider asserted data property assertions.

| Concept | OWL 2 EL Axiom | Neo4J Graph Pattern | Comment |
| --- | --- | --- | --- |
| Annotations on classes to literals | Class: A Annotations: P "A"@en | (:Class {..,SF(P):'"A"@en'}) | |
| Annotations on individuals to literals | Individual: i Annotations: P "A"@en | (:Individual {..,SF(P):'"A"@en'}) | |
| Annotations on object properties to literals | ObjectProperty: R Annotations: P "A"@en | (:ObjectProperty {..,SF(P):'"A"@en'}) | |
| Annotations on data properties to literals | DataProperty: R Annotations: P "A"@en | (:DataProperty {..,SF(P):'"A"@en'}) | |
| Annotations on annotation properties to literals | AnnotationProperty: R Annotations: P "A"@en | (:AnnotationProperty {..,SF(P):'"A"@en'}) | |
| Data property assertion | Individual: i Facts: R 2 | (:Individual {..,SF(R):2}) | |
| Data property restriction | Class: A SubClassOf: R value 2 | (:Class {..,SF(R):2}) | |

Axiom annotations

| Concept | OWL 2 EL Axiom | Neo4J Graph Pattern | Comment |
| --- | --- | --- | --- |
| Axiom Annotations | Class: A SubClassOf: Annotations: P "A"@en | (:Class {..})-[r: {..SF(R):'"A"@en'}..]-() | |

Notes on mapping procedure

  • If we use curies to indicate the edge type for OPs, we need labels as an attribute.

Custom Neo4J properties

For edge types, node labels (in the Neo/Cypher sense of the term) and property keys, special characters and spaces can potentially be supported via back-tick escaping, but avoiding them makes writing Cypher much easier, especially when it is generated by a script.
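
To make the cost of back-tick escaping concrete, here is a small Python sketch of a script that builds the edge query used earlier in this document. The helper names are hypothetical, and the "plain identifier" test is a simplification: it ignores Cypher reserved words and only checks for ASCII letters, digits and underscores.

```python
import re

# Simplified test for names that need no escaping (assumption: ASCII only,
# reserved words are not considered).
_PLAIN_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def cypher_identifier(name):
    """Return the name unchanged if it is plain, else wrap it in back-ticks."""
    if _PLAIN_NAME.fullmatch(name):
        return name
    # A back-tick inside an escaped name is written as two back-ticks.
    return "`" + name.replace("`", "``") + "`"


def match_edges(edge_type, limit=25):
    """Build a query returning up to `limit` paths over the given edge type."""
    return f"MATCH p=()-[r:{cypher_identifier(edge_type)}]->() RETURN p LIMIT {limit}"


print(match_edges("part_of_obo"))
print(match_edges("http://purl.obolibrary.org/obo/BFO_0000050"))
```

With a safe label the generated query needs no escaping at all, whereas the IRI-typed edge has to be wrapped in back-ticks every time it is mentioned.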

Related user stories (internal use only)

  • As a developer writing OWL from the KB, I want to be able to easily find the correct IRI for all OWL entities from the database. => All nodes have an IRI. All edges have an IRI or have a key (e.g. short_form or curie) that makes it easy to look up an IRI from the relevant node. The key may be an edge type name or an edge attribute (neo4J property key).
  • As a developer writing OWL from the KB, I want to be able to tell from all nodes and edges what type of OWL entity or axiom I should create. => There must be an unambiguous mapping from node:label (or some standard attribute) and edge types (or some standard attribute) to OWL entity and axiom types.
  • As a developer writing to the database, I need to know which identifiers are safe to use to uniquely identify nodes for the purpose of merging in new content. External content is loaded from ontologies that we don't have complete control over. Uniqueness of rdfs:label cannot be relied upon (although it is typically a safe assumption within the context of a single ontology). short_form uniqueness is much safer, but curies would be safer still.
  • As a developer writing to the database, I want to be able to easily and unambiguously refer to existing entities. => Using full IRIs for this can be a big pain in the ass as, outside of OBO, these are hard to remember. Using curies is somewhat easier, but can still be a pain. Using short_forms is easiest, but to do this safely requires a commitment to short_form uniqueness. While uniqueness cannot be absolutely guaranteed, clashes are rare. => All nodes and edge types have a unique short_form or curie
  • As a developer editing the DB I want to be able to easily write queries to check what information it contains. I should be able to easily select major categories of content.
    I should be able to write queries using lexical identifiers (labels or something closely related to them), even if this is occasionally unreliable. I should be able to use lexical identifiers (a readable neo4j:label or node attribute) to filter/select on major categories of entity (e.g. anatomical entity; expression pattern; genetic feature).
  • As a Geppetto developer, I should be able to find the display names to use for edges and property keys without having to mung text or run secondary lookups. Edges should store rdfs:label; property keys should not be Qualified ?

Implementation

  • The neo4j2owl plugin for neo4j implements two procedures: exportOWL() and owl2Import(). The procedures are registered in N2OProcedure. The source code was originally loosely based on neosemantics, but has evolved in a completely different direction. Neosemantics has also matured considerably since neo4j2owl was first developed and serves a much larger number of use cases than neo4j2owl does. The main rationale for neo4j2owl is its handling of OWL existential restrictions (see above) as a primary source for relationships and its ability to roundtrip a subset of OWL 2 EL (import-export-import without loss of information). Furthermore, it was important for our use case to handle human-readable labels (relation types) on edges in a standardised way, and to use OWL 2 EL reasoning to automatically infer node labels. It is implemented using the OWLAPI.
    • The functionality for exportOWL() is implemented in the exporter java package.
    • The functionality for owl2Import() is implemented in the importer java package.
  • Overview of the exportOWL() process (src):
    1. Translate all nodes into OWL Entities. This requires OWL entity information (Class, AnnotationProperty, ObjectProperty etc) to be present on the nodes as neo4j labels.
    2. Translate all annotations on nodes into OWL2 AnnotationAssertion axioms.
    3. Translate all SUBCLASSOF, INSTANCEOF relations into respective OWL Axioms.
    4. Translate all neo relations which correspond to AnnotationAssertion axioms between entities
    5. Translate all neo relations that correspond to Existential restrictions or ObjectPropertyAssertions
    6. Translate all neo relations that correspond to DataProperty assertion axioms between entities (e.g. C sub )
    7. Render the ontology as RDFXML and return to user.
  • Overview of the owl2Import() process (src):
    1. Load the ontology using the OWLAPI (src)
    2. Importing the ontology (src)
      1. Create a reasoner (ELK) - the reasoner is used mainly to materialise the class hierarchy and individual types, as well as asserting dynamically configured labels (more about that later). Note that unsatisfiable classes are entirely ignored, due to the heavy reliance on reasoning during the process of importing!
      2. The ontology signature (OWLEntities like classes, individuals and object, data and annotation properties) is scanned and all entities are imported into an internal structure (the actual import into neo4j is managed through CSV files).
      3. All entity annotations are scanned and imported into an internal structure.
      4. All SubClassOf relations are imported into an internal structure with the help of a reasoner (this ensures that only the transitive reduct is imported).
      5. All ClassAssertion axioms are imported in the same way.
      6. All existential relations are imported into an internal structure. Note that currently no reasoning is performed to compute the transitive reduct of the existential graph. This has to be done, if at all desired, directly on the ontology using a tool like ROBOT.
      7. Adding dynamic node labels. Based on an external configuration, labels can be defined as OWL class expressions, which are applied at this point. For example, we can specify that all instances and subclasses of 'part of' some Body are labelled as Body part in neo4j.
    3. Export all imported data from the internal structures to CSV and write into the neo4j import directory. (src)
    4. Load all CSVs previously exported to the neo4j import directory into neo4j using LOAD CSV WITH HEADERS FROM (src).
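
To illustrate what "only the transitive reduct is imported" means for the SubClassOf step above, here is a small Python sketch. It is not how the plugin works (the plugin relies on the ELK reasoner through the OWLAPI); it only shows the idea on a plain set of asserted edges, and it assumes the hierarchy is acyclic (no equivalent classes). The function name is hypothetical.

```python
def transitive_reduct(edges):
    """Drop every (sub, super) edge that is implied by a longer path.

    `edges` is a set of (subclass, superclass) pairs forming an acyclic graph.
    """
    parents = {}
    for sub, sup in edges:
        parents.setdefault(sub, set()).add(sup)

    def ancestors(node):
        # All classes reachable from `node` by following one or more edges.
        seen, stack = set(), list(parents.get(node, ()))
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(parents.get(current, ()))
        return seen

    reduct = set()
    for sub, sup in edges:
        # (sub, sup) is redundant if sup is reachable via another direct parent.
        others = parents[sub] - {sup}
        if not any(sup in ancestors(other) for other in others):
            reduct.add((sub, sup))
    return reduct


hierarchy = {("Neuron", "Cell"), ("Cell", "Thing"), ("Neuron", "Thing")}
print(sorted(transitive_reduct(hierarchy)))
```

In this toy hierarchy the edge from Neuron to Thing is dropped, because it already follows from Neuron SubClassOf Cell and Cell SubClassOf Thing.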

Configuration of neo4j2owl

Each setting is listed with its default value, a description and an example.

  • allow_entities_without_labels (default: true)
    If false and we are in strict mode (see safe_label), the ontology import fails hard. Otherwise, the short form is used.
    Example:
      allow_entities_without_labels: true
  • index (OBSOLETE)
    Do not use.
  • testmode (default: false)
    To run unit tests during development (in an IDE), this needs to be set to testmode: true, which enables working with an embedded neo4j.
    Example:
      testmode: false
  • batch (OBSOLETE)
    This setting was originally designed to allow importing an ontology using inline Cypher queries, but this turned out not to be feasible (CSV bulk import is used now).
  • safe_label (default: loose)
    Selects the labelling strategy for edges. Three options: strict (use the label, but fail if there are clashes, i.e. two or more properties with the same label), qsl (use qualified safe labels with namespaces) and loose (same as strict, but clashes are only reported and do not cause a failure).
    Example:
      safe_label: loose
  • batch_size (default: 999000000)
    Experimental feature that allows chunking an input ontology (chunk size is the number of axioms) and importing the pieces one by one. No measurable positive effect on performance.
    Example:
      batch_size: 999000000
  • relation_type_threshold (default: 0.95)
    Experimental feature that assigns a datatype based on majority voting. For example, if 95% of all values of a property are integers, cast all values to integer.
    Example:
      relation_type_threshold: 0.95
  • property_mapping (default: {})
    This setting does two things: grouping different properties under the same label (edge type) and fixing a property's intended datatype.
    Example:
      property_mapping:
        - iris:
            - "http://purl.obolibrary.org/obo/so#part_of"
            - "http://purl.obolibrary.org/obo/BFO_0000050"
          id: part_of
        - iris:
            - "http://www.w3.org/2002/07/owl#deprecated"
          id: deprecated
          datatype: "Boolean"
  • represent_values_and_annotations_as_json (default: {})
    All properties listed under iris: are converted to JSON, which allows axiom annotations to be serialised.
    Example:
      represent_values_and_annotations_as_json:
        iris:
          - "http://purl.obolibrary.org/obo/IAO_0000115"
          - "http://www.geneontology.org/formats/oboInOwl#hasExactSynonym"
          - "http://www.geneontology.org/formats/oboInOwl#hasNarrowSynonym"
          - "http://www.geneontology.org/formats/oboInOwl#hasBroadSynonym"
          - "http://www.geneontology.org/formats/oboInOwl#hasRelatedSynonym"
  • neo_node_labelling (default: {})
    Allows labelling nodes dynamically based on them being subclasses or instances of the expressions specified in the classes: section.
    Example:
      neo_node_labelling:
        - label: Nervous_system
          classes:
            - RO:0002131 some FBbt:00005093
            - FBbt:00005155
  • curie_map (default: {})
    Allows embedding curie maps to assign namespace prefixes to IRI prefixes.
    Example:
      curie_map:
        VFBfbbt: http://purl.obolibrary.org/obo/fbbt/vfb/VFB_
  • add_property_label (default: true)
    If true, all property types (annotation, object and data properties) get
    Example:
      add_property_label: true
  • timeout (default: 180)
    Overall timeout in minutes after which the import will be killed.
    Example:
      timeout: 180
  • preprocessing (default: [])
    Allows injecting arbitrary Cypher queries that should be executed just before the import.
    Example:
      preprocessing:
        - "CREATE INDEX ON :pub(short_form)"
        - "CREATE INDEX ON :pub(short_form)"
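
The majority voting behind relation_type_threshold can be pictured with the following Python sketch. It is a hedged illustration only: the function names, the string-based type detection and the fallback to String when no datatype reaches the threshold are all assumptions and are not taken from the plugin.

```python
from collections import Counter


def classify(value):
    """Guess the datatype of a single value given as a string (assumed rules)."""
    if value.lower() in ("true", "false"):
        return "Boolean"
    try:
        int(value)
        return "Integer"
    except ValueError:
        pass
    try:
        float(value)
        return "Float"
    except ValueError:
        return "String"


def majority_datatype(values, threshold=0.95):
    """Return the dominant datatype if its share reaches the threshold."""
    if not values:
        return "String"
    counts = Counter(classify(v) for v in values)
    datatype, count = counts.most_common(1)[0]
    # Assumption: without a clear majority, values stay strings.
    return datatype if count / len(values) >= threshold else "String"


print(majority_datatype(["1"] * 99 + ["n/a"]))  # 99% integers
print(majority_datatype(["1"] * 9 + ["n/a"]))   # only 90% integers
```

With the default threshold, the first call selects Integer for the property and the second leaves it as String.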

The VFB configuration file, as an example, can be found here.

References

| Reference | Explanation |
| --- | --- |
| And Now for Something Completely Different: Using OWL with Neo4j | 2013 blogpost on how OWL could be loaded into Neo. It provides a motivation for the conversion, and some code snippets to get started. |
| owl2lpg | Working Draft of a "Mapping of OWL 2 Web Ontology Language to Labeled Property Graphs". |
| SciGraph OWL2Neo | Preliminary mapping |
| SciGraph Neo2OWL | Preliminary mapping |
| Convert OWL to labeled property graph and import into Neo4J | Covers only class hierarchy and annotations. |
| Neo4J-Jena Wrapper | Provides the mapping of RDF to property graphs (Neo4J) using the Jena API. Literals are represented as nodes. |
| Sail Ouplementation | Interface to access property graphs directly as a triple store. No details on mappings in documentation. |
| Using Neo4J to load and query OWL ontologies | 2009 blogpost with ad-hoc implementation, showing how to load the wine ontology with Jena and transform it into Neo4J. The mapping is intuitive and RDF-like, translating triples directly to nodes and edges. |
| Importing RDF data into Neo4J | 2016 blogpost defining a mapping proposition. Annotations are added on nodes if they are literals, else nodes are created (good). Blank nodes are created. Neo4J stored procedure exists. |
| Building a semantic graph in Neo4j | 2016 blogpost on how to define an RDFS style ontology in Neo4J. The author has a keen interest in RDF/OWL2Neo4J mappings; his entire blog seems to be mainly about that topic. |
| Neo4j is your RDF store (series) | Interesting series describing how to use Neo as a triple store. |
| Storing and querying RDF in Neo4j | Description of a SPARQL plugin to query Neo4J based on Sail Ouplementation. |
| Importing ttl (Turtle) ontologies in Neo4j | 2013 blogpost with code on how to import ttl into Neo using the Sesame API. |
| OLS: OWL to Neo4j schema | Largely undocumented, but see slide 16 and following of the Berlin workshop. Partially implements our mapping. |

Editor's notes:

  • Some notes on testing can be found here

Generation of safe labels:

  • alphanumeric and underscore allowed
  • other characters are encoded (for example, has part':@" results in has_part3563).

Understanding the log file

  • All Cypher queries throughout the process produce the same error message when they fail: "Cypher query did NOT complete successfully (ERROR): ", followed by the Cypher query that generated the error.
  • Most non-trivial Cypher queries (especially the CSV imports) generate a success message when they execute correctly: "Cypher finished successfully: ", followed by the Cypher query.
  • Most errors thrown during the import process cause the whole pipeline to fail, except:
    • If a dynamic label is not successfully added for whatever reason, the stack trace is printed along with a warning: "FAILED adding label " + label + " to " + ces + ", see logs."
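
When scanning a long log, the messages quoted above can be picked out with a short script. The following Python sketch is only an illustration: the function name is hypothetical, and it assumes that each message and its Cypher query appear on a single log line, which may not hold for multi-line queries.

```python
ERROR_PREFIX = "Cypher query did NOT complete successfully (ERROR): "
SUCCESS_PREFIX = "Cypher finished successfully: "
LABEL_WARNING_PREFIX = "FAILED adding label "


def summarise_log(lines):
    """Group log lines into failed queries, successful queries and label warnings."""
    summary = {"failed": [], "succeeded": [], "label_warnings": []}
    for line in lines:
        if ERROR_PREFIX in line:
            summary["failed"].append(line.split(ERROR_PREFIX, 1)[1].strip())
        elif SUCCESS_PREFIX in line:
            summary["succeeded"].append(line.split(SUCCESS_PREFIX, 1)[1].strip())
        elif LABEL_WARNING_PREFIX in line:
            summary["label_warnings"].append(line.strip())
    return summary


# Made-up log lines, for illustration only.
example = [
    "Cypher finished successfully: MATCH (n) RETURN count(n)",
    "Cypher query did NOT complete successfully (ERROR): MATCH (n RETURN n",
    "FAILED adding label Body_part to some_expression, see logs.",
]
print(summarise_log(example))
```

A non-empty "failed" list points to the queries to inspect first, while entries under "label_warnings" did not stop the pipeline.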