[OWL Parser] Mapping OWL Ontologies to ONDEX
The OWL Parser plug-in (included in the ONDEX Desktop and ONDEX command line applications) loads OWL ontologies, encoded as OWL files, and maps them into ONDEX as knowledge networks. Currently, OWL classes are mapped to ONDEX concepts. The ONDEX concept classes that such concepts are instances of can either be defined as constants (e.g., "Trait Ontology Term") or be mapped from the top-most classes in an OWL ontology (e.g., "GO Biological Process").
The plug-in can be configured and tailored to map the OWL modelling used in a specific ontology onto ONDEX. This document mainly describes how to do this. We assume some knowledge of Java and XML, and we refer to the XML-based configuration format used by Spring Beans (see below).
The Java code of the OWL Parser is based on the generic parser library. As you can read in its Javadoc documentation, the library is based on the idea that a data format (e.g., an XML file or a CSV file) is scanned, possibly split into smaller fragments (e.g., XML nodes, CSV rows), and the result of this scanning is passed to suitable mappers, each of which converts a fragment to an ONDEX entity, such as a concept, a concept class, or the accession of a concept.
The parser library is independent of any particular input format; it is applied to a format by means of format-specific extensions. It thus factors out certain interfaces (e.g., the notions of scanner and mapper, the idea of a concept mapper) and the defaults for certain parsing/mapping procedures (e.g., in most cases a concept mapper splits a data fragment into units that map to components like concept ID, accessions and data source; this is how the default concept mapper is structured). In particular, the generic library factors out document scanning strategies, e.g., scanning flat structures like the rows of a CSV, or tree structures like the elements of an OWL file. For the moment, we have implemented the latter in the exploring mapper, which, as we show below, is the base for one of the key components in the OWL parser.
The OWL parser is one example of how the generic parser can be used to implement an ONDEX parser and, as such, the way it is structured and can be configured is heavily influenced by the latter.
The Generic Parser Library is designed to be used with an IoC container for tasks like defining the specific components that parse a file (e.g., the OWL mapper, a CSV mapper) and their format-specific parameters. Examples:
- which columns in a CSV format must be mapped to concept names, and what the data sources for such names are (a hypothetical example; we might rewrite the CSV parser using this approach)
- mapping the different ways OWL ontologies represent accessions, using properties like dcterms:identifier, or more domain-specific properties like oboInOwl:id.
More specifically, we already use Spring Beans and its XML format to configure the OWL parser for different OWL/OBO ontologies, as we show in the next sections. Since the components configured in Spring XML files are instances of JavaBeans, you can work out what you can put in such XML by looking at the class interfaces. For instance, the bean defaultOwlMapper is based on the OWLMapper JavaBean, hence the corresponding XML can have <property name="rootsScanner"> as a child element (getRootsScanner()/setRootsScanner() are public methods of the parent class).
It is also worth noting that IDEs have good tools for editing Spring files (for instance, see the Spring Tool Suite for Eclipse).
As said above, the OWL parser is based on the parser library, on OWL-specific extensions of it, and on Spring Bean XML files, which configure the parser components for the specific ontology that you need to import into ONDEX (e.g., Gene Ontology, Trait Ontology). Each such ontology requires an XML configuration file. This, in turn, might use the Spring <import> element. Typically, an OWL/OBO ontology is configured by means of a top file for the ontology, which imports obo_common_mappings.xml, which, in turn, builds on default_mappings.xml. The file names should be self-explanatory; details are given in the following.
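The layering described above can be sketched as a top configuration file. This is only an illustration under stated assumptions: the resource path in <import> is a guess at how the common file is located (Spring resolves a relative path against the importing file), and the comment inside owlMapper stands in for ontology-specific settings like those shown in the next sections.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd">

  <!-- Assumed location: pulls in the OBO defaults, which in turn build on default_mappings.xml -->
  <import resource="obo_common_mappings.xml" />

  <!-- The bean the plug-in looks for; it inherits everything from the defaults -->
  <bean id="owlMapper" parent="defaultOwlMapper">
    <!-- ontology-specific overrides (roots scanner, linkers, etc.) go here -->
  </bean>

</beans>
```

Keeping the ontology-specific file this small is the point of the layering: only what differs from the OBO defaults needs to be stated.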
An OWL parser is essentially an instance of the OWLMapper JavaBean class, which, in turn, is a subclass of ExploringMapper. As you can read in its Javadoc, this parses an input document by exploring a tree-like graph, starting from top/source nodes and following configured link types. In the case of OWL, the process starts from the top-level OWL classes in an rdfs:subClassOf hierarchy and proceeds toward the leaf nodes.
Elements of the OWL mapper configuration are the concept class mapper and the concept mapper:
<!-- This is in default_mappings.xml -->
<bean id = "defaultOwlMapper" class = "net.sourceforge.ondex.parser.owl.OWLMapper">
<property name="conceptClassMapper" ref = "conceptClassMapper" />
<property name="conceptMapper" ref = "conceptMapper" />
<property name="rootsScanner" ref = "rootsScanner" />
...
</bean>
<!-- This essentially uses Jena to get all the OWL classes having no superclass but owl:Thing -->
<bean id = "rootsScanner" class = "net.sourceforge.ondex.parser.owl.OwlRootClassesScanner" />
...
defaultOwlMapper is the bean configured in the default_mappings.xml file. In downstream files, this configuration is inherited to define the bean named owlMapper, which is the one the plug-in picks up to start parsing an OWL document. In Spring this inheritance is polymorphic: if you define your own conceptMapper, it overrides the default and is the one assigned to the corresponding property above, because the property refers to it by name in the ref attribute.
Both the rdfs:subClassOf scanner and other scanners are configured in the OWL mapper by means of linkers, i.e., the linkers property of OWLMapper. For instance, this configuration follows subclass and part-of relationships:
<!-- This is in default_mappings.xml -->
<bean id = "defaultOwlMapper">
...
<property name = "linkers">
<list>
<bean class = "net.sourceforge.ondex.parser.ExploringMapper.LinkerConfiguration">
<property name="scanner">
<!-- For each new owl:Class, follow rdfs:subClassOf -->
<bean class = "net.sourceforge.ondex.parser.owl.OWLSubClassScanner" />
</property>
<!-- And map the two ends of the rdfs:subClassOf relation using the is-a ONDEX relation
(details in the 'Miscellanea' section). As you can see, a linker is an entity that puts together
a relation scanner and the mapper that tells the parser which ONDEX relation to use
for the links found by the scanner.
-->
<property name = "mapper" ref = "isaMapper" />
</bean>
</list>
</property>
</bean>
<!-- This is in your ontology-specific file -->
<!-- As said above, you should always define owlMapper and it's convenient to inherit from defaultOwlMapper -->
<bean id = "owlMapper" parent = "defaultOwlMapper">
...
<property name = "linkers">
<list merge = "true"><!-- the following elements will be added to the subClass linker -->
<ref bean = "partOfLinker" />
</list>
</property>
</bean>
<!--
This is in obo_common_mappings.xml
The scanner uses the specified OBO property as identifier of part-of and the mapper links concepts corresponding
to OWL classes at the ends of this property by means of the part-of relation (see below).
-->
<bean id = "partOfLinker" class = "net.sourceforge.ondex.parser.ExploringMapper.LinkerConfiguration">
<property name = "scanner">
<bean class = "net.sourceforge.ondex.parser.owl.OWLSomeScanner">
<property name="propertyIri" value="#{ns.iri ( 'obo:BFO_0000050' )}" /><!-- part of -->
</bean>
</property>
<property name = "mapper" ref = "partOfMapper" />
</bean>
<!--
This is in default_mappings.xml
This mapper maps a new pair of ONDEX concepts to an ONDEX relation, using a constant relation type, whose
details are set up here.
-->
<bean id = "partOfMapper" class = "net.sourceforge.ondex.parser.SimpleRelationMapper">
<property name ="relationTypePrototype">
<bean class = "net.sourceforge.ondex.core.util.prototypes.RelationTypePrototype">
<property name="id" value = "part_of" />
<property name="fullName" value = "part of" />
<property name="antisymmetric" value = "true" />
<property name="transitive" value = "true" />
</bean>
</property>
</bean>
As you can see, beans can be flexibly configured and combined by using Spring mechanisms like inheritance, imported files, polymorphic references.
The exploring mapper has two ways to map root elements to ONDEX, depending on how the property doMapRootsToConcepts is set.
The following, taken from the Trait Ontology configuration, is the typical case where you want this property to be true:
<bean id = "owlMapper" parent = "defaultOwlMapper" class = "net.sourceforge.ondex.parser.owl.OWLInfMapper">
<!-- The top classes to start from -->
<property name = "rootsScanner">
<bean class = "net.sourceforge.ondex.parser.owl.IriBasedRootsScanner">
<property name = "topClassIri" value = "#{ns.iri ( 'obo:TO_0000387' )}" /><!-- Plant Trait -->
</bean>
</property>
<!-- The root class above will be mapped to a concept, we use a generic 'Trait Ontology Term' as a concept class -->
<property name = "doMapRootsToConcepts" value = "true" />
...
</bean>
<bean id = "conceptClassMapper" class = "net.sourceforge.ondex.parser.ConstantConceptClassMapper">
<property name = "value">
<bean class = "net.sourceforge.ondex.core.util.prototypes.ConceptClassPrototype">
<property name = "id" value = "TO_TERM" />
<property name = "fullName" value = "Trait Ontology Term" />
<property name= "description" value = "Term from the Trait Ontology (https://github.com/Planteome/plant-trait-ontology)" />
</bean>
</property>
</bean>
In practice, all the TO classes are mapped to ONDEX concepts, including 'Plant Trait', and all concepts become instances of the ONDEX concept class 'Trait Ontology Term'. You typically want this when the root OWL classes in your ontology can be used as instances and are not conceptually separated from the rest of the ontology, e.g., when a biological sample can be annotated with "Plant Trait".
The second way to map roots is when doMapRootsToConcepts is false. This is an example from the Gene Ontology configuration:
<bean id = "owlMapper" parent = "defaultOwlMapper">
<!-- The top OWL class maps to an ONDEX concept class and not to a concept -->
<property name = "doMapRootsToConcepts" value = "false" />
<property name = "rootsScanner">
<!-- Joins the OWL classes returned by the scanners below, each returning a single class referred to by its
configured URI -->
<bean class = "net.sourceforge.ondex.parser.CompositeScanner">
<property name = "scanners">
<set>
<bean class = "net.sourceforge.ondex.parser.owl.IriBasedRootsScanner">
<property name = "topClassIri" value = "#{ns.iri ( 'obo:GO_0008150' )}" /><!-- BioProcess -->
</bean>
<bean class = "net.sourceforge.ondex.parser.owl.IriBasedRootsScanner">
<property name = "topClassIri" value = "#{ns.iri ( 'obo:GO_0003674' )}" /><!-- MolFunction -->
</bean>
<bean class = "net.sourceforge.ondex.parser.owl.IriBasedRootsScanner">
<property name = "topClassIri" value = "#{ns.iri ( 'obo:GO_0005575' )}" /><!-- CellComp -->
</bean>
</set>
</property>
</bean>
</property>
...
</bean>
Here, the root classes are rarely used to annotate anything directly and are conceptually separated from their subclasses, e.g., 'surface binding' has an instance-like nature with respect to 'biological process', and you rarely find biological entities annotated with the latter directly. Hence, the result of the above mapping is to have concept classes like 'biological process' and no concept corresponding to the 'biological process' class itself.
OWLTopConceptClassMapper is the default concept class mapper used in such cases: it spawns a new concept class for each new top class that the exploring mapper meets (i.e., when it meets classes like 'biological process') and then passes the concept class generated at the top of a subtree to all the subclasses reached from that top level (i.e., all the classes under the bio-process subtree become concepts that instantiate the bio-process concept class). This is repeated for each subtree (i.e., the 'molecular function' and 'cellular component' subtrees receive their own concept classes).
WARNING: OWLTopConceptClassMapper doesn't work very well with ontologies that have multiple inheritance (ONDEX doesn't support that). If, say, A and B are two root classes in an ontology and some subclass inherits from both A and B (either directly or transitively), then, if this mapper is used, that class is assigned to the concept class A or B arbitrarily. Consider using the constant concept class mapper (and doMapRootsToConcepts = true) in such a case.
As you have seen above, the OWLMapper is expected to be populated with a concept class mapper and a concept mapper. These turn an owl:Class in a .owl file into the respective ONDEX entities. Most OWL classes are interpreted as ONDEX concepts; a few might be mapped to concept classes, depending on the criteria configured in the previous section.
We have already met an example of a concept class mapper in the previous section, the ConstantConceptClassMapper used to define the 'Trait Ontology Term'. Here is another example, the already mentioned OWLTopConceptClassMapper:
<bean id = "conceptClassMapper" class = "net.sourceforge.ondex.parser.owl.OWLTopConceptClassMapper" scope="prototype">
<property name = "idMapper" ref = "idMapper" />
<property name = "fullNameMapper" ref = "nameMapper" />
<property name = "descriptionMapper" ref = "descriptionMapper" />
</bean>
As described above, this is used at the top of a subtree: the RDF describing the subtree's root class (e.g., 'Biological Process') is passed to the mapper, which, in turn, sends it to its component mappers (e.g., idMapper, nameMapper). The component mappers use the source RDF to gather the information within their competence (IDs, names, etc.) and eventually return ONDEX entities based on it (e.g., Java strings, ConceptAccessions). At this point, the concept class mapper takes these ONDEX bits and builds a new ONDEX concept class. The OWLMapper uses this concept class whenever it needs it, that is, whenever it has to build a concept from a new owl:Class it meets. The stateful behaviour of OWLTopConceptClassMapper ensures that it generates a new concept class only the first time its mapping method is invoked (i.e., upon the top class in a subtree).
A concept mapper (like many mappers in the parser architecture) works according to the same compositional approach. This is the default available in the OWL parser:
<bean id = "conceptMapper" class = "net.sourceforge.ondex.parser.DefaultConceptMapper">
<property name = "idMapper" ref = "idMapper" />
<property name = "descriptionMapper" ref = "descriptionMapper" />
<property name = "preferredNameMapper" ref = "nameMapper" />
<property name = "accessionsMapper" ref = "accessionsMapper" />
<property name = "dataSourceMapper" ref = "dataSourceMapper" />
<property name = "altNamesMapper" ref = "altNamesMapper" />
</bean>
Internally, the DefaultConceptMapper has a method that receives a concept class mapper, which is then used for building new concepts. The OWLMapper invokes that method using the concept class mapper configured as explained above.
In the mappers described above there are many string mappers like descriptionMapper. They're configured like:
<!-- It's common in OBO to use this property for class description -->
<bean id = "descriptionMapper" class = "net.sourceforge.ondex.parser.owl.OWLTextMapper">
<property name="propertyIri" value = "#{ns.iri ( 'obo:IAO_0000115' )}" /><!-- definition -->
</bean>
The OWLTextMapper receives an RDF node (e.g., the URI of an owl:Class) and then extracts the value of a literal OWL property attached to that node (obo:IAO_0000115 in this case). So, you might configure different text mappers to extract different textual properties (e.g., rdfs:label, rdfs:comment). In the default_mappings.xml file this mechanism is used to define mappings for common OBO properties (e.g., oboInOwl:hasExactSynonym).
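As a minimal sketch of this, the following bean would extract rdfs:comment values. The bean id commentTextMapper is hypothetical, and the example assumes that the rdfs prefix is known to the ns namespace helper described later in this document. The fragment is meant to sit inside the <beans> element of a configuration file.

```xml
<!-- Hypothetical text mapper: same class as descriptionMapper, different source property -->
<bean id="commentTextMapper" class="net.sourceforge.ondex.parser.owl.OWLTextMapper">
  <!-- Assumes the 'rdfs' prefix is registered in the namespace helper -->
  <property name="propertyIri" value="#{ns.iri ( 'rdfs:comment' )}" />
</bean>
```

A bean like this could then be referenced wherever a string mapper is expected, e.g., as the descriptionMapper of a concept mapper for an ontology that stores its definitions in rdfs:comment.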
Further mappers that you find in the default configuration files are:
- Accessions mappers, which provide accessions to populate an ONDEX concept. For OBO ontologies, this default configuration is provided:
<!-- Common accession mappers -->
<bean id = "idAccMapper" class = "net.sourceforge.ondex.parser.owl.OBOWLAccessionsMapper">
<property name = "propertyIri" value = "#{ns.iri ( 'oboInOwl:id' )}" />
<!-- You need these in your extension
<property name = "dataSourcePrefix" value="GO:" />
<property name = "dataSourcesMapper" ref = "goDataSourcesMapper" /> -->
</bean>
<bean id = "altIdAccMapper" class = "net.sourceforge.ondex.parser.owl.OBOWLAccessionsMapper">
<property name = "propertyIri" value = "#{ns.iri ( 'oboInOwl:hasAlternativeId' )}" />
<!-- You need these in your extension
<property name = "dataSourcePrefix" value="GO:" />
<property name = "dataSourcesMapper" ref = "goDataSourcesMapper" /> -->
</bean>
These beans map the properties oboInOwl:id and oboInOwl:hasAlternativeId. The dataSourcePrefix and dataSourcesMapper parameters can be used to tailor the mapper to a specific ontology. dataSourcePrefix, if specified, is used both to select only the values having the given prefix (e.g., GO:00002835) and to remove the prefix from the mapped value (e.g., only 00002835 is kept). addedPrefix can be used to replace (or just add) a prefix of your own (e.g., if it is set to GenOnt_, by combining this with dataSourcePrefix, you eventually obtain GenOnt_00002835).
The dataSourcesMapper adds a data source to the accessions that are created by means of the above rules (see details below).
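For illustration, the GenOnt_ case described above might be configured as follows. The bean id genOntIdAccMapper is hypothetical; the property names are the ones discussed in the text, and goDataSourcesMapper is the data source mapper defined in the GO configuration. The fragment is meant to sit inside the <beans> element of your ontology-specific file.

```xml
<!-- Hypothetical variant of idAccMapper: 'GO:00002835' would be mapped to 'GenOnt_00002835' -->
<bean id="genOntIdAccMapper" parent="idAccMapper">
  <!-- Keep only values starting with 'GO:' and strip that prefix -->
  <property name="dataSourcePrefix" value="GO:" />
  <!-- Then prepend a prefix of our own -->
  <property name="addedPrefix" value="GenOnt_" />
  <property name="dataSourcesMapper" ref="goDataSourcesMapper" />
</bean>
```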
In addition to overriding the default definitions, you can use the composite mapper to map multiple properties of an OWL class to multiple types of accessions. This is an example from GO:
<bean id = "accessionsMapper" class = "net.sourceforge.ondex.parser.CompositeAccessionsMapper">
<property name = "mappers">
<set>
<bean parent = "idAccMapper">
<property name = "dataSourcePrefix" value="GO:" />
<property name = "dataSourcesMapper" ref = "goDataSourcesMapper" />
</bean>
<bean parent = "altIdAccMapper">
<property name = "dataSourcePrefix" value="GO:" />
<property name = "dataSourcesMapper" ref = "goDataSourcesMapper" />
</bean>
<ref bean = "wpXrefAccMapper" />
<ref bean = "enzymeXrefAccMapper" />
</set>
</property>
</bean>
As you can see, the base idAccMapper is inherited to anchor the RDF accession property to oboInOwl:id, then a specific prefix and source are added. Moreover, the example shows how to use the CompositeAccessionsMapper to join the results from multiple mappers and return them all to the invoking mapper. There are several composite components in the generic parser library that work this way; they can be used wherever a single component of a given type (e.g., a mapper or a scanner) that returns multiple results is accepted.
- Names mapper
Concept names are mapped through the OWLTextsMapper, which is a multi-value version of the OWLTextMapper described above. This is the default configuration for OBO ontologies:
<bean id = "altNamesMapper" class = "net.sourceforge.ondex.parser.owl.OWLTextsMapper">
<property name="propertyIri" value="#{ns.iri ( 'oboInOwl:hasExactSynonym' )}" />
</bean>
The oboInOwl:hasExactSynonym property is used because, from the point of view of ONDEX, concept names are what this property represents.
- Data source mappers
Data sources can be attached to several ONDEX entities and mappers (concepts, relations). The most common way to define a data source is by means of constants. For example, we have this for GO:
<bean id = "goDataSourcesMapper" class = "net.sourceforge.ondex.parser.ConstDataSourcesMapper">
<property name = "value">
<bean class = "net.sourceforge.ondex.core.util.prototypes.DataSourcePrototype">
<property name = "id" value = "GO" />
<property name = "fullName" value = "Gene Ontology" />
</bean>
</property>
</bean>
- Constants and ONDEX Prototypes
Prototypes are convenience classes to define ONDEX entities (e.g., data sources, relation types), starting from constant value sets. For instance, the DataSourcePrototype just shown contains the data used to instantiate a data source in some ONDEXGraph.
This mechanism is managed mostly by means of the [CachedGraphWrapper](https://github.com/Rothamsted/ondex-base/blob/master/core/base/src/main/java/net/sourceforge/ondex/core/util/CachedGraphWrapper.java) helper.
In addition to the cases shown above (definitions, accessions, names/synonyms), there are a couple of other OBO-specific mappers, defined in obo_common_mappings.xml, to deal with common cases in OBO-based ontologies:
- Cross references, e.g., wpXrefAccMapper to map Wikipedia references, pmedXrefAccMapper to map PMID references.
- Intersection linker, which is used to map declarations of type: A rdfs:subClassOf (B and C). The mapper is configured this way:
<bean id = "eqIntersctLinker" class = "net.sourceforge.ondex.parser.ExploringMapper.LinkerConfiguration">
<property name = "scanner">
<bean class = "net.sourceforge.ondex.parser.owl.OWLEqIntersctScanner" />
</property>
<property name = "mapper" ref = "isaMapper" />
</bean>
The OWLEqIntersctScanner allows you to create relations like A is-a B, A is-a C on the ONDEX side. As you can see, it is defined as a linker, so this bean, or your own extension, can be attached to an OWLMapper, as shown in the previous sections.
- part-of and regulation relationships. These are defined in a similar way, using either the OWLEqIntersctScanner or the OWLSomeScanner (already mentioned above).
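As a sketch of what such a definition could look like, the following applies the partOfLinker pattern shown earlier to the Relation Ontology 'regulates' property (obo:RO_0002211). The bean ids regulatesLinker and regulatesMapper, as well as the relation type id and name, are hypothetical and chosen for illustration; the actual definitions in obo_common_mappings.xml might differ.

```xml
<!-- Hypothetical linker: follows 'regulates' restrictions and maps them to an ONDEX relation -->
<bean id="regulatesLinker" class="net.sourceforge.ondex.parser.ExploringMapper.LinkerConfiguration">
  <property name="scanner">
    <bean class="net.sourceforge.ondex.parser.owl.OWLSomeScanner">
      <property name="propertyIri" value="#{ns.iri ( 'obo:RO_0002211' )}" /><!-- regulates -->
    </bean>
  </property>
  <property name="mapper" ref="regulatesMapper" />
</bean>

<!-- Hypothetical relation mapper with a constant relation type; id and name are assumptions -->
<bean id="regulatesMapper" class="net.sourceforge.ondex.parser.SimpleRelationMapper">
  <property name="relationTypePrototype">
    <bean class="net.sourceforge.ondex.core.util.prototypes.RelationTypePrototype">
      <property name="id" value="regulates" />
      <property name="fullName" value="regulates" />
    </bean>
  </property>
</bean>
```

A linker defined this way would be added to the linkers list of owlMapper, in the same way as partOfLinker in the earlier example.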
The inverting relation mapper (InvertingConceptRelMapper) is currently used with the is-a relationship:
<bean id = "isaMapper" class = "net.sourceforge.ondex.parser.InvertingConceptRelMapper">
<property name = "baseMapper">
<bean class = "net.sourceforge.ondex.parser.SimpleRelationMapper">
<!-- Every relation created from this mapper has this relation type (the prototype just contains
constants to instantiate the relation type) -->
<property
name ="relationTypePrototype"
value = "#{T( net.sourceforge.ondex.core.util.prototypes.RelationTypePrototype ).IS_A_PROTOTYPE }"/>
</bean>
</property>
</bean>
The mapper in the baseMapper property maps two concepts (about OWL classes) to a relation with a constant type. This mapper is used by the OWLMapper, which calls it in top-down order, that is, if there is a relation like A rdfs:subClassOf B, it invokes this relation mapper passing B, A (because it traverses the OWL hierarchy top-down). Without InvertingConceptRelMapper, this invocation would produce B is-a A, which is obviously wrong, hence we fix the problem with the inverting relation mapper above. This just invokes its base relation mapper (i.e., the baseMapper), swapping the parameters it receives.
You might have already noticed expressions like "#{ns.iri ( 'oboInOwl:hasExactSynonym' )}". This is Spring Expression Language (SpEL) syntax. ns is defined as an instance of NamespaceUtils, which is a utility class to expand namespace-prefixed RDF/XML URIs to their full form. A number of common namespaces are defined here and here. As you can see, this is based on the Java SPI mechanism.
You can add further namespaces in a Spring configuration (to be expanded later by means of the same SpEL syntax above) this way:
<bean class="org.springframework.beans.factory.config.MethodInvokingFactoryBean">
<property name="targetObject">
<ref bean="ns"/>
</property>
<property name="targetMethod">
<value>registerNs</value>
</property>
<property name="arguments">
<list>
<value>ex</value>
<value>http://www.example.com/ex#</value>
</list>
</property>
</bean>
TODO: I've never tested this approach.
TO ONDEX developers: this is the purely Spring-based way. An alternative is adding the default namespaces in the OWL parser.
In the default Spring configuration file you find:
<bean id = "jenaOntModel" class = "org.apache.jena.rdf.model.ModelFactory"
scope = "prototype"
factory-method="createOntologyModel">
<constructor-arg value="#{T( org.apache.jena.ontology.OntModelSpec ).OWL_MEM }"></constructor-arg>
<property name = "strictMode" value = "false" />
</bean>
This is the Jena ontology container, which is initially created empty and then populated with the contents of the OWL file that you pass to the OWL parser plug-in. This is a basic Jena mechanism, which allows you to configure features of the RDF view it manages. For instance, in this default configuration, we disable all forms of automatic reasoning (by using OWL_MEM) and we avoid certain validations, such as checking for missing imported files (via strictMode = false; usually the file being parsed contains enough information for ONDEX, with no need to look up the imports). You can change the Jena model used for parsing (and its behaviour) by redefining this bean in your own configuration. However, you should be aware of possible performance issues (e.g., reasoners are nice, but often too slow).
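For instance, a redefinition that enables one of Jena's built-in rule reasoners could look like the sketch below. It only swaps the OntModelSpec constant of the default bean for OWL_MEM_MICRO_RULE_INF. It assumes that a bean defined with the same id in your own file replaces the default one, and parsing a large ontology this way may be considerably slower.

```xml
<!-- Sketch: same bean id as the default, so this definition is meant to replace it -->
<bean id="jenaOntModel" class="org.apache.jena.rdf.model.ModelFactory"
      scope="prototype"
      factory-method="createOntologyModel">
  <!-- In-memory model with Jena's OWL micro rule reasoner, instead of plain OWL_MEM -->
  <constructor-arg value="#{T( org.apache.jena.ontology.OntModelSpec ).OWL_MEM_MICRO_RULE_INF }" />
  <property name="strictMode" value="false" />
</bean>
```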