[OWL Parser] Mapping OWL Ontologies to ONDEX

Marco Brandizi edited this page Mar 26, 2018 · 7 revisions

OWL Parser Plugin - OWL Mapping Guide

Introduction

The OWL Parser plug-in (included in the ONDEX Desktop and ONDEX command line applications) loads OWL ontologies, encoded as OWL files, and maps them into ONDEX as knowledge networks. Currently, OWL classes are mapped to ONDEX concepts. The ONDEX concept classes that such concepts are instances of can either be defined as constants (e.g., "Trait Ontology Term") or be mapped from the top-most classes in an OWL ontology (e.g., "GO Biological Process").

The plug-in can be configured and tailored to map the OWL modelling used in a specific ontology onto ONDEX. The main purpose of this document is to describe how this can be done. We assume some knowledge of Java and XML. We also refer to the XML-based configuration format used by Spring Beans (see below).

The Generic Parser Library

The Java code of the OWL Parser is based on the generic parser library. As you can read in its Javadoc documentation, the library is based on the idea that a data source in some format (e.g., an XML file or a CSV file) is scanned and possibly split into smaller fragments (e.g., XML nodes, CSV rows). The results of this scanning are passed to suitable mappers, each of which converts a fragment into an ONDEX entity, such as an ONDEX concept, a concept class, or the accession of a concept.

The parser library is independent of the particular input format; it can be used with a given format by means of format-specific extensions. This way, it factors out certain interfaces (e.g., the notion of scanners and mappers, the idea of a concept mapper) and the defaults for certain parsing/mapping procedures (e.g., the idea that in most cases a concept mapper splits a data fragment into units that map to components like concept ID, accessions and data source; this is how the default concept mapper is structured). In particular, the generic library factors out document scanning strategies, e.g., scanning flat structures like the rows of a CSV, or tree structures like the elements of an OWL file. For the moment, we have implemented the latter in the exploring mapper, which, as we will show below, is the base for one of the key components in the OWL parser.

The OWL parser is one example of how the generic parser can be used to implement an ONDEX parser and, as such, the way it is structured and can be configured is heavily influenced by the latter.

Spring-based Configuration

The Generic Parser Library is designed to be used with an IoC container for tasks like defining the specific components that parse a file (e.g., the OWL mapper, a CSV mapper) and their format-specific parameters. Examples:

  • which columns in a CSV format must be mapped to concept names and what are the data sources for such names (hypothetical example, maybe we will rewrite the CSV parser using this approach)
  • mapping the different ways OWL ontologies represent accessions using properties like dcterms:identifier, or more domain-specific properties like oboInOwl:id.

More specifically, we are already using Spring Beans and its XML format to configure the OWL parser with different OWL/OBO ontologies, as we show in the next sections. Since the components that are configured in Spring XML files are instances of JavaBeans, you can understand what you can put in such XML by looking at class interfaces. For instance, the bean defaultOwlMapper is based on the OWLMapper JavaBean, hence the corresponding XML can have <property name="rootsScanner"> as child element (getRootsScanner()/setRootsScanner() are public methods of the parent class).

It is also worth noting that IDEs have good tools for editing Spring files (for instance, see the Spring Tool Suite for Eclipse).

The OWL Parser

As said above, the OWL parser is based on the parser library, on OWL-specific extensions of it, and on the Spring Beans XML files used to configure the parser components for the specific ontology that you need to import into ONDEX (e.g., Gene Ontology, Trait Ontology). Each such ontology requires an XML configuration file. This, in turn, might use the Spring <import> element. Typically, an OWL/OBO ontology is configured by means of a top file for the ontology, which imports obo_common_mappings.xml, which, in turn, leverages default_mappings.xml. The purpose of these files should be clear from their names; details are given in the following sections.
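As a rough sketch of this layering, a top file for a hypothetical ontology might look like the following. The file name obo_common_mappings.xml and the bean names owlMapper and defaultOwlMapper come from this guide; the resource location (here, a path relative to the importing file) is an assumption, so check the configuration files shipped with the plug-in for the actual paths.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://www.springframework.org/schema/beans
    http://www.springframework.org/schema/beans/spring-beans.xsd">

  <!-- Assumed location: adjust to where the common OBO mappings actually live.
       That file is expected to import default_mappings.xml in turn. -->
  <import resource = "obo_common_mappings.xml" />

  <!-- The bean that the plug-in looks up; ontology-specific settings go inside it -->
  <bean id = "owlMapper" parent = "defaultOwlMapper" />

</beans>
```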

The entry point OWLMapper, a.k.a. the Exploring Mapper

An OWL parser is essentially an instance of the OWLMapper JavaBean class, which, in turn, is a subclass of ExploringMapper. As you can read in its Javadoc, this parses an input document by exploring a tree-like graph, starting from top/source nodes and following configured link types. In the case of OWL, the process starts from the top-level OWL classes in an rdfs:subClassOf hierarchy and proceeds toward the leaf nodes.

Elements of the OWL mapper configuration are the concept class mapper and the concept mapper:

<!-- This is in default_mappings.xml -->
<bean id = "defaultOwlMapper" class = "net.sourceforge.ondex.parser.owl.OWLMapper">
	<property name="conceptClassMapper" ref = "conceptClassMapper" />
	<property name="conceptMapper" ref = "conceptMapper" />
	<property name="rootsScanner" ref = "rootsScanner" />
	...
</bean>

<!-- This essentially uses Jena to get all the OWL classes having no superclass but owl:Thing -->
<bean id = "rootsScanner" class = "net.sourceforge.ondex.parser.owl.OwlRootClassesScanner" />

...

defaultOwlMapper is the bean configured in the default_mappings.xml file. In downstream files this configuration is inherited to define the bean named owlMapper, which is the one the plug-in picks up to start parsing an OWL document. In Spring this inheritance is polymorphic: if you define your own conceptMapper bean, it overrides the default and is the one assigned to the corresponding property above, because the property refers to the bean by name through the ref attribute.
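A minimal sketch of such an override follows. It assumes that the fragment sits in your ontology-specific file after the default mappings have been imported, so that the later definition of conceptMapper replaces the default one. The bean myLabelMapper is hypothetical, and the sketch also assumes that the rdfs prefix is known to the ns helper; the DefaultConceptMapper properties are the ones shown in the default configuration later in this guide.

```xml
<!-- Inherits everything from the defaults, including ref = "conceptMapper" -->
<bean id = "owlMapper" parent = "defaultOwlMapper" />

<!-- Same id as the default bean: this definition is the one that ref = "conceptMapper" resolves to -->
<bean id = "conceptMapper" class = "net.sourceforge.ondex.parser.DefaultConceptMapper">
  <property name = "idMapper" ref = "idMapper" />
  <property name = "descriptionMapper" ref = "descriptionMapper" />
  <!-- Hypothetical change with respect to the default: take the preferred name from rdfs:label -->
  <property name = "preferredNameMapper" ref = "myLabelMapper" />
  <property name = "accessionsMapper" ref = "accessionsMapper" />
  <property name = "dataSourceMapper" ref = "dataSourceMapper" />
  <property name = "altNamesMapper" ref = "altNamesMapper" />
</bean>

<!-- Hypothetical bean, assumes 'rdfs' is a registered namespace prefix -->
<bean id = "myLabelMapper" class = "net.sourceforge.ondex.parser.owl.OWLTextMapper">
  <property name = "propertyIri" value = "#{ns.iri ( 'rdfs:label' )}" />
</bean>
```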

Linkers

Both the rdfs:subClassOf scanner and other scanners are configured in the OWL mapper by means of linkers, i.e., through the linkers property of OWLMapper. For instance, the following configuration follows subclass and part-of relationships:

<!-- This is in default_mappings.xml -->
<bean id = "defaultOwlMapper">
	...
	<property name = "linkers">
		<list>
			<bean class = "net.sourceforge.ondex.parser.ExploringMapper.LinkerConfiguration">
				<property name="scanner">
				  <!-- For each new owl:Class, follow rdfs:subClassOf -->
					<bean class = "net.sourceforge.ondex.parser.owl.OWLSubClassScanner" />
				</property>				
				<!-- And map the two ends of the rdfs:subClassOf relation using the is-a ONDEX relation
				     (details in the 'Miscellanea' section). As you can see, a linker is an entity that puts together
				     a scanner of relations and the mapper that tells the parser which ONDEX relation to use
				     for the links found by the scanner.
				-->
				<property name = "mapper" ref = "isaMapper" />
			</bean>
		</list>
	</property>
</bean>


<!-- This is in your ontology-specific file -->
<!-- As said above, you should always define owlMapper and it's convenient to inherit from defaultOwlMapper -->
<bean id = "owlMapper" parent = "defaultOwlMapper">
	...
	<property name = "linkers">
		<list merge = "true"><!-- the following elements will be added to the subClass linker -->
			<ref bean = "partOfLinker" />			
		</list>
	</property>
</bean>


<!--
  This is in obo_common_mappings.xml
  The scanner uses the specified OBO property as identifier of part-of and the mapper links concepts corresponding
  to OWL classes at the ends of this property by means of the part-of relation (see below).
-->
<bean id = "partOfLinker" class = "net.sourceforge.ondex.parser.ExploringMapper.LinkerConfiguration">
	<property name = "scanner">
		<bean class = "net.sourceforge.ondex.parser.owl.OWLSomeScanner">
			<property name="propertyIri" value="#{ns.iri ( 'obo:BFO_0000050' )}" /><!-- part of -->
		</bean>
	</property>
	<property name = "mapper" ref = "partOfMapper" />
</bean>


<!--
  This is in default_mappings.xml
  This mapper maps each new pair of ONDEX concepts to an ONDEX relation, using a constant relation type, whose
  details are set up here.
-->
<bean id = "partOfMapper" class = "net.sourceforge.ondex.parser.SimpleRelationMapper">
	<property name ="relationTypePrototype">
		<bean class = "net.sourceforge.ondex.core.util.prototypes.RelationTypePrototype">
			<property name="id" value = "part_of" />
			<property name="fullName" value = "part of" />
			<property name="antisymmetric" value = "true" />
			<property name="transitive" value = "true" />
		</bean>
	</property>
</bean>

As you can see, beans can be flexibly configured and combined by using Spring mechanisms like inheritance, imported files, polymorphic references.

Roots in the OWLMapper

The exploring mapper has two ways to map root elements to ONDEX, depending on how the property doMapRootsToConcepts is set.

The following, taken from the Trait Ontology configuration, is the typical case where you want this property to be true:

<bean id = "owlMapper" parent = "defaultOwlMapper" class = "net.sourceforge.ondex.parser.owl.OWLInfMapper">

	<!-- The top classes to start from -->
	<property name = "rootsScanner">
  	<bean class = "net.sourceforge.ondex.parser.owl.IriBasedRootsScanner">
  		<property name = "topClassIri" value = "#{ns.iri ( 'obo:TO_0000387' )}" /><!-- Plant Trait -->
  	</bean>
	</property>

	<!-- The root class above will be mapped to a concept, we use a generic 'Trait Ontology Term' as a concept class -->
	<property name = "doMapRootsToConcepts" value = "true" />

	...
</bean>


<bean id = "conceptClassMapper" class = "net.sourceforge.ondex.parser.ConstantConceptClassMapper">
	<property name = "value">
		<bean class = "net.sourceforge.ondex.core.util.prototypes.ConceptClassPrototype">
			<property name = "id" value = "TO_TERM" />
			<property name = "fullName" value = "Trait Ontology Term" />
			<property name= "description" value = "Term from the Trait Ontology (https://github.com/Planteome/plant-trait-ontology)" />
		</bean>
	</property>
</bean>

In practice, all the TO classes are mapped to ONDEX concepts, including 'Plant Trait', and all concepts become instances of the ONDEX concept class 'Trait Ontology Term'. You typically want this when the root OWL classes in your ontology can be used as instances and are not conceptually separated from the rest of the ontology, e.g., when a biological sample can be annotated with "Plant Trait".

A second way to map roots is when doMapRootsToConcepts is false. This is an example from the Gene Ontology configuration:

<bean id = "owlMapper" parent = "defaultOwlMapper">

  <!-- The top OWL class maps to an ONDEX concept class and not to a concept -->
  <property name = "doMapRootsToConcepts" value = "false" />

	<property name = "rootsScanner">
    <!-- Joins the OWL classes returned by the scanners below, each returning a single class referred to by its
         configured URI -->
		<bean class = "net.sourceforge.ondex.parser.CompositeScanner">
			<property name = "scanners">
				<set>
					<bean class = "net.sourceforge.ondex.parser.owl.IriBasedRootsScanner">
						<property name = "topClassIri" value = "#{ns.iri ( 'obo:GO_0008150' )}" /><!-- BioProcess -->
					</bean>
					<bean class = "net.sourceforge.ondex.parser.owl.IriBasedRootsScanner">
						<property name = "topClassIri" value = "#{ns.iri ( 'obo:GO_0003674' )}" /><!-- MolFunction -->
					</bean>
					<bean class = "net.sourceforge.ondex.parser.owl.IriBasedRootsScanner">
						<property name = "topClassIri" value = "#{ns.iri ( 'obo:GO_0005575' )}" /><!-- CellComp -->
					</bean>
				</set>					
			</property>
		</bean>
	</property>
  ...
</bean>

Here, the root classes are rarely used directly in annotations and are conceptually separated from their subclasses. For example, 'surface binding' behaves like an instance with respect to 'biological process', and you rarely find biological entities annotated directly with the latter. Hence, the result of the above mapping is to have concept classes like 'biological process', and no concept corresponding to the root class itself.

OWLTopConceptClassMapper is the default concept class mapper that is used in such cases. It spawns a new concept class for each new top class that the exploring mapper meets (i.e., when it meets classes like 'biological process') and then passes the concept class generated at the top of a subtree to all the subclasses that are reached from that top level (i.e., all the classes under the bio-process subtree become concepts that instantiate the bio-process concept class). This is repeated for each subtree (i.e., the 'molecular function' and 'cellular component' subtrees receive their own concept classes).

WARNING: OWLTopConceptClassMapper doesn't work very well with ontologies that have multiple inheritance (ONDEX doesn't support that). If, let's say, A and B are two root classes in an ontology and there is some subclass inheriting from both A and B (either directly or transitively), then, if this mapper is used, that class will be assigned to the concept class A or B arbitrarily. Consider using the constant concept class mapper (and doMapRootsToConcepts = true) in such a case.
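A minimal sketch of that fallback, for a hypothetical ontology, could look like this. The identifiers EX_TERM and 'Example Ontology Term' are made up for illustration; the classes and property names are the ones already used in the Trait Ontology example above, and the roots scanner is assumed to be inherited from defaultOwlMapper.

```xml
<bean id = "owlMapper" parent = "defaultOwlMapper">
  <!-- Roots become concepts too, so no per-subtree concept class has to be chosen -->
  <property name = "doMapRootsToConcepts" value = "true" />
</bean>

<!-- One constant concept class for every OWL class, whichever roots it descends from -->
<bean id = "conceptClassMapper" class = "net.sourceforge.ondex.parser.ConstantConceptClassMapper">
  <property name = "value">
    <bean class = "net.sourceforge.ondex.core.util.prototypes.ConceptClassPrototype">
      <property name = "id" value = "EX_TERM" /><!-- hypothetical -->
      <property name = "fullName" value = "Example Ontology Term" /><!-- hypothetical -->
    </bean>
  </property>
</bean>
```

With this setup, a class that inherits from both A and B is simply a concept of the single constant concept class, so the arbitrary assignment described in the warning does not arise.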

Concept and Concept Class mappers

As you have seen above, the OWLMapper is expected to be populated with a concept class mapper and a concept mapper. These turn an owl:Class in a .owl file into the respective ONDEX entities. Most OWL classes are interpreted as ONDEX concepts; a few might be mapped to concept classes, depending on the criteria configured as described in the previous section.

We have already met an example of a concept class mapper in the previous section: the ConstantConceptClassMapper used to define the 'Trait Ontology Term'. Here is another example, the already mentioned OWLTopConceptClassMapper:

<bean id = "conceptClassMapper" class = "net.sourceforge.ondex.parser.owl.OWLTopConceptClassMapper" scope="prototype">
  <property name = "idMapper" ref = "idMapper" />
  <property name = "fullNameMapper" ref = "nameMapper" />
  <property name = "descriptionMapper" ref = "descriptionMapper" />
</bean>

As described above, this mapper is used at the top of a subtree. The RDF describing the subtree's root class (e.g., 'Biological Process') is passed to the mapper, which, in turn, passes it to its component mappers (e.g., idMapper, nameMapper). Each component mapper extracts from the source RDF the piece of information it is responsible for (IDs, names, etc.) and returns a corresponding value (e.g., Java strings, ConceptAccessions). The concept class mapper then takes these values and builds a new ONDEX concept class. The OWLMapper uses this concept class whenever it needs it, that is, whenever it has to build a concept from a new owl:Class it meets. OWLTopConceptClassMapper is stateful, which ensures that it generates a new concept class only the first time its mapping method is invoked (i.e., upon the top class in a subtree).

A concept mapper (like many mappers in the parser architecture) works according to the same compositional approach. This is the default available in the OWL parser:

<bean id = "conceptMapper" class = "net.sourceforge.ondex.parser.DefaultConceptMapper">
  <property name = "idMapper" ref = "idMapper" />
  <property name = "descriptionMapper" ref = "descriptionMapper" />	  
  <property name = "preferredNameMapper" ref = "nameMapper" />
  <property name = "accessionsMapper" ref = "accessionsMapper" />
  <property name = "dataSourceMapper" ref = "dataSourceMapper" />
  <property name = "altNamesMapper" ref = "altNamesMapper" />
</bean>

Internally, the DefaultConceptMapper has a method that receives a concept class mapper, which is then used for building new concepts. The OWLMapper invokes that method with the concept class mapper configured as explained above.

Text properties

In the mappers described above there are many string mappers like descriptionMapper. They're configured like:

<!-- It's common in OBO to use this property for class description -->
<bean id = "descriptionMapper" class = "net.sourceforge.ondex.parser.owl.OWLTextMapper">
  <property name="propertyIri" value = "#{ns.iri ( 'obo:IAO_0000115' )}" /><!-- definition -->
</bean>

The OWLTextMapper receives an RDF node (e.g., the URI of an owl:Class) and then extracts the value of a literal OWL property attached to that node (obo:IAO_0000115 in this case). So, you might configure different text mappers to extract different textual properties (e.g., rdfs:label, rdfs:comment). In the default_mappings.xml file this mechanism is used to define mappings for common OBO properties (e.g., oboInOwl:hasExactSynonym).
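For instance, a text mapper for rdfs:comment could be sketched as follows. The bean id commentMapper is hypothetical, and the sketch assumes that the rdfs prefix is known to the ns helper (if it is not, it would have to be registered as discussed in the 'Namespaces Helper' section).

```xml
<!-- Hypothetical bean: extracts the rdfs:comment literal attached to an OWL class -->
<bean id = "commentMapper" class = "net.sourceforge.ondex.parser.owl.OWLTextMapper">
  <property name = "propertyIri" value = "#{ns.iri ( 'rdfs:comment' )}" />
</bean>
```

Such a bean could then be referenced wherever a string mapper is accepted, for example through ref = "commentMapper" in the descriptionMapper property of a concept mapper.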

Accessions, Names, Data Sources

Further mappers that you find in the default configuration files are:

  • Accessions mappers, which provide accessions to populate an ONDEX concept. For OBO ontologies, this default configuration is provided:
<!-- Common accession mappers -->
<bean id = "idAccMapper" class = "net.sourceforge.ondex.parser.owl.OBOWLAccessionsMapper">
  <property name = "propertyIri" value = "#{ns.iri ( 'oboInOwl:id' )}" />
  <!-- You require these in your extension
  <property name = "dataSourcePrefix" value="GO:" />
  <property name = "dataSourcesMapper" ref = "goDataSourcesMapper" /> -->
</bean>

<bean id = "altIdAccMapper" class = "net.sourceforge.ondex.parser.owl.OBOWLAccessionsMapper">
  <property name = "propertyIri" value = "#{ns.iri ( 'oboInOwl:hasAlternativeId' )}" />
  <!-- You require these in your extension
  <property name = "dataSourcePrefix" value="GO:" />
  <property name = "dataSourcesMapper" ref = "goDataSourcesMapper" /> -->
</bean>

These beans map the properties oboInOwl:id and oboInOwl:hasAlternativeId. The dataSourcePrefix and dataSourcesMapper parameters can be used to tailor the mapper to a specific ontology.

dataSourcePrefix, if specified, is used to both filter values with a given prefix (e.g., GO:00002835), and to remove the prefix from the mapped value (e.g., only 00002835 is kept). addedPrefix can be used to replace (or just add) a prefix of your own (e.g., if it is set to GenOnt_, by combining this with dataSourcePrefix, you eventually obtain GenOnt_00002835).

The dataSourcesMapper adds a data source to the accessions that are created by means of the above rules (see details below).

In addition to overriding the default definitions, you can use the composite mapper, to map multiple properties of an OWL class to multiple types of accessions. This is an example from GO:

<bean id = "accessionsMapper" class = "net.sourceforge.ondex.parser.CompositeAccessionsMapper">
  <property name = "mappers">
    <set>
      <bean parent = "idAccMapper">
        <property name = "dataSourcePrefix" value="GO:" />
        <property name = "dataSourcesMapper" ref = "goDataSourcesMapper" />
      </bean>
      <bean parent = "altIdAccMapper">
        <property name = "dataSourcePrefix" value="GO:" />
        <property name = "dataSourcesMapper" ref = "goDataSourcesMapper" />
      </bean>
      <ref bean = "wpXrefAccMapper" />
      <ref bean = "enzymeXrefAccMapper" />				
    </set>
  </property>
</bean>

As you can see, the base idAccMapper is inherited to anchor the RDF accession property to oboInOwl:id, then a specific prefix and data source are added. Moreover, the example shows how to use the CompositeAccessionsMapper to join the results from multiple mappers and return them all to the invoking mapper. There are several composite components in the generic parser library that work this way. They can be used wherever a single component of a given type (e.g., a mapper, a scanner) that returns multiple results is accepted.

  • Names mapper

Concept names are mapped through the OWLTextsMapper, which is a multi-value version of OWLTextMapper, described above. This is the default configuration for OBO ontologies:

<bean id = "altNamesMapper" class = "net.sourceforge.ondex.parser.owl.OWLTextsMapper">
  <property name="propertyIri" value="#{ns.iri ( 'oboInOwl:hasExactSynonym' )}" />
</bean>

The oboInOwl:hasExactSynonym property is used because, from the point of view of ONDEX, concept names are what this property represents.

  • Data source mappers

Data sources can be attached to several ONDEX entities and mappers (concepts, relations). The most common way to define a data source is by means of constants. For example, we have this for GO:

<bean id = "goDataSourcesMapper" class = "net.sourceforge.ondex.parser.ConstDataSourcesMapper">
  <property name = "value">
    <bean class = "net.sourceforge.ondex.core.util.prototypes.DataSourcePrototype">
      <property name = "id" value = "GO" />
      <property name = "fullName" value = "Gene Ontology" />
    </bean>
  </property>
</bean>		

  • Constants and ONDEX Prototypes

Prototypes are convenience classes to define ONDEX entities (e.g., data sources, relation types), starting from sets of constant values. For instance, the DataSourcePrototype just shown contains data that are used to instantiate a data source in some ONDEXGraph. This mechanism is managed mostly by means of the [CachedGraphWrapper](https://github.com/Rothamsted/ondex-base/blob/master/core/base/src/main/java/net/sourceforge/ondex/core/util/CachedGraphWrapper.java) helper.
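The accession and data source pieces described in this section can be combined for another ontology along these lines. This is only a sketch for a hypothetical ontology whose oboInOwl:id values look like EX:0001234; the bean id exDataSourcesMapper, the EX: prefix and the ExOnt_ prefix are made up, while the classes and property names are the ones introduced above.

```xml
<!-- Hypothetical constant data source -->
<bean id = "exDataSourcesMapper" class = "net.sourceforge.ondex.parser.ConstDataSourcesMapper">
  <property name = "value">
    <bean class = "net.sourceforge.ondex.core.util.prototypes.DataSourcePrototype">
      <property name = "id" value = "EX" />
      <property name = "fullName" value = "Example Ontology" />
    </bean>
  </property>
</bean>

<bean id = "accessionsMapper" class = "net.sourceforge.ondex.parser.CompositeAccessionsMapper">
  <property name = "mappers">
    <set>
      <bean parent = "idAccMapper">
        <!-- Only values starting with EX: are considered, and the prefix is removed -->
        <property name = "dataSourcePrefix" value = "EX:" />
        <!-- A custom prefix is then prepended to what is left -->
        <property name = "addedPrefix" value = "ExOnt_" />
        <property name = "dataSourcesMapper" ref = "exDataSourcesMapper" />
      </bean>
    </set>
  </property>
</bean>
```

Following the rules described above, an oboInOwl:id value of EX:0001234 would be expected to yield the accession ExOnt_0001234, attached to the 'Example Ontology' data source.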

Other OBO common mappers

In addition to the cases shown above (definitions, accessions, names/synonyms), there are a few other OBO-specific mappers, defined in obo_common_mappings.xml, to deal with common cases in OBO-based ontologies:

  • Cross references, e.g., wpXrefAccMapper to map Wikipedia references, pmedXrefAccMapper to map PMID references.

  • Intersection linker, which is used to map declarations of the form A rdfs:subClassOf (B and C). The linker is configured this way:

<bean id = "eqIntersctLinker" class = "net.sourceforge.ondex.parser.ExploringMapper.LinkerConfiguration">
  <property name = "scanner">
    <bean class = "net.sourceforge.ondex.parser.owl.OWLEqIntersctScanner" />
  </property>
  <property name = "mapper" ref = "isaMapper" />
</bean>

The OWLEqIntersctScanner allows you to create relations like A is-a B, A is-a C on the ONDEX side. As you can see, it is defined as linker, so this bean, or your own extension, can be attached to an OWLMapper, as shown in the previous sections.

  • part-of and regulation relationships. These are defined in a similar way, using either the OWLEqIntersctScanner or the OWLSomeScanner (already mentioned above).
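As an illustration of the last two points, a regulation linker modelled on the partOfLinker shown earlier, and attached to the OWL mapper together with the intersection linker, might be sketched as follows. The bean id myRegulatesLinker and the relation type id and name are hypothetical, and the definitions actually shipped in obo_common_mappings.xml may well differ; obo:RO_0002211 is the Relation Ontology identifier for 'regulates'.

```xml
<!-- Hypothetical linker: follows 'regulates some X' restrictions -->
<bean id = "myRegulatesLinker" class = "net.sourceforge.ondex.parser.ExploringMapper.LinkerConfiguration">
  <property name = "scanner">
    <bean class = "net.sourceforge.ondex.parser.owl.OWLSomeScanner">
      <property name = "propertyIri" value = "#{ns.iri ( 'obo:RO_0002211' )}" /><!-- regulates -->
    </bean>
  </property>
  <property name = "mapper">
    <bean class = "net.sourceforge.ondex.parser.SimpleRelationMapper">
      <property name = "relationTypePrototype">
        <bean class = "net.sourceforge.ondex.core.util.prototypes.RelationTypePrototype">
          <property name = "id" value = "regulates" /><!-- assumed relation type id -->
          <property name = "fullName" value = "regulates" />
        </bean>
      </property>
    </bean>
  </property>
</bean>

<bean id = "owlMapper" parent = "defaultOwlMapper">
  <property name = "linkers">
    <list merge = "true"><!-- added to the linkers inherited from defaultOwlMapper -->
      <ref bean = "eqIntersctLinker" />
      <ref bean = "myRegulatesLinker" />
    </list>
  </property>
</bean>
```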

Miscellanea

Relation Inverter

This is currently used with the is-a relationship:

<bean id = "isaMapper" class = "net.sourceforge.ondex.parser.InvertingConceptRelMapper">
  <property name = "baseMapper">
    <bean class = "net.sourceforge.ondex.parser.SimpleRelationMapper">
      <!-- Every relation created from this mapper has this relation type (the prototype just contains
           constants to instantiate the relation type) -->
      <property
        name ="relationTypePrototype"
        value = "#{T( net.sourceforge.ondex.core.util.prototypes.RelationTypePrototype ).IS_A_PROTOTYPE }"/>
    </bean>
  </property>
</bean>

The mapper set in the baseMapper property maps two concepts (corresponding to OWL classes) to a relation with a constant type. The isaMapper is used by the OWLMapper, which calls it in top-down order: if there is a relation like A rdfs:subClassOf B, it invokes the relation mapper passing B, A (because it traverses the OWL hierarchy top-down). Without InvertingConceptRelMapper, this invocation would produce B is-a A, which is obviously wrong, hence we fix the problem with the inverting relation mapper above. This just invokes its base relation mapper (i.e., the baseMapper), reversing the order of the parameters it receives.

Namespaces Helper

You might have already noticed expressions like "#{ns.iri ( 'oboInOwl:hasExactSynonym' )}". This is Spring Expression Language (SpEL) syntax. ns is defined as an instance of NamespaceUtils, which is a utility class to expand prefixed names, written with RDF/XML-style namespace prefixes, to their full URIs. A number of common namespaces are predefined, and the mechanism used to load them is based on Java SPI.

You can add further namespaces in a Spring configuration (to be expanded later by means of the same SpEL syntax above) this way:

<bean class="org.springframework.beans.factory.config.MethodInvokingFactoryBean">
    <property name="targetObject">
        <ref bean="ns"/>
    </property>
    <property name="targetMethod">
        <value>registerNs</value>
    </property>
    <property name="arguments">
        <list>
            <value>ex</value>
            <value>http://www.example.com/ex#</value>
        </list>
    </property>
</bean>

TODO: I've never tested this approach.

TO ONDEX developers: this is the purely Spring-based way. An alternative is adding the default namespaces in the OWL parser.

Default Jena model

In the default Spring configuration file you find:

<bean id = "jenaOntModel" class = "org.apache.jena.rdf.model.ModelFactory"
  scope = "prototype"
  factory-method="createOntologyModel">
  <constructor-arg value="#{T( org.apache.jena.ontology.OntModelSpec ).OWL_MEM }"></constructor-arg>
  <property name = "strictMode" value = "false" />
</bean>

This is the Jena ontology container, which is initially created empty and then populated with the contents of the OWL file that you pass to the OWL parser plug-in. This is a basic Jena mechanism, which allows you to configure features of the RDF view it manages. For instance, in this default configuration, we disable all forms of automatic reasoning (by using OWL_MEM) and we avoid certain validations, such as checks for missing imported files (via strictMode = false; usually the file being parsed contains enough information for ONDEX, without the need to look up the imports). You can change the Jena model that is used for parsing (and its behaviour) by redefining this bean in your own configuration. However, you should be aware of possible performance issues (e.g., reasoners are nice, but often too slow).
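As a sketch of such a redefinition, the following variant, placed in your own configuration file, would switch the model to one of Jena's built-in rule reasoners. OWL_MEM_MICRO_RULE_INF is a standard OntModelSpec constant; whether the OWL parser behaves well (and fast enough) with inference enabled is not verified here and should be tested on your ontology.

```xml
<!-- Same bean id as the default, so this definition replaces it -->
<bean id = "jenaOntModel" class = "org.apache.jena.rdf.model.ModelFactory"
  scope = "prototype"
  factory-method = "createOntologyModel">
  <!-- In-memory model with the OWL micro rule reasoner, instead of OWL_MEM (no reasoning) -->
  <constructor-arg value = "#{T( org.apache.jena.ontology.OntModelSpec ).OWL_MEM_MICRO_RULE_INF }" />
  <property name = "strictMode" value = "false" />
</bean>
```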