mapping-commons/sssom2neo

Convert SSSOM TSV to nodes and edges CSV files that can be ingested by neo4j-admin import.

To build:

mvn clean package

To run, assuming you have some mappings called mappings.sssom.tsv:

java -jar target/sssom2neo-1.0-SNAPSHOT.jar \
    --input mappings.sssom.tsv \
    --output-edges edges.csv \
    --output-nodes nodes.csv

You can also run it over a directory containing many mapping files, like the OLS SSSOM dataset:

java -jar target/sssom2neo-1.0-SNAPSHOT.jar \
    --input ./mappings/ \
    --output-edges edges.csv \
    --output-nodes nodes.csv

Now you have two files, nodes.csv and edges.csv.
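Before importing, it can be worth checking that the header rows look the way neo4j-admin import expects. The sketch below is not part of sssom2neo. It assumes only the standard neo4j-admin header conventions (an `:ID` field for nodes, `:START_ID` and `:END_ID` fields for relationships) and makes no assumption about which other columns sssom2neo writes. The helper names `read_header`, `missing_markers`, and `check_import_files` are hypothetical.

```python
import csv


def read_header(path):
    """Return the first row of a CSV file as a list of column names."""
    with open(path, newline="", encoding="utf-8") as handle:
        return next(csv.reader(handle), [])


def missing_markers(header, markers):
    """Return the markers that no column name in the header contains."""
    return [m for m in markers if not any(m in column for column in header)]


def check_import_files(nodes_path="nodes.csv", edges_path="edges.csv"):
    """Report which neo4j-admin header markers are missing from each file."""
    return {
        nodes_path: missing_markers(read_header(nodes_path), [":ID"]),
        edges_path: missing_markers(
            read_header(edges_path), [":START_ID", ":END_ID"]
        ),
    }
```

Calling `check_import_files()` in the directory that holds the two files returns a dictionary keyed by file name; an empty list for a file means none of the checked markers is missing from its header.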

Let's load them into Neo4j! Assuming you already have Docker installed, this is straightforward. We will populate a new folder called neo with our Neo4j database. First, we use neo4j-admin to import the CSV files:

docker run \
    -v $(pwd)/neo:/data \
    -v $(pwd)/nodes.csv:/mnt/nodes.csv \
    -v $(pwd)/edges.csv:/mnt/edges.csv \
    neo4j:4.4.20-community \
    neo4j-admin import --force --database=neo4j --array-delimiter="u+0000" --nodes=/mnt/nodes.csv --relationships=/mnt/edges.csv

If everything worked correctly, the neo folder should now contain a neo4j database populated with the SSSOM mappings from nodes.csv and edges.csv generated by the code in this repo. We can now start Neo4j:

docker run \
    -v $(pwd)/neo:/data \
    -p 7474:7474 \
    -p 7687:7687 \
    --env=NEO4J_AUTH=none \
    neo4j:4.4.20-community

Hit up http://localhost:7474 to go forth and cypher!

Examples

Get all mappings for a given subject

This query returns all mappings to/from MONDO:0005015 (diabetes mellitus). Note that the pattern (a)<-[mapping]->(b) matches relationships in either direction, so both outgoing mappings (defined by MONDO) and incoming mappings (defined by other ontologies) are included in the results.

MATCH (a)<-[mapping]->(b) WHERE a.id="MONDO:0005015" RETURN *
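The same lookup can also be run outside the Neo4j browser. The following is a minimal sketch using only the Python standard library and Neo4j's transactional HTTP endpoint. It assumes the unauthenticated container started above is reachable on localhost:7474 and that the database is named neo4j. The function names are hypothetical, the pattern is written in the undirected form (a)-[mapping]-(b), and what `type(mapping)` returns depends on how sssom2neo names relationship types.

```python
import json
import urllib.request

# Assumption: the unauthenticated container started above is reachable here.
NEO4J_URL = "http://localhost:7474/db/neo4j/tx/commit"

# Same pattern as the query above, with the term id passed as a parameter.
QUERY = (
    "MATCH (a)-[mapping]-(b) WHERE a.id = $id "
    "RETURN a.id AS term, type(mapping) AS rel_type, b.id AS other"
)


def build_payload(term_id):
    """Build the JSON body for Neo4j's transactional HTTP endpoint."""
    return {"statements": [{"statement": QUERY, "parameters": {"id": term_id}}]}


def fetch_mappings(term_id, url=NEO4J_URL):
    """POST the query and return one [term, rel_type, other] row per match."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(term_id)).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    if body["errors"]:
        raise RuntimeError(body["errors"])
    return [item["row"] for item in body["results"][0]["data"]]
```

With the database running, `fetch_mappings("MONDO:0005015")` would return the neighbours of that term as plain rows. Passing the id as a parameter rather than concatenating it into the query string avoids quoting problems.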

Get all mappings for a given subject (transitive)

We can follow mappings to an arbitrary depth using a variable-length pattern, e.g. to search for mappings up to 3 levels deep:

MATCH (a)<-[mapping*0..3]->(b) WHERE a.id="MONDO:0005015" RETURN *

This result set includes transitive mappings e.g. MONDO:0005015-hasDbXref->UMLS:C0011849<-hasDbXref-ORDO:101952-hasDbXref->UMLS:C0011860.

Therefore UMLS:C0011860 (Type 2 diabetes mellitus) is included in the result set. Note that this is a more specific term than we started with! This happens because hasDbXref carries no precise semantics (it does not say whether a mapping is exact, broader, or narrower), and it is a good example of why ontologies should use richer mapping metadata.
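To see why the more specific term is reached, the path above can be replayed on a tiny in-memory graph. This is an illustrative sketch only: it contains just the three hasDbXref edges named in the example, uses a plain breadth-first search rather than Cypher's path matching, and the name `reachable` is hypothetical.

```python
from collections import deque

# The three hasDbXref edges from the example path, as (source, target) pairs.
EDGES = [
    ("MONDO:0005015", "UMLS:C0011849"),
    ("ORDO:101952", "UMLS:C0011849"),
    ("ORDO:101952", "UMLS:C0011860"),
]


def reachable(start, edges, max_depth):
    """Return {node: hops} for nodes within max_depth hops, ignoring direction."""
    neighbours = {}
    for source, target in edges:
        # Record each edge in both directions, like the undirected pattern.
        neighbours.setdefault(source, set()).add(target)
        neighbours.setdefault(target, set()).add(source)
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for other in sorted(neighbours.get(node, ())):
            if other not in depth:
                depth[other] = depth[node] + 1
                queue.append(other)
    return depth
```

Here `reachable("MONDO:0005015", EDGES, 3)` includes UMLS:C0011860 at 3 hops, while a limit of 2 stops at ORDO:101952. The traversal treats every hop as equally trustworthy, which is exactly what an untyped cross-reference invites.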
