CURIE prefix and predicate errors #41

Open
caufieldjh opened this issue Nov 11, 2021 · 3 comments
Labels
bug Something isn't working

Comments

@caufieldjh
Collaborator

Describe the bug

During the merge, KG-IDG produces a large number of "node id [id] has no CURIE prefix" warnings and "Invalid predicate CURIE" errors, and it isn't immediately clear which sources they come from.

To Reproduce

$ python3 run.py download
$ python3 run.py transform
$ python3 run.py merge 2> merge_out.log
$ sort merge_out.log | uniq -c | sort -n
...
      3 Warning: node id http://omim.org/entry/606689 has no CURIE prefix
      3 Warning: node id http://omim.org/entry/613364 has no CURIE prefix
      3 Warning: node id http://omim.org/entry/615383 has no CURIE prefix
      3 Warning: node id http://omim.org/entry/617704 has no CURIE prefix
      5 Invalid  predicate CURIE 'owl:versionIRI'? Ignoring...
     10 Invalid  predicate CURIE 'biolink:RegulateprocessToProcess'? Ignoring...
     14 Invalid  predicate CURIE ':http://www.w3.org/2004/02/skos/core#narrowMatch'? Ignoring...
     29 Invalid  predicate CURIE ':http://www.w3.org/2004/02/skos/core#broadMatch'? Ignoring...
     48 Invalid  predicate CURIE 'rdfs:isDefinedBy'? Ignoring...
     64 Invalid  predicate CURIE 'rdfs:seeAlso'? Ignoring...
     75 Invalid  predicate CURIE 'biolink:NegativelyRegulateprocessToProcess'? Ignoring...
    160 Invalid  predicate CURIE 'owl:disjointWith'? Ignoring...
  16468 Invalid  predicate CURIE ':http://www.w3.org/2004/02/skos/core#closeMatch'? Ignoring...
  71842 Invalid  predicate CURIE ':http://www.w3.org/2004/02/skos/core#exactMatch'? Ignoring...

Most of the node id prefix errors (not shown) appear to be from OMIM, e.g.:

 2 Warning: node id http://www.omim.org/phenotypicSeries/PS619142 has no CURIE prefix
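As a rough aid for the triage, the raw log can be grouped by the host of each un-prefixed node id and by each rejected predicate, rather than by exact line. This is a minimal sketch that assumes the log lines look exactly like the ones quoted above; the function name `summarize` and the sample list are illustrative and not part of KG-IDG or KGX.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Patterns inferred from the log lines quoted in this issue (an assumption).
NODE_RE = re.compile(r"node id (\S+) has no CURIE prefix")
PRED_RE = re.compile(r"Invalid\s+predicate CURIE '([^']+)'")


def summarize(lines):
    """Count un-prefixed node ids per URL host and invalid predicates per value."""
    hosts = Counter()
    predicates = Counter()
    for line in lines:
        match = NODE_RE.search(line)
        if match:
            host = urlparse(match.group(1)).netloc or "(no host)"
            hosts[host] += 1
            continue
        match = PRED_RE.search(line)
        if match:
            predicates[match.group(1)] += 1
    return hosts, predicates


# A few lines copied from the log excerpt above, for illustration only.
SAMPLE = [
    "Warning: node id http://omim.org/entry/606689 has no CURIE prefix",
    "Warning: node id http://www.omim.org/phenotypicSeries/PS619142 has no CURIE prefix",
    "Invalid  predicate CURIE 'rdfs:seeAlso'? Ignoring...",
    "Invalid  predicate CURIE ':http://www.w3.org/2004/02/skos/core#exactMatch'? Ignoring...",
]

hosts, predicates = summarize(SAMPLE)
print(hosts.most_common())
print(predicates.most_common())
```

To run it on the real log, pass an open file handle for merge_out.log to `summarize` instead of `SAMPLE`.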

Expected behavior

This will require some forensics to identify:

  1. which invalid CURIE is from which source
  2. whether it matters
  3. if it matters, what the correct CURIE should be
  4. if there isn't a preferred CURIE, what it should look like

AND/OR

Is this an expected part of how a KGX merge operates?
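For points 3 and 4, one way to see what a contracted CURIE could look like is to run the offending IRIs through a small prefix map. This is only a sketch: the `skos` namespace is the standard one, but the `OMIM` and `OMIMPS` entries are assumptions about what the preferred prefixes might be, and `contract` is a hypothetical helper, not a KGX function.

```python
# Hypothetical prefix map. Only the skos namespace is certain here;
# the OMIM and OMIMPS mappings are assumptions to be checked.
PREFIX_MAP = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "OMIM": "http://omim.org/entry/",
    "OMIMPS": "http://www.omim.org/phenotypicSeries/PS",
}


def contract(iri, prefix_map=PREFIX_MAP):
    """Return a CURIE for iri using the longest matching namespace, else None."""
    # Some predicates in the log carry a stray leading ':'; drop it first.
    iri = iri.lstrip(":")
    best_prefix, best_ns = None, ""
    for prefix, namespace in prefix_map.items():
        if iri.startswith(namespace) and len(namespace) > len(best_ns):
            best_prefix, best_ns = prefix, namespace
    if best_prefix is None:
        return None
    return f"{best_prefix}:{iri[len(best_ns):]}"


for value in (
    ":http://www.w3.org/2004/02/skos/core#exactMatch",
    "http://omim.org/entry/606689",
    "http://www.omim.org/phenotypicSeries/PS619142",
):
    print(value, "->", contract(value))
```

Anything that comes back as `None` has no namespace in the map and would still need a decision about its prefix.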

Version

8a5a018

@caufieldjh
Collaborator Author

As of the 20220203 build, here are the edge/node counts:

  total_edges: 5122941
  total_nodes: 1018616

The count of NamedThings, however, is only 985490, so 33,126 nodes (1018616 minus 985490) either have no Biolink class assigned or have only some category other than NamedThing. I suspect this is related to the OMIM CURIE warnings above, but I will need to check the merged graph's nodelist for anything unexpected.

@caufieldjh
Collaborator Author

Confirming:

$ grep -v NamedThing merged-kg_nodes.tsv | wc -l
33127

Looks like they're all ENSEMBL gene and protein IDs.
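The same check can be broken down by ID prefix, which makes the "all ENSEMBL" observation easy to confirm. The sketch below assumes the nodelist is a tab-separated file with `id` and `category` columns and pipe-delimited categories; the sample rows are made up for illustration and the function name is hypothetical. Because `csv.DictReader` consumes the header row, the header is not counted, which is presumably why the grep count above is one higher than the 33,126 expected. Unlike `grep -v`, this matches the exact category value rather than any occurrence of the string on the line.

```python
import csv
import io
from collections import Counter


def prefixes_missing_category(rows, required="biolink:NamedThing"):
    """Count nodes lacking the required category, grouped by CURIE prefix."""
    counts = Counter()
    for row in rows:
        categories = (row.get("category") or "").split("|")
        if required not in categories:
            prefix = row["id"].split(":", 1)[0]
            counts[prefix] += 1
    return counts


# Made-up rows in the assumed column layout, for illustration only.
SAMPLE_TSV = (
    "id\tcategory\tname\n"
    "ENSEMBL:ENSP_EXAMPLE_1\tbiolink:Protein\texample protein\n"
    "ENSEMBL:ENSG_EXAMPLE_1\tbiolink:Gene|biolink:NamedThing\texample gene\n"
    "EXAMPLE:0001\tbiolink:Disease|biolink:NamedThing\texample disease\n"
)

rows = csv.DictReader(io.StringIO(SAMPLE_TSV), delimiter="\t")
print(prefixes_missing_category(rows))
```

For the real file, replace the `io.StringIO` object with `open("merged-kg_nodes.tsv", newline="")`.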

Three different sources use those:

~/kg-idg$ grep -rl ENSEMBL data/transformed/
data/transformed/string/string_edges.tsv_nodes.tsv
data/transformed/string/string_edges.tsv_edges.tsv
data/transformed/string/string_nodes.tsv_nodes.tsv
data/transformed/hpa/hpa-data_nodes.tsv
data/transformed/orphanet/orphanet_nodes.tsv
data/transformed/orphanet/orphanet_edges.tsv

HPA is a Koza transform and applies multiple Biolink categories appropriately.
Orphanet is transformed from orphanet.nt and also assigns its ENSEMBL nodes both Gene/Protein and NamedThing.
So that leaves STRING: its ENSEMBL nodes lack NamedThing assignments in the transformed nodelist.
Will need to modify the transform accordingly.
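The kind of change this implies can be sketched as a small helper that adds the missing category to a node's category field. This is an illustration only, assuming pipe-delimited categories; `ensure_category` is a hypothetical name and this is not necessarily how the STRING transform was actually changed.

```python
def ensure_category(category_field, required="biolink:NamedThing", sep="|"):
    """Return the category field with the required category appended if absent."""
    categories = [c for c in category_field.split(sep) if c]
    if required not in categories:
        categories.append(required)
    return sep.join(categories)


print(ensure_category("biolink:Protein"))
print(ensure_category("biolink:Gene|biolink:NamedThing"))
```

The helper leaves fields that already contain the category unchanged, so it is safe to apply to every row of a nodelist.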

@caufieldjh
Collaborator Author

STRING issue was fixed by #74
