Use only mondo.sssom.tsv for disease normalisation? #269
Thanks @matentzn, this is great. This is a bit outside of the scope of ORION, because normalization decisions, and the list of equivalent identifiers, come directly from the Node Normalizer service, backed by Babel. You have tagged the right person, but he's currently on vacation for a few weeks. In the meantime, I can say:
For this specific case, it does look like maybe those two identifiers should be considered synonyms but they're not currently. (https://nodenormalization-sri.renci.org/1.5/get_normalized_nodes?curie=MONDO%3A0008234&curie=DOID%3A0050430)
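A check like the one linked above can be sketched in a few lines of Python. This is a minimal sketch, not ORION or Node Normalizer code: the helper names `build_query_url` and `same_clique` are hypothetical, and the response shape (a JSON object keyed by the input CURIE, each value holding an `id.identifier` field or `null`) is an assumption about what the service returns.

```python
from urllib.parse import urlencode

# Endpoint taken from the URL quoted in this thread.
NN_BASE = "https://nodenormalization-sri.renci.org/1.5/get_normalized_nodes"


def build_query_url(curies, base=NN_BASE):
    """Build a query URL with one `curie` parameter per identifier."""
    return base + "?" + urlencode([("curie", c) for c in curies])


def same_clique(response, curie_a, curie_b):
    """Return True if both CURIEs normalize to the same preferred identifier.

    `response` is the parsed JSON body. Assumed shape (not verified here):
    {input_curie: {"id": {"identifier": ...}, ...} or None}
    """
    entry_a = response.get(curie_a)
    entry_b = response.get(curie_b)
    if not entry_a or not entry_b:
        # One of the identifiers is unknown to the service.
        return False
    return entry_a["id"]["identifier"] == entry_b["id"]["identifier"]
```

The parsed response could come from any HTTP client; keeping the comparison separate from the network call makes it easy to run over a list of identifier pairs.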
Yikes... That is very unfortunate! But good to know, and I guess in principle it makes some sense (the order, though, does not IMO: DOID, OMIM, OMIM.PS, orphanet and EFO are entirely subsumed under Mondo, and UMLS should be last in that list). It just means that in the context of Everycure, I need to push for a different prefix preference to be used than the standard Biolink one.
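To make the idea of a prefix preference concrete, here is a minimal Python sketch of picking a preferred identifier from a set of equivalent ones. The list `PREFIX_PRIORITY` and the function `pick_preferred` are hypothetical; the order shown only encodes "Mondo first, UMLS last" from the comment above, and the ordering of the prefixes in between is an arbitrary assumption, not the Biolink order or any project's actual configuration.

```python
# Hypothetical priority list for illustration only: Mondo first, UMLS last.
# The order of the middle entries is an assumption.
PREFIX_PRIORITY = ["MONDO", "DOID", "OMIM", "OMIM.PS", "orphanet", "EFO", "UMLS"]


def prefix_of(curie):
    """Return the prefix part of a CURIE such as 'MONDO:0008234'."""
    return curie.split(":", 1)[0]


def pick_preferred(equivalent_ids, priority=PREFIX_PRIORITY):
    """Pick the identifier whose prefix ranks highest in `priority`.

    Prefixes are compared case-insensitively; prefixes missing from the
    list rank after all listed ones. Ties keep the first identifier seen.
    """
    rank = {p.lower(): i for i, p in enumerate(priority)}
    return min(
        equivalent_ids,
        key=lambda c: rank.get(prefix_of(c).lower(), len(rank)),
    )


# The UMLS identifier below is a made-up placeholder, not a real concept ID.
clique = ["UMLS:PLACEHOLDER", "DOID:0050430", "MONDO:0008234"]
preferred = pick_preferred(clique)
```

With a list like this, swapping in a different preference is a one-line change to the priority list rather than a change to the normalization logic.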
Cool, thanks for checking! Shall I move this issue to NN repo then?
https://biolink.github.io/biolink-model/exact_matches/?
Great! I would be happy to advise on the formatting for that, e.g. https://mapping-commons.github.io/sssom/
First of all, thanks for having me :) @marcello-deluca invited me to provide some feedback here, and I am glad to see a team that is as passionate about biomedical KGs as we are join forces! Let's get into it. (My tone when providing feedback is sometimes a bit German, sorry about that; I only do this because your overall product is awesome, else I wouldn't bother.)
At the moment, we are missing some interesting integration in ROBOKOP.
Let's look at
As you can see, only one of the two diseases, which are clearly the same, is associated with the Cutaneous lichen amyloidosis phenotype:
In Mondo SSSOM, these are mapped:
This can have some unnecessary consequences for downstream prediction tasks, especially if links to the non-Mondo ID do not make it into the final KG subset used for learning.
The whole purpose of Mondo is to provide a broad-scope disease vocabulary onto which we can project existing disease vocabularies; at least the following are covered fully: DO, ORDO, OMIM, NCIT neoplasm and UMLS (among others).
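As a rough illustration of how such a mapping could be looked up, here is a minimal Python sketch that reads SSSOM TSV text and checks whether two identifiers are linked. The helpers `read_sssom` and `are_mapped` are hypothetical. The `SAMPLE` rows are illustrative and not copied from `mondo.sssom.tsv`, so the predicate and justification values in them are assumptions; only the column names and the `#`-prefixed metadata header follow the SSSOM format.

```python
import csv
import io


def read_sssom(text):
    """Parse SSSOM TSV text, skipping the '#'-prefixed metadata header."""
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    return list(csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t"))


def are_mapped(rows, a, b, predicate="skos:exactMatch"):
    """Return True if a row links `a` and `b` (either direction) via `predicate`."""
    pair = {a, b}
    return any(
        r["predicate_id"] == predicate
        and {r["subject_id"], r["object_id"]} == pair
        for r in rows
    )


# Illustrative sample only; the real file is much larger and its values
# for this pair may differ.
SAMPLE = (
    "# curie_map:\n"
    "#   MONDO: http://purl.obolibrary.org/obo/MONDO_\n"
    "subject_id\tpredicate_id\tobject_id\tmapping_justification\n"
    "MONDO:0008234\tskos:exactMatch\tDOID:0050430\tsemapv:ManualMappingCuration\n"
)

rows = read_sssom(SAMPLE)
linked = are_mapped(rows, "DOID:0050430", "MONDO:0008234")
```

For the full mapping file, building an index keyed by identifier once would be cheaper than scanning every row per lookup.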
I would like to suggest two things:
cc @gaurav