Loading BioKG in Neo4j #4
Hi Dimitris,

Thanks a lot for the detailed description in this issue. It has been a good year since I last touched this project, and I have changed jobs a few times since. Still, I want to give you some support with the issues you raised. I am going to give you a very lazy answer now, and in a few days I can probably look at this more carefully and give you a better one.

On issue 1: have you tried the ready-made KG in the releases section? It should be the same as the one in the paper.

I could not fully follow issue 2, so I will look at it again later and try to give you an answer.

On the typo: thanks for noticing it. It is a small thing, I know, but would you be kind enough to fix it and open a pull request? I will accept it immediately.

Thanks a lot.
Hey Sameh,

Thank you for taking the time to answer! I think it makes a lot of sense, given how frequently many of the integrated data sources are updated (e.g. DrugBank, as you mentioned). I used the instructions in the repository's README to compile

Regarding issue 2, it concerns the semantics of pathways and complexes in relation to proteins, because they affect the way I convert the data for Neo4j. Thanks for taking some more time to look into it.

Best,
Hey folks,
First of all, I'd like to thank you for this contribution. Having a unified biomedical KG is an essential resource for research in this domain.
I would like to use BioKG in my work. Specifically, we would like to train a link predictor to perform the task of drug-target interaction prediction and utilise the benchmarks you so thoughtfully include, in order to compare the performance of our DTI approach vs others.
For this, I thought it would be useful to have the final BioKG data (in `/data/biokg/`) loaded into a Neo4j property graph store, to enable querying for specific benchmarks (using hyper-relations, for example: a relation DTI with qualifier (benchmark: 'FDA') or DDI with qualifier (benchmark: MINERAL)). Furthermore, having BioKG as a Neo4j-ready graph could increase usability and visibility, so I plan on making it public once I manage to get it done.

The 2 issues I'm facing:
The number of unique entities/relations that I see after loading the `.tsv` data in pandas differs from the numbers reported in the paper, so I've been looking into what could've gone wrong.

The way I create the Neo4j graph is as follows:
Following the logic above, everything runs smoothly up to the point where I try to load the links that include COMPLEXES + PATHWAYs, for which I cannot find any matches.
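For anyone trying to reproduce this, here is a minimal sketch of how the count mismatch and the unmatched links could be narrowed down with pandas. It is not the project's own loading code. It assumes the links file is a header-less, tab-separated list of (head, relation, tail) triples, and that `known_ids` holds the identifiers found in the properties and metadata files. The column names, the relation names `REL_A`/`REL_B` and the toy rows are all made up for illustration.

```python
import io

import pandas as pd

COLUMNS = ["head", "relation", "tail"]  # assumed column order, not confirmed


def load_triples(buffer):
    """Read a header-less, tab-separated triple file and drop exact duplicates."""
    frame = pd.read_csv(buffer, sep="\t", header=None, names=COLUMNS, dtype=str)
    return frame.drop_duplicates().reset_index(drop=True)


def summarize(links, known_ids):
    """Count entities, relation types and triples, and group unmatched links."""
    entities = set(links["head"]) | set(links["tail"])
    # A link is unmatched if either endpoint is missing from the known identifiers.
    unmatched = links[~links["head"].isin(known_ids) | ~links["tail"].isin(known_ids)]
    return {
        "n_entities": len(entities),
        "n_relation_types": links["relation"].nunique(),
        "n_triples": len(links),
        "unmatched_by_relation": unmatched["relation"].value_counts().to_dict(),
    }


# Toy stand-in for the real links file (hypothetical identifiers and relations).
rows = [
    ("P1", "REL_A", "P2"),
    ("P1", "REL_B", "C1"),
    ("P2", "REL_B", "C1"),
    ("P1", "REL_B", "C1"),  # exact duplicate, removed on load
]
links = load_triples(io.StringIO("\n".join("\t".join(r) for r in rows)))

# Identifiers assumed to come from the properties + metadata files.
known_ids = {"P1", "P2"}

summary = summarize(links, known_ids)
print(summary)
```

Grouping the unmatched links by relation type shows quickly whether the misses are confined to the complex and pathway relations or spread more widely. Comparing `n_triples` before and after `drop_duplicates` is one way to check whether duplicate rows explain part of the difference from the paper's counts.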
If I understand the data model correctly, `complex_ids` exist only as part of the LINKS file and do not appear in the properties + metadata files (?).

Which identifiers are the ones that I should use to create the unique `Complex` nodes?

Apologies for the lengthy post and for potential inaccuracies on my end.
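If the complex identifiers really do appear only in the links file, one possible fallback (an assumption, not something the maintainers have confirmed) is to derive the `Complex` nodes from the link endpoints themselves. The sketch below collects the unique identifiers on the complex side of each complex-related relation and prepares them as parameters for a Cypher `MERGE`. The relation names and the `complex_side` mapping are hypothetical placeholders. The real names would have to be taken from `links_description.txt`.

```python
import pandas as pd

# Cypher statement that creates one Complex node per identifier, idempotently.
MERGE_COMPLEXES = "UNWIND $rows AS row MERGE (c:Complex {id: row.id})"


def complex_node_rows(links, complex_side):
    """Collect unique complex identifiers from link endpoints.

    complex_side maps a relation name to the column ("head" or "tail")
    that holds the complex identifier for that relation.
    """
    ids = set()
    for relation, column in complex_side.items():
        ids.update(links.loc[links["relation"] == relation, column])
    return [{"id": identifier} for identifier in sorted(ids)]


# Toy links table; relation names are hypothetical placeholders.
links = pd.DataFrame(
    [
        ("P1", "HYPOTHETICAL_MEMBER_OF_COMPLEX", "C1"),
        ("P2", "HYPOTHETICAL_MEMBER_OF_COMPLEX", "C2"),
        ("C1", "HYPOTHETICAL_COMPLEX_IN_PATHWAY", "PW1"),
        ("P1", "HYPOTHETICAL_OTHER_RELATION", "P2"),
    ],
    columns=["head", "relation", "tail"],
)

complex_side = {
    "HYPOTHETICAL_MEMBER_OF_COMPLEX": "tail",
    "HYPOTHETICAL_COMPLEX_IN_PATHWAY": "head",
}

node_rows = complex_node_rows(links, complex_side)
print(MERGE_COMPLEXES)
print(node_rows)
```

With the official Neo4j Python driver, `node_rows` could then be passed as the `rows` parameter, for example `session.run(MERGE_COMPLEXES, rows=node_rows)`. Using `MERGE` rather than `CREATE` keeps the load idempotent if the same identifier shows up under several relations.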
Minor comment:
A typo I found while reading your documentation, in `biokg/links_description.txt` (line 93 in 92a71e7):
The relation should be PROTEIN_DISEASE if I'm not mistaken.
Thank you again for your great contribution! I would greatly appreciate any help :-)
Cheers!