GitHub - pinglab-intern/Cardio-MeSH2Pgraph: Ping Lab Intern Project, Summer 2022: Mapping MeSH (ICD codes) to molecular mechanisms through a protein-protein co-occurrence graph, toward high-precision medicine

Ping Lab Intern Project, Summer 2022

Title:

Mapping MeSH (ICD codes) to molecular mechanisms through a protein-protein co-occurrence graph: toward high-precision medicine

Detail:

One of the central ideas in personalized medicine is to understand diseases and symptoms in terms of molecular phenotypes. Bridging molecular phenotypes to disease signs and symptoms is a challenging task. A central open question is how to build a personal interactome at the interface of personal omics data and medical records (e.g., EHR, EMR). In this project, we will attempt to map Medical Subject Headings (MeSH/ICD) terms to a relevant protein-protein co-occurrence graph. Because MeSH terms are organized in a hierarchical tree, we consider only the leaf nodes of the cardiovascular tree. The protein-protein co-occurrence graph can be extended to relevant genes, metabolites, and molecular pathways.
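As a minimal sketch of the leaf-node selection described above, the snippet below treats each MeSH term as a dotted tree number and keeps only those that are not a prefix of any other tree number. The function name `find_leaf_nodes` and the toy tree numbers are assumptions for illustration (strings in the MeSH dotted format, not taken from the project's data); the project's own code may do this differently.

```python
def find_leaf_nodes(tree_numbers):
    """Return the tree numbers that have no descendants in the given collection."""
    numbers = set(tree_numbers)
    # Every proper dotted prefix of a tree number is an ancestor (internal node).
    ancestors = set()
    for number in numbers:
        parts = number.split(".")
        for depth in range(1, len(parts)):
            ancestors.add(".".join(parts[:depth]))
    # Leaves are the tree numbers that never appear as an ancestor.
    return sorted(numbers - ancestors)


# Toy input in the MeSH dotted format (hypothetical subset, for illustration only).
example_tree_numbers = ["C14", "C14.280", "C14.280.067", "C14.280.123", "C14.907"]
leaves = find_leaf_nodes(example_tree_numbers)
print(leaves)
```

In this toy input, `C14` and `C14.280` are dropped because other entries extend them, while the remaining three entries are kept as leaves.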

Figure: Sigdel et al., Understanding the Molecular Interface of Cardiovascular Diseases and COVID-19: A Data Science Approach, Advanced Technologies in Cardiovascular Bioengineering, 2022

Data Sources

Project Walkthrough

Prepare cardiovascular documents from PubMed
Get MeSH terms (MeSH tree 2022) from the NLM library
Build MeSH-to-PMID and PMID-to-MeSH mappings through the caseOLAP platform
Build the document-to-entity mapping (PMID-to-protein mapping) through the caseOLAP platform
Create the MeSH-to-entities mapping for the leaf nodes of the CVD MeSH tree
Create a protein-protein co-occurrence graph from the document-to-entity mapping
Integrate the protein-protein co-occurrence graph with MeSH, pathway, drug, and metabolite information
Load the graph data into the Neo4j platform for further exploration
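The mapping and graph-building steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's actual pipeline: the function names `build_cooccurrence` and `mesh_to_proteins`, the dictionary layouts, and the toy identifiers are all hypothetical, and the real mappings come from the caseOLAP platform. Here two proteins are linked whenever they are mentioned in the same document, and the edge weight is the number of such documents.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(pmid_to_proteins):
    """Count, for each protein pair, how many documents mention both proteins."""
    edge_counts = Counter()
    for proteins in pmid_to_proteins.values():
        # Deduplicate and sort so each unordered pair has one canonical key.
        for a, b in combinations(sorted(set(proteins)), 2):
            edge_counts[(a, b)] += 1
    return edge_counts


def mesh_to_proteins(mesh_to_pmids, pmid_to_proteins):
    """Collect the proteins mentioned in any document tagged with each MeSH term."""
    result = defaultdict(set)
    for mesh, pmids in mesh_to_pmids.items():
        for pmid in pmids:
            # Documents without a protein mapping contribute nothing.
            result[mesh].update(pmid_to_proteins.get(pmid, ()))
    return dict(result)


# Toy data (hypothetical identifiers, for illustration only).
pmid_to_proteins = {
    "pmid1": ["P1", "P2", "P3"],
    "pmid2": ["P2", "P1"],
    "pmid3": ["P3"],
}
mesh_to_pmids = {"M1": ["pmid1", "pmid3"], "M2": ["pmid2", "pmid9"]}

edges = build_cooccurrence(pmid_to_proteins)
mesh_map = mesh_to_proteins(mesh_to_pmids, pmid_to_proteins)
print(dict(edges))
print(mesh_map)
```

Raw document counts are the simplest possible edge weight; a real analysis would likely normalize or score them, which this sketch does not attempt.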

Educational Goal:

This project offers the opportunity to learn about disease terminologies (MeSH/ICD codes) as tree data structures. These data structures play a key role in organizing documents in databases (e.g., MeSH in PubMed, ICD codes in EHR databases). Applying text mining and knowledge graphs to link targeted diseases and symptoms to molecular mechanisms is central to building an interface between personal EHR data and scientific research. The take-home skills in this project are text mining, data engineering with graph data, and AI algorithms.

Scientific Goal:

Using text mining to map MeSH (ICD codes) to molecular mechanisms through a protein-protein co-occurrence graph provides new insight toward high-precision medicine. In medical informatics, this approach will help build automated cohort selection and patient classification. It will also help create a personalized interface between clinical and biomedical information.
