Mapping MeSH terms (ICD codes) to molecular mechanisms through a protein-protein co-occurrence graph: toward high-precision medicine
One of the central ideas in personalized medicine is to understand diseases and their symptoms in terms of molecular phenotypes. Bridging molecular phenotypes to disease signs and symptoms is a challenging task. The biggest open question is how to build a personal interactome at the interface of personal omics data and medical records (e.g., EHR, EMR). In this project, we will attempt to map Medical Subject Headings (MeSH/ICD codes) to a relevant protein-protein co-occurrence graph. Since MeSH terms are organized in a hierarchical tree structure, we consider only the leaf nodes (the most specific terms) of the cardiovascular disease subtree. The protein-protein co-occurrence graph can be further extended with relevant genes, metabolites, and molecular pathways.
Figure: Sigdel et al., "Understanding the Molecular Interface of Cardiovascular Diseases and COVID-19: A Data Science Approach," Advanced Technologies in Cardiovascular Bioengineering, 2022.
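Leaf nodes can be identified directly from MeSH tree numbers: a tree number is a leaf if no other tree number in the tree extends it with a further dot-separated segment. The sketch below assumes the tree numbers have already been parsed from the NLM MeSH release into a list of strings. The function name `leaf_tree_numbers` and the sample tree numbers are hypothetical and illustrative only, apart from `C14`, the root of the Cardiovascular Diseases branch.

```python
def leaf_tree_numbers(tree_numbers, root="C14"):
    """Return the tree numbers under `root` that have no children in the input."""
    # Keep only the root itself and its descendants (prefix plus a "." boundary,
    # so that e.g. "C140" would not be mistaken for a child of "C14").
    subtree = {t for t in tree_numbers if t == root or t.startswith(root + ".")}
    leaves = set()
    for t in subtree:
        child_prefix = t + "."
        # A node is a leaf when no other node extends its path.
        if not any(other.startswith(child_prefix) for other in subtree):
            leaves.add(t)
    return sorted(leaves)


# Illustrative tree numbers only (assumption: not taken from an actual MeSH release).
sample = ["C14", "C14.280", "C14.280.100", "C14.280.200",
          "C14.280.200.10", "C14.907", "C15.100"]
print(leaf_tree_numbers(sample))  # ['C14.280.100', 'C14.280.200.10', 'C14.907']
```

A single MeSH descriptor can sit at several positions in the tree, so leaf status is evaluated per tree number; mapping leaves back to descriptors is a separate lookup.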
- Collect cardiovascular documents from PubMed
- Get MeSH terms (MeSH tree 2022) from the National Library of Medicine (NLM)
- Build MeSH-to-PMID and PMID-to-MeSH mappings through the caseOLAP platform
- Build a document-to-entity mapping (PMID-to-protein) through the caseOLAP platform
- Create a MeSH-to-entity mapping for the leaf nodes of the CVD MeSH tree
- Create a protein-protein co-occurrence graph from the document-to-entity mapping
- Integrate the protein-protein co-occurrence graph with MeSH, pathway, drug, and metabolite information
- Load the graph data into the Neo4j platform for further exploration
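The mapping and graph-building steps above can be sketched with plain Python dictionaries. This is not the caseOLAP API; it is a minimal illustration, assuming earlier steps have already produced `mesh_to_pmids` (MeSH term to PMIDs) and `pmid_to_proteins` (PMID to recognized proteins). All function names and identifiers are hypothetical, and edge weights are assumed to be the number of documents in which two proteins co-occur, which is one common choice rather than the project's confirmed scoring.

```python
from collections import defaultdict
from itertools import combinations


def invert(mapping):
    """Invert a one-to-many mapping, e.g. MeSH -> PMIDs into PMID -> MeSH terms."""
    inverted = defaultdict(set)
    for key, values in mapping.items():
        for value in values:
            inverted[value].add(key)
    return dict(inverted)


def mesh_to_proteins(mesh_to_pmids, pmid_to_proteins):
    """Collect the proteins mentioned in any document indexed with each MeSH term."""
    result = {}
    for mesh, pmids in mesh_to_pmids.items():
        proteins = set()
        for pmid in pmids:
            proteins |= pmid_to_proteins.get(pmid, set())
        result[mesh] = proteins
    return result


def cooccurrence_edges(pmid_to_proteins):
    """Weight each protein pair by the number of documents mentioning both."""
    counts = defaultdict(int)
    for proteins in pmid_to_proteins.values():
        # Sorting gives each undirected edge a single canonical (a, b) key.
        for a, b in combinations(sorted(proteins), 2):
            counts[(a, b)] += 1
    return dict(counts)


# Hypothetical toy data: placeholder MeSH terms, PMIDs, and protein IDs.
mesh_to_pmids = {"MeSH_A": {"1", "2"}, "MeSH_B": {"3"}}
pmid_to_proteins = {"1": {"P1", "P2"}, "2": {"P2", "P3"}, "3": {"P1", "P2", "P4"}}
print(cooccurrence_edges(pmid_to_proteins)[("P1", "P2")])  # 2
```

Raw document counts favor frequently studied proteins; normalized association scores such as pointwise mutual information are a common alternative when building co-occurrence graphs.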
This project offers an opportunity to learn about disease terminologies (MeSH/ICD codes) as tree data structures. These data structures play a key role in organizing documents in databases (e.g., MeSH in PubMed, ICD codes in EHR databases). Using text mining and knowledge graphs to link targeted diseases and symptoms to molecular mechanisms is central to building an interface between personal EHR data and scientific research. The take-home skills in this project are text mining, data engineering with graph data, and AI algorithms.
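Because a MeSH tree number encodes its full path from the root, the hierarchy can be traversed by string manipulation alone. A minimal sketch, assuming documents are already annotated with tree numbers: each document is rolled up to every ancestor category, similar in spirit to how PubMed includes narrower terms when a MeSH heading is searched. The helper names `ancestors` and `roll_up` and the sample tree numbers are hypothetical.

```python
from collections import defaultdict


def ancestors(tree_number):
    """Return the ancestor tree numbers of a MeSH tree number, root first."""
    parts = tree_number.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def roll_up(doc_to_tree_numbers):
    """Index each document under its own tree numbers and all of their ancestors."""
    category_to_docs = defaultdict(set)
    for doc, tree_numbers in doc_to_tree_numbers.items():
        for t in tree_numbers:
            for node in ancestors(t) + [t]:
                category_to_docs[node].add(doc)
    return dict(category_to_docs)


# Hypothetical annotations (illustrative tree numbers, not real assignments).
docs = {"doc1": ["C14.280.100"], "doc2": ["C14.907"]}
print(sorted(roll_up(docs)["C14"]))  # ['doc1', 'doc2']
```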
Using text mining to map MeSH terms (ICD codes) to molecular mechanisms through a protein-protein co-occurrence graph can provide new insight toward high-precision medicine. In medical informatics, this approach can support automated cohort selection and patient classification, and help create a personalized interface between clinical and biomedical information.