Mining multi-omics relationship networks from literature and databases

PNNL-Predictive-Phenomics/DancePartner

Welcome to DancePartner!

DancePartner is a Python package that creates multi-omics networks derived from literature and databases. In these networks, each node is a biomolecule and each edge indicates a relationship between two biomolecules, which can be a metabolic relationship, a binding event, or any other interaction. DancePartner takes literature and biological databases (KEGG, WikiPathways, etc.) as inputs and outputs biomolecule relationships (the "dance partners" of other biomolecules). These networks can be visualized and used as inputs to other biological tools and technologies.

The workflow has five steps, each with several options: 1. Select Papers, 2. Identify Entities, 3. Extract Relationships, 4. Collapse Synonyms, and 5. Build Network. Users can select one or more options at each step, and the outputs can be combined. DancePartner is designed to download and run AI/ML models for you, so non-ML experts can use these functions and methods without setting up the models themselves. More details about each step are provided in the summary graphic above.
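To make the data flow between the five steps concrete, here is a toy, pure-Python sketch. None of these functions are DancePartner's API: keyword filtering, dictionary lookup, and sentence co-occurrence are simple stand-ins for the package's real paper selection, entity recognition, and relationship-extraction models, and treating edges as undirected is an assumption of this sketch.

```python
import itertools
import re

# Toy illustration only: none of these functions are DancePartner's API.
# Keyword filtering, dictionary lookup, and co-occurrence stand in for the
# package's real paper selection, NER, and relationship-extraction models.

def select_papers(corpus, keyword):
    """Step 1: keep abstracts that mention the keyword."""
    return [text for text in corpus if keyword.lower() in text.lower()]

def identify_entities(sentence, vocabulary):
    """Step 2: return tokens found in a lowercase vocabulary."""
    tokens = re.findall(r"[A-Za-z0-9-]+", sentence)
    return [token for token in tokens if token.lower() in vocabulary]

def extract_relationships(papers, vocabulary):
    """Step 3: pair every two entities that share a sentence."""
    pairs = []
    for text in papers:
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            entities = identify_entities(sentence, vocabulary)
            pairs.extend(itertools.combinations(entities, 2))
    return pairs

def collapse_synonyms(pairs, synonyms):
    """Step 4: map each name to a canonical lowercase name."""
    def canonical(name):
        return synonyms.get(name.lower(), name.lower())
    return [(canonical(a), canonical(b)) for a, b in pairs]

def build_network(pairs):
    """Step 5: untyped, undirected edges (assumption); self-pairs are dropped."""
    return {frozenset(pair) for pair in pairs if pair[0] != pair[1]}

corpus = [
    "Escherichia coli converts Glc via PtsG. Pyruvate levels rose.",
    "An unrelated abstract about yeast.",
]
vocabulary = {"glucose", "glc", "ptsg", "pyruvate"}
papers = select_papers(corpus, "coli")
edges = build_network(collapse_synonyms(extract_relationships(papers, vocabulary), {"glc": "glucose"}))
# edges == {frozenset({"glucose", "ptsg"})}
```

The edges carry no relationship type, which mirrors the note under Relationships: the network records that two biomolecules are linked, not how.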

See the full Sphinx documentation here.

Cite

Degnan D.J., Strauch C.W., Obiri M.Y., VonKaenel E.D., Adrian D.W., and Bramer L.M. “DancePartner: Python Package to Mine Multiomics Relationship Networks from Literature and Databases.” Journal of Proteome Research, 2025. https://doi.org/10.1021/acs.jproteome.5c00520

How to Use

Installation

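To install this package, clone the repo and navigate to the directory that contains the clone. Then create and activate a Python 3.9 virtual environment and use pip to install the package. Give the virtual environment a different name from the cloned DancePartner folder so the two do not end up in the same directory.

>>> git clone https://github.com/pnnl-predictive-phenomics/DancePartner.git
>>> cd <<directory containing the cloned DancePartner folder>>
>>> virtualenv --python="<<path to python version 3.9>>" DancePartner-env
>>> source <<path/to/activate/virtualenv>>
>>> pip install DancePartner/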

Examples

The vignettes/ folder contains example vignettes that walk through how to use this package. We suggest starting there.

Notes

Pull the BERT Model

Download the BERT model from here. Pull the config.json, pytorch_model.bin, and training_args.bin files and place them in a folder called "biobert" in the top-level directory of this repo.
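A quick way to confirm the model files landed in the right place is a small standard-library check. The three file names and the "biobert" folder come from the instructions above; the helper name `missing_biobert_files` is hypothetical and not part of DancePartner.

```python
from pathlib import Path

# File names and folder name are taken from the README; the helper is hypothetical.
REQUIRED_BIOBERT_FILES = ("config.json", "pytorch_model.bin", "training_args.bin")

def missing_biobert_files(repo_root="."):
    """Return the required BioBERT files that are not yet in <repo_root>/biobert."""
    model_dir = Path(repo_root) / "biobert"
    return [name for name in REQUIRED_BIOBERT_FILES if not (model_dir / name).is_file()]

if __name__ == "__main__":
    missing = missing_biobert_files()
    if missing:
        print("Missing from biobert/:", ", ".join(missing))
    else:
        print("biobert/ contains all required files.")
```

Run it from the top-level directory of the repo, or pass that directory as `repo_root`.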

Scopus API Key

An API key is needed to pull papers from Scopus. Instructions to obtain one can be found here.
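For orientation, the sketch below shows how a Scopus key is typically passed to Elsevier's Scopus Search API: as an `X-ELS-APIKey` request header. This is not DancePartner's code. It assumes the key is stored as the only content of a `scopus_key.txt` file (the file name is mentioned under Test Code; its format is an assumption), and it only builds the request without sending it.

```python
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import Request

SCOPUS_SEARCH_URL = "https://api.elsevier.com/content/search/scopus"

def read_api_key(path="scopus_key.txt"):
    """Read an API key stored as the only content of a text file (assumed layout)."""
    return Path(path).read_text(encoding="utf-8").strip()

def build_scopus_request(query, api_key, count=25):
    """Build (but do not send) a Scopus Search API request; the key goes in X-ELS-APIKey."""
    params = urlencode({"query": query, "count": count})
    return Request(
        f"{SCOPUS_SEARCH_URL}?{params}",
        headers={"X-ELS-APIKey": api_key, "Accept": "application/json"},
    )
```

Passing the returned object to `urllib.request.urlopen` would send the search. Keeping the key in a file rather than in source code makes it easier to leave out of version control.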

Optional ScispaCy Model

Download the en_ner_bionlp13cg_md model from here. ScispaCy can be difficult to set up and has not been thoroughly tested on every common operating system. A suggested installation setup is shown below. Note that this model is not required to run DancePartner.

>>> source <<path/to/activate/virtualenv>>
>>> pip install spacy==3.7.5
>>> CFLAGS="-mavx -DWARN(a)=(a)" pip install nmslib # You may need a specific install step here for nmslib, see the scispacy repo. This setup worked for us. 
>>> pip install scispacy==0.5.5
>>> cd <<wherever/en_ner_bionlp13cg_md downloaded>>
>>> pip install .
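Because the model is optional, a script can check whether it is installed before trying to use it. This sketch uses `importlib.util.find_spec` from the standard library and the standard `spacy.load` loader for installed model packages; the helper name and the example sentence are illustrative, not part of DancePartner.

```python
import importlib.util

def model_is_installed(package_name="en_ner_bionlp13cg_md"):
    """True if the (optional) ScispaCy model package can be imported."""
    return importlib.util.find_spec(package_name) is not None

if __name__ == "__main__":
    if model_is_installed():
        import spacy

        nlp = spacy.load("en_ner_bionlp13cg_md")
        doc = nlp("Glucose is transported into the cell by GLUT1.")
        # Print each recognized entity with its predicted label.
        print([(ent.text, ent.label_) for ent in doc.ents])
    else:
        print("en_ner_bionlp13cg_md is not installed; skipping the optional ScispaCy step.")
```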

Relationships

DancePartner finds relationships but does not characterize their type (e.g., a metabolic relationship or an interaction event).

Test Code

To run tests, first create the scopus_key.txt and ncbi_key.txt files. Install the pytest and coverage packages, then follow the instructions in each test file. If coverage doesn't run properly after installation, deactivate the DancePartner virtual environment and reactivate it.
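A small pre-flight check can catch missing key files before the test suite runs. The two file names come from this README; this sketch assumes the files sit in the directory you run the tests from and contain only the key, which is not specified here, so adjust it to the instructions in each test file. The helper name is hypothetical.

```python
from pathlib import Path

# File names come from the README; their location and format are assumptions.
KEY_FILES = ("scopus_key.txt", "ncbi_key.txt")

def missing_key_files(directory="."):
    """Return key files that are absent or empty in `directory`."""
    missing = []
    for name in KEY_FILES:
        path = Path(directory) / name
        if not path.is_file() or not path.read_text(encoding="utf-8").strip():
            missing.append(name)
    return missing

if __name__ == "__main__":
    print(missing_key_files() or "All key files present.")
```

With the keys in place, `coverage run -m pytest` followed by `coverage report` is the usual way to use the two packages together.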

>>> pip install pytest
>>> pip install coverage

Additional Stop Words

Add to or modify the stop words list here.
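To illustrate what a stop words list does, here is a minimal sketch that loads words from a plain-text file and drops matching candidate entities. The one-word-per-line format with `#` comments and the case-insensitive matching are assumptions of this sketch, not a description of DancePartner's own list or how it applies it.

```python
from pathlib import Path

def load_stop_words(path):
    """Read one stop word per line; blank lines and '#' comments are ignored (assumed format)."""
    words = set()
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            words.add(line.lower())
    return words

def drop_stop_words(entities, stop_words):
    """Remove candidate entities that match a stop word, case-insensitively."""
    return [entity for entity in entities if entity.lower() not in stop_words]
```

Generic terms such as "cell" or "protein" are the kind of entity a stop list typically removes, since they would otherwise become highly connected but uninformative nodes.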

Note on the "omes" folder

Several functions require support files in the omes folder, so this folder must be present to run DancePartner.

How to Extract Paper IDs

PubMed: PMIDs. Enter a query into the PubMed search bar, click “Save”, select “All results”, and set the format to “PMID”. Link to database

Scopus: DOIs. Enter a query into the search bar, click “Export”, select the desired format, select all documents, and export at least the DOI column. Link to database

OSTI: OSTI IDs. Enter a query and click “Save Results”; the resulting file contains the OSTI IDs. Link to database
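Once exported, the IDs usually need to be read back into a list. This sketch covers the two simplest cases: PubMed's PMID export (one PMID per line) and a Scopus CSV export with a "DOI" column. The column name is taken from the export step above, `utf-8-sig` is used in case the CSV starts with a byte-order mark, and the function names are hypothetical rather than DancePartner's API.

```python
import csv
from pathlib import Path

def read_pmids(path):
    """Read a PubMed PMID export: one PMID per line, blank lines skipped."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]

def read_scopus_dois(path, column="DOI"):
    """Pull the DOI column from a Scopus CSV export, skipping rows without a DOI."""
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return [
            row[column].strip()
            for row in csv.DictReader(handle)
            if (row.get(column) or "").strip()
        ]
```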

Complex Query Example

Here are some examples of more complicated queries if needed:

PubMed: (e coli proteomics) AND (e coli metabolism) AND (("2000/01/01"[Date - Publication]: "2024/02/23"[Date - Publication]))

Scopus: TITLE-ABS-KEY ( e AND coli AND proteomics AND e AND coli AND metabolism ) AND PUBYEAR > 1999 AND PUBYEAR < 2025

OSTI: q=e coli proteomics AND e coli metabolism&publication_date_start=01/01/2000&publication_date_end=02/23/2024
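The same kinds of queries can also be assembled programmatically, which avoids hand-escaping spaces, quotes, and ampersands. This sketch builds (but does not send) a NCBI E-utilities `esearch` URL for PubMed with a publication-date range and an OSTI records URL that reuses the `q`, `publication_date_start`, and `publication_date_end` parameters from the OSTI example above. Whether DancePartner itself calls these endpoints is not stated here; the function names are hypothetical.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
OSTI_RECORDS = "https://www.osti.gov/api/v1/records"

def pubmed_search_url(term, start, end, api_key=None, retmax=1000):
    """E-utilities esearch URL limited to a publication-date range (YYYY/MM/DD)."""
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",  # filter on publication date
        "mindate": start,
        "maxdate": end,
        "retmax": retmax,
        "retmode": "json",
    }
    if api_key:  # an NCBI key raises the allowed request rate
        params["api_key"] = api_key
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"

def osti_search_url(query, start, end):
    """OSTI records URL using the date parameters from the example (MM/DD/YYYY)."""
    params = {"q": query, "publication_date_start": start, "publication_date_end": end}
    return f"{OSTI_RECORDS}?{urlencode(params)}"
```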

Synonym Tables

Because a gene or protein can have multiple synonyms and database identifiers, DancePartner collapses synonyms and identifiers that cross-reference each other into groups. Each group becomes a DancePartnerID.
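Collapsing cross-referenced names into groups is a connected-components problem: if A maps to B and B maps to C, all three belong together. The sketch below uses union-find to do this. It is an illustration of the idea, not DancePartner's implementation, and the `DP00001` ID format is hypothetical.

```python
def group_synonyms(cross_references):
    """Union-find over (name, name) cross-reference pairs; returns {name: group_id}."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in cross_references:
        union(a, b)

    # Collect members per root, then number groups by their smallest member
    # so the (hypothetical) IDs are deterministic across runs.
    groups = {}
    for name in parent:
        groups.setdefault(find(name), []).append(name)
    ordered = sorted(groups.values(), key=min)
    return {
        name: f"DP{i:05d}"
        for i, members in enumerate(ordered, start=1)
        for name in members
    }

pairs = [
    ("P0A9B2", "gapA"), ("gapA", "b1779"),               # example protein/gene cross-references
    ("glucose", "C00031"), ("C00031", "CHEBI:4167"),     # example metabolite cross-references
]
ids = group_synonyms(pairs)
```

Here `ids["P0A9B2"]` and `ids["b1779"]` share one ID because they are linked through "gapA", even though no single pair connects them directly.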
