PathoGN : Pathogenicity prediction model with a graph neural network

Example Usage

Run the demo with VariBench datasets

VariBench is a collection of benchmark datasets for predicting the effects of genomic variants.

Nair PS, Vihinen M. VariBench: A Benchmark Database for Variations. Hum Mutat. 2013, 34(1):42-9.

http://structure.bmc.lu.se/VariBench/

0. Install kGCN (https://github.com/clinfo/kGCN) with conda

$ conda create -n kgcn python=3.7 conda=4.9.2
$ conda activate kgcn
$ conda install tensorflow=1.15 joblib numpy scipy scikit-learn matplotlib pandas
$ pip install --upgrade git+https://github.com/clinfo/kGCN.git

Note: kGCN does NOT currently support TensorFlow 2, so the environment above pins TensorFlow 1.15.
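Since the TensorFlow version matters here, a quick check of the active environment can save a failed run later. The following is a minimal sketch and is not part of PathoGN or kGCN: the helper names `is_tf1` and `installed_tf_version` are hypothetical, and the only TensorFlow feature relied on is its `__version__` attribute.

```python
from typing import Optional


def is_tf1(version: str) -> bool:
    """Return True when a version string such as '1.15.0' has major version 1."""
    major = version.strip().split(".")[0]
    return major == "1"


def installed_tf_version() -> Optional[str]:
    """Return the installed TensorFlow version, or None if it cannot be imported."""
    try:
        import tensorflow as tf  # imported lazily so the helper loads without TF
    except ImportError:
        return None
    return tf.__version__
```

Inside the `kgcn` environment, `is_tf1(installed_tf_version() or "")` is expected to evaluate to `True`; a `False` result suggests that TensorFlow is missing or that a 2.x release was installed.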

1. Get VariBench datasets

sh get_dataset.sh

2. Preprocess the data

sh build_dataset.sh

3. Get and preprocess Reactome data

sh make_reactome_data.sh
python script/preprocess_reactome.py

4. Make input data for GCN

sh make_data.sh

5. Run GCN and evaluate with cross validation

sh run_gcn.sh
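Steps 1 to 5 can also be chained so that a failure in one step stops the run before later steps consume incomplete inputs. The sketch below is an illustration only: the commands are copied from the steps above, while the `run_pipeline` helper and the assumption that everything is launched from the repository root are not part of the original scripts.

```python
import subprocess
from typing import Callable, List, Sequence

# Commands copied from steps 1-5 above, in order.
PIPELINE: List[List[str]] = [
    ["sh", "get_dataset.sh"],
    ["sh", "build_dataset.sh"],
    ["sh", "make_reactome_data.sh"],
    ["python", "script/preprocess_reactome.py"],
    ["sh", "make_data.sh"],
    ["sh", "run_gcn.sh"],
]


def run_pipeline(
    steps: Sequence[Sequence[str]] = PIPELINE,
    runner: Callable[..., object] = subprocess.run,
) -> int:
    """Run each step in order and return the number of steps completed.

    check=True makes subprocess.run raise CalledProcessError on a non-zero
    exit code, so the loop stops at the first failing step.
    """
    completed = 0
    for cmd in steps:
        print("Running:", " ".join(cmd))
        runner(list(cmd), check=True)
        completed += 1
    return completed
```

Calling `run_pipeline()` inside the `kgcn` environment runs all six commands; the `runner` parameter exists only so the sequence can be exercised without executing the scripts.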

Prediction result

The files score1-5.csv will be created in the result directory, and a GCN-Score will be calculated for each of them.

The numbers correspond to the datasets as follows:

1. exovar_filtered_tool_scores
2. humvar_filtered_tool_scores
3. predictSNP_selected_tool_scores
4. swissvar_selected_tool_scores
5. varibench_selected_tool_scores
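The number-to-dataset correspondence above can be kept in one place when post-processing the scores. This is a small sketch under two assumptions that the README does not spell out: that "score1-5.csv" means the files `score1.csv` through `score5.csv`, and that they live in a directory named `result`. The `score_files` helper is hypothetical.

```python
import os
from typing import Dict

# Order taken from the list above; number 1 corresponds to the first entry.
DATASETS = [
    "exovar_filtered_tool_scores",
    "humvar_filtered_tool_scores",
    "predictSNP_selected_tool_scores",
    "swissvar_selected_tool_scores",
    "varibench_selected_tool_scores",
]


def score_files(result_dir: str = "result") -> Dict[str, str]:
    """Map each dataset name to its assumed score file path (score<N>.csv)."""
    return {
        name: os.path.join(result_dir, "score{}.csv".format(number))
        for number, name in enumerate(DATASETS, start=1)
    }
```

Iterating over `score_files().items()` then yields each dataset name together with the path of the file expected to hold its scores.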

The prediction scores for ClinVar 20200210 datasets

PathoGN predicted the pathogenicity of all variants that were not annotated as either pathogenic or benign in the ClinVar 20200210 dataset. The model was trained on the labeled data (pathogenic = 10,877; benign = 7,504) and then used to make predictions for the 12,520 unlabeled variants.

The prediction result: PredictionResults_ConflictVariant_ClinVar20200210.tsv
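To get a first look at the prediction file without committing to a particular column layout, the header and a few rows can be read with the standard `csv` module. This is a minimal sketch: the `peek_tsv` helper is hypothetical, and it assumes the first line of the file is a header row, which the README does not state.

```python
import csv
from itertools import islice
from typing import List, Tuple


def peek_tsv(path: str, n_rows: int = 5) -> Tuple[List[str], List[List[str]]]:
    """Return the first line (assumed header) and the next n_rows rows of a TSV file."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        header = next(reader, [])  # empty list if the file has no lines
        rows = list(islice(reader, n_rows))
    return header, rows
```

For example, `peek_tsv("PredictionResults_ConflictVariant_ClinVar20200210.tsv")` returns the column names and the first five records, which is usually enough to decide how to load the full table.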

These scores are also available on MGeND (https://mgend.med.kyoto-u.ac.jp/).

Reference

@article {PathoGN,
        author = {Kamada, Mayumi and Takagi, Atsuko and Kojima, Ryosuke and Tanaka, Yoshihisa and Nakatsui, Masahiko and Tanabe, Noriko and Hirata, Makoto and Yoshida, Teruhiko and Okuno, Yasushi},
        title = {Network-based pathogenicity prediction for variants of uncertain significance},
        year = {2021},
        doi = {10.1101/2021.07.15.452566},
        eprint = {https://www.biorxiv.org/content/early/2021/07/16/2021.07.15.452566.full.pdf},
        journal = {bioRxiv}
}
