PathoGN: Pathogenicity prediction model with a graph neural network

Example Usage

Run the demo with the VariBench datasets

VariBench is a benchmark collection of datasets for evaluating predictors of genomic variant effects.

Nair PS, Vihinen M. VariBench: A Benchmark Database for Variations. Hum Mutat. 2013, 34(1):42-9.

http://structure.bmc.lu.se/VariBench/

0. Install kGCN (https://github.com/clinfo/kGCN) with conda

$ conda create -n kgcn python=3.7 conda=4.9.2
$ conda activate kgcn
$ conda install tensorflow=1.15 joblib numpy scipy scikit-learn matplotlib pandas
$ pip install --upgrade git+https://github.com/clinfo/kGCN.git

Note: kGCN currently does NOT support TensorFlow 2, which is why TensorFlow 1.15 is installed above.
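Because kGCN needs TensorFlow 1.x, it can be worth confirming which TensorFlow the activated environment actually picks up before running the later steps. This is a minimal sketch, not part of the repository; it assumes that python on the PATH is the interpreter from the kgcn environment, and the function name check_tf_major is hypothetical.

```shell
#!/bin/sh
# Hypothetical sanity check (not part of the PathoGN repository).
# Prints the major version of the TensorFlow visible to `python`,
# or "none" if TensorFlow cannot be imported.
check_tf_major() {
  python -c "import tensorflow as tf; print(tf.__version__.split('.')[0])" 2>/dev/null || echo none
}

major=$(check_tf_major)
if [ "$major" = "1" ]; then
  echo "OK: TensorFlow 1.x found"
else
  # kGCN does not support TensorFlow 2, so anything other than 1 needs fixing.
  echo "WARNING: expected TensorFlow 1.x, got: $major" >&2
fi
```

If the warning appears after conda activate kgcn, the environment is most likely resolving a different Python or TensorFlow installation.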

1. Get the VariBench datasets

sh get_dataset.sh

2. Preprocess the data

sh build_dataset.sh

3. Get and preprocess Reactome data

sh make_reactome_data.sh
python script/preprocess_reactome.py

4. Make the input data for the GCN

sh make_data.sh

5. Run the GCN and evaluate it with cross-validation

sh run_gcn.sh
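Steps 1 to 5 must run in order, each consuming the output of the previous one. The wrapper below is a sketch of chaining them so the run stops at the first failing step; it assumes it is run from the repository root, and the DRY_RUN switch and the function names run_step and run_pipeline are hypothetical conveniences, not part of the repository. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch: run demo steps 1-5 in order and stop at the first failing step.
# DRY_RUN is a hypothetical switch (not part of the repository):
#   DRY_RUN=1 (default) only prints each command; DRY_RUN=0 actually runs it.
set -e

run_step() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"   # with set -e, a non-zero exit status aborts the whole run
  fi
}

run_pipeline() {
  run_step sh get_dataset.sh                     # 1. get the VariBench datasets
  run_step sh build_dataset.sh                   # 2. preprocess the data
  run_step sh make_reactome_data.sh              # 3a. get the Reactome data
  run_step python script/preprocess_reactome.py  # 3b. preprocess the Reactome data
  run_step sh make_data.sh                       # 4. make the GCN input data
  run_step sh run_gcn.sh                         # 5. run the GCN with cross-validation
}

run_pipeline
```

For example, saved as run_all.sh (a hypothetical name), DRY_RUN=0 sh run_all.sh would execute every step; leaving DRY_RUN unset just shows the plan.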

Prediction results

The files score1.csv to score5.csv will be created in the result directory, and a GCN-Score will be calculated for each of them.

The file numbers correspond to the datasets as follows:

1. exovar_filtered_tool_scores
2. humvar_filtered_tool_scores
3. predictSNP_selected_tool_scores
4. swissvar_selected_tool_scores
5. varibench_selected_tool_scores
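When inspecting the results, a small lookup from file number to dataset can save some back-and-forth. The sketch below encodes the correspondence listed above; it assumes the files are named result/score1.csv through result/score5.csv (the reading of score1-5.csv used here), and the helper name dataset_for is hypothetical.

```shell
#!/bin/sh
# Sketch: map each score file number to its dataset, in the order listed above.
# Assumes the files are result/score1.csv ... result/score5.csv.
dataset_for() {
  case "$1" in
    1) echo exovar_filtered_tool_scores ;;
    2) echo humvar_filtered_tool_scores ;;
    3) echo predictSNP_selected_tool_scores ;;
    4) echo swissvar_selected_tool_scores ;;
    5) echo varibench_selected_tool_scores ;;
    *) echo unknown; return 1 ;;
  esac
}

# Report each expected file, its dataset, and whether it exists yet.
for i in 1 2 3 4 5; do
  f="result/score${i}.csv"
  if [ -f "$f" ]; then status=found; else status=missing; fi
  echo "$f -> $(dataset_for "$i") ($status)"
done
```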

Prediction scores for the ClinVar 20200210 dataset

PathoGN predicted the pathogenicity of all variants that were annotated as neither pathogenic nor benign in the 2020 ClinVar dataset. The model was trained on the labeled variants (10,877 pathogenic and 7,504 benign) and then used to predict the 12,520 unlabeled variants.
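The counts above determine the size and class balance of the training set. The arithmetic below is only a worked check of those numbers, not output from PathoGN: 10,877 + 7,504 = 18,381 labeled variants, of which roughly 59% are pathogenic.

```shell
#!/bin/sh
# Counts as stated in the text above (ClinVar 20200210).
pathogenic=10877
benign=7504
unlabeled=12520

labeled=$((pathogenic + benign))
echo "labeled (training) variants: $labeled"          # 18381
echo "labeled + unlabeled: $((labeled + unlabeled))"  # 30901
# Shell arithmetic is integer-only, so awk computes the pathogenic share.
awk -v p="$pathogenic" -v n="$labeled" 'BEGIN { printf "pathogenic fraction: %.3f\n", p / n }'  # 0.592
```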

The prediction results are in PredictionResults_ConflictVariant_ClinVar20200210.tsv.

These scores are also available on MGeND (https://mgend.med.kyoto-u.ac.jp/).
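For a first look at the ClinVar prediction file, a helper like the one below lists its columns and counts its data rows. It is a sketch that assumes the file is tab-separated with a single header line; the column layout is not described in this README, so check the printed header before relying on any field. The function name summarize_tsv is hypothetical. If each row holds one variant, the row count would be expected to match the 12,520 unlabeled variants mentioned above.

```shell
#!/bin/sh
# Sketch: print the column names and the number of data rows of a TSV file.
# Assumes the first line is a header; the actual columns are not documented here.
summarize_tsv() {
  echo "columns:"
  head -n 1 "$1" | tr '\t' '\n' | sed 's/^/  /'
  # awk counts every record after the header, even a last line without a newline.
  echo "data rows: $(awk 'NR > 1 { n++ } END { print n + 0 }' "$1")"
}

f=PredictionResults_ConflictVariant_ClinVar20200210.tsv
if [ -f "$f" ]; then
  summarize_tsv "$f"
else
  echo "$f not found; run this from the repository root" >&2
fi
```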

Reference

@article {PathoGN,
        author = {Kamada, Mayumi and Takagi, Atsuko and Kojima, Ryosuke and Tanaka, Yoshihisa and Nakatsui, Masahiko and Tanabe, Noriko and Hirata, Makoto and Yoshida, Teruhiko and Okuno, Yasushi},
        title = {Network-based pathogenicity prediction for variants of uncertain significance},
        year = {2021},
        doi = {10.1101/2021.07.15.452566},
        eprint = {https://www.biorxiv.org/content/early/2021/07/16/2021.07.15.452566.full.pdf},
        journal = {bioRxiv}
}
