VariBench is a collection of benchmark datasets for predicting the effects of genomic variants.
Nair PS, Vihinen M. VariBench: A Benchmark Database for Variations. Hum Mutat. 2013, 34(1):42-9.
http://structure.bmc.lu.se/VariBench/
0. Install kGCN (https://github.com/clinfo/kGCN) with conda
$ conda create -n kgcn python=3.7 conda=4.9.2
$ conda activate kgcn
$ conda install tensorflow=1.15 joblib numpy scipy scikit-learn matplotlib pandas
$ pip install --upgrade git+https://github.com/clinfo/kGCN.git
- Note: kGCN does not currently support TensorFlow 2, which is why TensorFlow 1.15 is installed above.
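Because kGCN needs TensorFlow 1.x, it can be useful to confirm the installed version before running the later steps. The following is a minimal sketch, not part of kGCN: the helper names `is_supported_tf` and `check_installed_tf` are hypothetical, and the check only looks at the major version number.

```python
def is_supported_tf(version: str) -> bool:
    """Return True if the version string denotes a TensorFlow 1.x release."""
    major = version.split(".")[0]
    return major == "1"


def check_installed_tf() -> str:
    """Import TensorFlow and raise if it is not a 1.x release."""
    # Imported lazily so is_supported_tf() can be used without TensorFlow installed.
    import tensorflow as tf

    if not is_supported_tf(tf.__version__):
        raise RuntimeError(
            f"TensorFlow {tf.__version__} found; kGCN needs TensorFlow 1.x"
        )
    return tf.__version__
```

Inside the activated kgcn environment, calling `check_installed_tf()` should return the installed version string if the install succeeded, and raise otherwise.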
$ sh get_dataset.sh
$ sh build_dataset.sh
$ sh make_reactome_data.sh
$ python script/preprocess_reactome.py
$ sh make_data.sh
$ sh run_gcn.sh
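The commands above can also be driven from one small script that stops at the first failure. This is only a sketch under stated assumptions: it assumes the steps must run in the order listed and from the directory that contains the scripts, and the function name `run_pipeline` is hypothetical rather than something the repository provides.

```python
import subprocess

# Commands copied from the steps above, in the order they are listed.
STEPS = [
    ["sh", "get_dataset.sh"],
    ["sh", "build_dataset.sh"],
    ["sh", "make_reactome_data.sh"],
    ["python", "script/preprocess_reactome.py"],
    ["sh", "make_data.sh"],
    ["sh", "run_gcn.sh"],
]


def run_pipeline(steps=STEPS, runner=subprocess.run):
    """Run each step in order; stop at the first one that fails."""
    completed = []
    for cmd in steps:
        print("running:", " ".join(cmd))
        # check=True makes subprocess.run raise CalledProcessError on a non-zero exit.
        runner(cmd, check=True)
        completed.append(cmd)
    return completed
```

Calling `run_pipeline()` runs all six steps. Passing a shorter list, for example `run_pipeline(STEPS[4:])`, reruns only the last two steps once the data have been built.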
The files score1-5.csv will be created in the result directory, and the GCN-Score will be calculated for each of them.
The numbers correspond to the following data sets, in the order listed (1 to 5):
- exovar_filtered_tool_scores
- humvar_filtered_tool_scores
- predictSNP_selected_tool_scores
- swissvar_selected_tool_scores
- varibench_selected_tool_scores
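The correspondence can be written down as a small lookup table. This sketch rests on two assumptions that the text does not state outright: that the list order gives the numbering 1 to 5, and that "score1-5.csv" means five files named score1.csv through score5.csv. The names `DATASETS` and `score_path` are hypothetical.

```python
import os

# Assumption: the order of the list above gives the numbering 1..5.
DATASETS = {
    1: "exovar_filtered_tool_scores",
    2: "humvar_filtered_tool_scores",
    3: "predictSNP_selected_tool_scores",
    4: "swissvar_selected_tool_scores",
    5: "varibench_selected_tool_scores",
}


def score_path(number: int, result_dir: str = "result") -> str:
    """Return the assumed path of the score file for a data set number."""
    if number not in DATASETS:
        raise KeyError(f"unknown data set number: {number}")
    # Assumption: files are named score<number>.csv inside the result directory.
    return os.path.join(result_dir, f"score{number}.csv")
```

For example, `DATASETS[2]` gives the data set name for the second score file, and `score_path(2)` gives where that file is expected to be.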
PathoGN predicted pathogenicity for all variants in the 2020 ClinVar dataset that were not annotated as either pathogenic or benign. The model was trained on the labeled data (pathogenic = 10,877; benign = 7,504) and then used to make predictions for the 12,520 unlabeled variants.
The prediction results are in PredictionResults_ConflictVariant_ClinVar20200210.tsv.
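A minimal sketch for reading the results file with the Python standard library follows. It assumes the file is tab-separated with a header row; the column names are not documented here, so the code does not rely on any of them. The function names `parse_predictions` and `load_predictions` are hypothetical.

```python
import csv


def parse_predictions(lines):
    """Parse tab-separated lines into a list of dicts keyed by the header row."""
    reader = csv.DictReader(lines, delimiter="\t")
    return [dict(row) for row in reader]


def load_predictions(path):
    """Open a TSV file and parse it while the file handle is still open."""
    with open(path, newline="", encoding="utf-8") as handle:
        return parse_predictions(handle)
```

With the file in the working directory, `load_predictions("PredictionResults_ConflictVariant_ClinVar20200210.tsv")` returns one dict per variant, and inspecting the keys of the first row shows which columns are present.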
These scores are also available on MGeND (https://mgend.med.kyoto-u.ac.jp/).
@article {PathoGN,
author = {Kamada, Mayumi and Takagi, Atsuko and Kojima, Ryosuke and Tanaka, Yoshihisa and Nakatsui, Masahiko and Tanabe, Noriko and Hirata, Makoto and Yoshida, Teruhiko and Okuno, Yasushi},
title = {Network-based pathogenicity prediction for variants of uncertain significance},
year = {2021},
doi = {10.1101/2021.07.15.452566},
eprint = {https://www.biorxiv.org/content/early/2021/07/16/2021.07.15.452566.full.pdf},
journal = {bioRxiv}
}