Predator is a computational tool that predicts the effects of mutations on protein-protein interactions by classifying them as disrupting or nondisrupting. It also provides a comprehensive analysis of candidate cancer-associated genes, their most frequently disrupted interaction partners, cancer patients, and several cancer cohorts in the TCGA project.
For more information, please refer to the article, which can be found here.
Below are the steps to obtain the results in the paper.
-
Download the repository and move to the reproducible folder.
cd \Predator\src\reproducible
-
Update conda in the base environment.
conda update conda -n base -y
-
Add the conda-forge channel, which is required for installing the packages.
conda config --append channels conda-forge
-
Create a new environment named predator with a specified Python version and install required packages.
conda create python=3.8.13 --name predator --file requirements.txt -y
-
Activate the new environment.
conda activate predator
-
Register the new environment as a Jupyter kernel with the display name "Predator".
python -m ipykernel install --user --name predator --display-name "Predator"
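Before moving on, it can help to confirm that the active interpreter is the one created in the steps above. The following is a minimal sketch, not part of the repository: the function name `environment_report` is hypothetical, and it assumes conda sets the `CONDA_DEFAULT_ENV` variable on activation (its usual behaviour).

```python
import os
import sys

EXPECTED_PYTHON = (3, 8)   # major.minor pinned by the conda create command above
EXPECTED_ENV = "predator"  # environment name used in the steps above


def environment_report(version_info=None, environ=None):
    """Return a list of detected problems; an empty list means none were found.

    Hypothetical helper for illustration; it is not shipped with Predator.
    """
    version_info = sys.version_info if version_info is None else version_info
    environ = os.environ if environ is None else environ

    problems = []
    found = tuple(version_info[:2])
    if found != EXPECTED_PYTHON:
        problems.append(
            "Python {}.{} is active, expected {}.{}".format(*found, *EXPECTED_PYTHON)
        )
    # Assumption: conda exports the active environment name in this variable.
    active_env = environ.get("CONDA_DEFAULT_ENV")
    if active_env != EXPECTED_ENV:
        problems.append(
            "Active conda environment is {!r}, expected {!r}".format(active_env, EXPECTED_ENV)
        )
    return problems


if __name__ == "__main__":
    for line in environment_report() or ["Environment looks as expected."]:
        print(line)
```

If the report lists a problem, re-run `conda activate predator` in the same terminal before executing the scripts below.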
The trained model can be found here. To train the model from scratch, run the following command:
python reproducable_01_training_predator.py
The newly trained model will be saved in the src\PredatorModels\PredatorModel_<date>\<hash> directory. Additionally, a newly executed Jupyter notebook, Reproduced_PredatorStudyModel.ipynb, will be created in the reproducible folder.
The trained Predator model can be applied to TCGA mutation datasets with reproducable_02_predicting_tcga.py. The script also allows you to select the model to be used in the prediction task. Run the following command to execute it:
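Since each training run writes into its own PredatorModel_<date>/<hash> directory, locating the most recent one by hand can be tedious. The sketch below is an illustration only: `latest_model_dir` is a hypothetical helper, it assumes the directory layout described above, and it uses the directory modification time as a stand-in for "newest".

```python
from pathlib import Path


def latest_model_dir(models_root):
    """Return the most recently modified <hash> directory, or None if there is none.

    Hypothetical helper; assumes the PredatorModel_<date>/<hash> layout.
    """
    root = Path(models_root)
    if not root.is_dir():
        return None
    # Every sub-directory one level below a PredatorModel_* folder is a candidate.
    candidates = [path for path in root.glob("PredatorModel_*/*") if path.is_dir()]
    if not candidates:
        return None
    return max(candidates, key=lambda path: path.stat().st_mtime)


if __name__ == "__main__":
    # Path from the text above, written relative to the repository root.
    print(latest_model_dir(Path("src") / "PredatorModels"))
```

Modification time is a heuristic; if directories were copied or touched after training, compare the date in the folder name instead.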
python reproducable_02_predicting_tcga.py
This will export prediction files to the predictions_datasets folder and create Reproduced_PredatorStudy_<TCGA>.ipynb for each TCGA cohort.
If the paths of the prediction files do not need to be updated, generate the patient interaction analysis files with the following command:
python reproducable_03_patient_interaction_analysis.py
The paths of newer prediction datasets can be updated in the script before running it. Upon completion, Excel files containing interactions and patients for each TCGA cohort should appear in the data/patient_interaction_datasets folder. An executed copy of Reproduced_Disruptive_patients_per_patient.ipynb will also be created.
Lastly, update the path if necessary in reproducible_04_analysis.py and run it with the command below:
python reproducible_04_analysis.py
Running this command creates a counts file for each TCGA cohort.
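When none of the paths inside the scripts need editing, the four scripts above can be chained so that a failure stops the run early. This is a minimal sketch under stated assumptions: it is not part of the repository, the helper names `build_commands` and `run_pipeline` are hypothetical, the script names are copied verbatim from the steps above, and it is meant to be run from the reproducible folder. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys

# Script names copied verbatim from the steps above, including their spelling.
SCRIPTS = [
    "reproducable_01_training_predator.py",  # optional: only for training from scratch
    "reproducable_02_predicting_tcga.py",
    "reproducable_03_patient_interaction_analysis.py",
    "reproducible_04_analysis.py",
]


def build_commands(scripts=SCRIPTS, python=sys.executable):
    """Build one command (argument list) per script, using the given interpreter."""
    return [[python, script] for script in scripts]


def run_pipeline(commands, dry_run=True):
    """Print each command in order and, unless dry_run is set, execute it.

    subprocess.run(..., check=True) raises CalledProcessError on a non-zero
    exit code, so later steps never run on top of a failed earlier step.
    """
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Dry run by default; pass dry_run=False to actually execute the scripts.
    run_pipeline(build_commands(), dry_run=True)
```

To reuse the downloaded trained model instead of retraining, pass `build_commands(SCRIPTS[1:])`. If the paths of newer prediction datasets must be edited between steps, run the scripts one at a time as described above.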
Run the first part of the notebook tables/preliminary_tables_counts.ipynb as indicated.
Run the notebook analyses/PatientInteractionAnalysis/PatientInteractionAnalysis.ipynb.
Continue with the second part of tables/preliminary_tables_counts.ipynb, in which the Gene Level Statistics table will be generated.
If you find Predator useful for your research, please consider citing the following paper:
Berber, Ibrahim, Cesim Erten, and Hilal Kazan. "Predator: Predicting the impact of cancer somatic mutations on protein-protein interactions." IEEE/ACM Transactions on Computational Biology and Bioinformatics (2023).
@article{berber2023predator,
title={Predator: Predicting the impact of cancer somatic mutations on protein-protein interactions},
author={Berber, Ibrahim and Erten, Cesim and Kazan, Hilal},
journal={IEEE/ACM Transactions on Computational Biology and Bioinformatics},
year={2023},
volume={20},
number={5},
pages={3163-3172},
publisher={IEEE},
doi={10.1109/TCBB.2023.3262119},
}