CNN_seq

CNN model based on RBD-ACE2 sequence, predicting binding affinity change upon mutations.

Configure environment

The code currently only supports CUDA-supported machine.

Pre-trained models are based on these packages, and errors/warnings may occur when using different versions.

Install Anaconda for Linux on the node following the instructions.

conda env remove -n cnn_seq # Remove existing conda env
conda create -n cnn_seq python=3.8.5 # Create new conda env
conda activate cnn_seq # Activate the cnn_seq env
conda install numpy matplotlib scikit-learn # Basic tools
conda install pytorch=1.8.1 torchvision torchaudio cudatoolkit=10.2 -c pytorch # PyTorch with CUDA support
conda install biopython -c conda-forge # Biopython

Files

[./]

cnn_kfold.py is the main code for training, prediction and analysls.

var_blind_test_set.csv contains 1677 variants, constituting a completely blind test set.

var_chk.csv contains all variants to be predicted.

var_chk_circ.csv contains many variants (including circulating variants), which can serve as an example of how to prepare the list.

[./Mater]

[./Mater/expt_multi_mutant] Contains experimental data for training purposes.

[./Mater/python] Contains python modules called by cnn_kfold.py.

[./Mater/species] Contains basic information used by cnn_kfold.py for each hosts.

[./Models] Place where trained models are stored. Note that freshly-trained models will overwrite existing ones.

[./Results] Temporary place where predicted results are processed.

Usage

Prepare variants list for prediction.

Modify or create the file var_chk.csv.

See var_chk_circ.csv for examples.

Note that for non-human hosts, the residue indices are 318 smaller than the expected one. For instance, deer_166K corresponds to E484K for deer, while humancov2_484K or simply 484K corresponds to E484K for human.

Predict K_D,app ratios for variants in `var_chk.csv` with 25 pre-trained models.

./cnn_kfold.py 5

5-fold cross-validation test, generating 5 models.

./cnn_kfold.py 3

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CNN_seq

Configure environment

Files

Usage

Prepare variants list for prediction.

Predict K_D,app ratios for variants in `var_chk.csv` with 25 pre-trained models.

5-fold cross-validation test, generating 5 models.

About

Releases

Packages

Contributors 2

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
Mater		Mater
Models		Models
Results		Results
.gitattributes		.gitattributes
README.md		README.md
cnn_kfold.py		cnn_kfold.py
cplt.cfg		cplt.cfg
var_blind_test_set.csv		var_blind_test_set.csv
var_chk.csv		var_chk.csv
var_chk_circ.csv		var_chk_circ.csv

maranasgroup/CNN_seq_CoV2

Folders and files

Latest commit

History

Repository files navigation

CNN_seq

Configure environment

Files

Usage

Prepare variants list for prediction.

Predict KD,app ratios for variants in var_chk.csv with 25 pre-trained models.

5-fold cross-validation test, generating 5 models.

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Predict K_D,app ratios for variants in `var_chk.csv` with 25 pre-trained models.

Packages