
jiaqingxie/ProB


ProB: Learning Based Protein Benchmark

Applications in protein property prediction, localization prediction, protein-protein interaction, and more.



Installation

First, we recommend following the DeepPurpose instructions for installing its dependencies.

conda create -n prob python=3.9
conda activate prob
pip install git+https://github.com/bp-kelley/descriptastorus
pip install lmdb
pip install seaborn
pip install DeepPurpose

torch 2.1 or later must be installed, since dgl requires torch >= 2.1.0.

  1. If you want to use a GPU, first find a torch version that matches your CUDA version, then install the corresponding CUDA build of dgl. Here is an example for torch 2.3.0 with CUDA 11.8:
    pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cu118
    conda install -c dglteam/label/th23_cu118 dgl
  2. If you are not using a GPU, install the CPU builds instead:
    pip install torch==2.3.0 torchvision==0.18.0 torchaudio==2.3.0 --index-url https://download.pytorch.org/whl/cpu
    conda install -c dglteam/label/th23_cpu dgl
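
After installing, it can help to confirm that the environment satisfies the torch >= 2.1.0 requirement stated above. The following is a minimal sketch of such a check and is not part of ProB itself. The helper names `parse_major_minor` and `meets_minimum` are hypothetical and introduced only for illustration.

```python
import re


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from a version string such as '2.3.0+cu118'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version: str, minimum: tuple[int, int] = (2, 1)) -> bool:
    """Return True if the version is at least the given (major, minor)."""
    return parse_major_minor(version) >= minimum


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("torch is not installed")
    else:
        print("torch", torch.__version__, "meets >= 2.1:", meets_minimum(torch.__version__))
        print("CUDA available:", torch.cuda.is_available())

    try:
        import dgl
    except ImportError:
        print("dgl is not installed")
    else:
        print("dgl", dgl.__version__)
```

If the script reports that CUDA is unavailable on a GPU machine, the torch wheel and the dgl channel were probably installed for different CUDA versions.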

Encodings

Thanks to DeepPurpose and dgllife, we were able to borrow some of the encodings from DeepPurpose. The newly added encodings are PAGTN, EGT and Graphormer, which are graph transformer modules that have become prevalent in recent years for encoding protein graphs.

Currently, we support the following encodings:

| Drug Encodings | Description |
|---|---|
| CNN | Convolutional Neural Network on SMILES |
| CNN_RNN | A GRU/LSTM on top of a CNN on SMILES |
| Transformer | Transformer Encoder on ESPF |
| MPNN | Message-passing neural network |
| DGL_GCN | Graph Convolutional Network |
| DGL_NeuralFP | Neural Fingerprint |
| DGL_AttentiveFP | Attentive FP, Xiong et al. 2020 |
| DGL_GAT | Graph Attention Network |
| PAGTN | Path Augmented Graph Transformer Network |
| Graphormer | Do Transformers Really Perform Bad, Ying et al. |

Note that we tried EGT, but constructing a large batched edge feature matrix leads to a memory error, so EGT is not implemented for now. The problem may be avoidable when EGT is applied to small graphs, which we leave as future work.
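
To give a rough sense of why a batched edge feature matrix can exhaust memory, here is a back-of-the-envelope sketch. It assumes a dense layout in which every graph in a batch is padded to the same number of nodes and every node pair carries a float32 feature vector. The batch size, node counts and feature width below are illustrative assumptions, not measurements from ProB, and the function name is hypothetical.

```python
def dense_edge_feature_bytes(batch_size, max_nodes, edge_dim, bytes_per_value=4):
    """Bytes needed for a dense (batch, nodes, nodes, edge_dim) tensor."""
    return batch_size * max_nodes * max_nodes * edge_dim * bytes_per_value


if __name__ == "__main__":
    # Assumed settings: batch of 32 graphs, 64 edge channels, float32 values.
    for max_nodes in (50, 500, 1000):
        gib = dense_edge_feature_bytes(32, max_nodes, 64) / 2**30
        print(f"max_nodes={max_nodes:5d}: {gib:.2f} GiB for one edge feature tensor")
```

Because the cost grows with the square of the node count, a 10x larger graph needs 100x more memory under these assumptions, which is consistent with the problem being manageable on small graphs.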

Protein Function (Property) Prediction

Done:

  1. Dataset built for: Fluorescence dataset

Test CNN probing on the Fluorescence dataset:

python CNN_Fluroscence.py

Test GCN probing on the Fluorescence dataset:

python GCN_Fluroscence.py
python test.py

Protein-Protein Interaction (PPI)

python train/ppi_affinity.py --target_encoding CNN --seed 7 --wandb_proj DeepPurposePP --lr 0.0001 --epochs 60
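
For readers unfamiliar with the flags in the command above, the sketch below shows how such options could be parsed with `argparse`. It is an illustration only: the actual `train/ppi_affinity.py` may define additional options, and the argument types, the absence of defaults and the help strings here are assumptions inferred from the example command.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used in the example command."""
    parser = argparse.ArgumentParser(description="PPI affinity training (sketch)")
    parser.add_argument("--target_encoding", type=str, help="protein encoder name, e.g. CNN")
    parser.add_argument("--seed", type=int, help="random seed")
    parser.add_argument("--wandb_proj", type=str, help="Weights & Biases project name")
    parser.add_argument("--lr", type=float, help="learning rate")
    parser.add_argument("--epochs", type=int, help="number of training epochs")
    return parser


if __name__ == "__main__":
    # Parse the same values that appear in the command shown above.
    example = build_parser().parse_args(
        ["--target_encoding", "CNN", "--seed", "7",
         "--wandb_proj", "DeepPurposePP", "--lr", "0.0001", "--epochs", "60"]
    )
    print(vars(example))
```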
