CSG2A

The official code implementation for the Condition-Specific Gene-Gene Attention ($CSG^2A$) network from our paper, "Transfer Learning of Condition-Specific Perturbation in Gene Interactions Improves Drug Response Prediction" (accepted at ISMB 2024).

Here, we provide code for pretraining our network on transcriptome data (including the LINCS L1000 dataset) and fine-tuning it on cell viability data (including the GDSC dataset).

Model description

The full model architecture is shown below. The $CSG^2A$ network framework consists of two steps:

Step 1. Pretraining of condition-specific response on LINCS L1000 dataset

Step 2. Fine-tuning of cell viability response on GDSC dataset

(Figure: overall $CSG^2A$ model architecture)

Setup

First, clone this repository and move into the directory.

git clone https://github.com/eugenebang/CSG2A.git
cd CSG2A/

To install the appropriate environment for the $CSG^2A$ network, you need the conda package manager.

After installing conda and adding the conda executable to your PATH, the following command creates a conda environment named csg2a. Setting up the environment takes up to 10 minutes, though this may vary with your Internet connection and package cache status.

conda env create -f environment.yaml && \
conda activate csg2a

Alternatively, if you already have a virtual environment with a PyTorch version appropriate for your hardware (including GPU and CUDA), you can install the necessary packages listed below with pip before running the model.
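For the pip route, a command along these lines would install the pinned versions from the Prerequisites section (note that the conda package `pytorch` is named `torch` on PyPI; adjust the torch install per your CUDA setup):

```shell
pip install torch==2.0.1 rdkit==2022.09.5 numpy==1.24.1 \
            pandas==2.1.1 scipy==1.11.3 tqdm==4.66.1
```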

To check whether $CSG^2A$ network works properly, please refer to the Example codes section below.

Example codes

Sample code to fine-tune the IC50 prediction model and evaluate its performance is provided in finetune_GDSC.ipynb.

  • The file formats for each input file can be found here.

Code for pretraining the $CSG^2A$ network on LINCS L1000, or any other transcriptome-level dataset with dose and time information, is provided in pretrain_LINCS.ipynb.

Pretrained weights

We also provide LINCS L1000-pretrained weights, which can be used for fine-tuning on cell viability drug response prediction tasks.

Note that we provide two versions: one trained on the LINCS L1000 landmark genes (978 in total) and one trained on the LINCS L1000 BING (inferred) genes (10,167 in total). All experimental results in the manuscript are reported using the landmark genes.

  • Landmark pretrained model (approx. 170 MB; link)
  • BING (inferred) pretrained model (approx. 600 MB; link)
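As a sketch of how the downloaded weights might be wired up: the filenames below are hypothetical placeholders (not the actual download names), and loading follows the standard PyTorch `torch.load`/`load_state_dict` pattern used in the notebooks.

```python
# Hypothetical local filenames for the downloaded checkpoints -- replace
# them with the actual paths of the files obtained from the links above.
CHECKPOINTS = {
    "landmark": "csg2a_landmark_pretrained.pt",  # 978 landmark genes, ~170 MB
    "bing": "csg2a_bing_pretrained.pt",          # 10,167 BING genes, ~600 MB
}

def checkpoint_path(gene_set: str) -> str:
    """Map a gene-set name ("landmark" or "bing") to its checkpoint file."""
    try:
        return CHECKPOINTS[gene_set.lower()]
    except KeyError:
        raise ValueError(f"unknown gene set: {gene_set!r}") from None

# Standard PyTorch loading pattern (run inside the csg2a environment):
# import torch
# state_dict = torch.load(checkpoint_path("landmark"), map_location="cpu")
# model.load_state_dict(state_dict)  # `model` built as in finetune_GDSC.ipynb
```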

The pretrained MAT weights for training from scratch can also be obtained from the original authors' repository.

Software requirements

Operating system

$CSG^2A$ network training and evaluation were tested on Linux (Ubuntu 18.04).

Prerequisites

$CSG^2A$ network training and evaluation were tested with the following Python packages and versions.

  • python=3.10
  • pytorch=2.0.1
  • rdkit=2022.09.5
  • numpy=1.24.1
  • pandas=2.1.1
  • scipy=1.11.3
  • tqdm=4.66.1
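A quick way to sanity-check the environment is to verify that each prerequisite can be imported. This is a small stdlib-only sketch (not part of the repository); note that the conda package `pytorch` is imported as `torch`.

```python
import importlib.util

# Import names for the prerequisite packages listed above
# (the conda package "pytorch" is imported as "torch").
REQUIRED = ["torch", "rdkit", "numpy", "pandas", "scipy", "tqdm"]

def missing_packages(names):
    """Return the subset of `names` that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    absent = missing_packages(REQUIRED)
    if absent:
        print("Missing packages:", ", ".join(absent))
    else:
        print("All prerequisites found.")
```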

License

The source code of $CSG^2A$ is released under the GPL-3.0 license, which allows users to use, modify, and distribute the software freely, even for commercial purposes.

However, any data or content produced using $CSG^2A$ is licensed under CC BY-NC-SA 4.0, which does not permit commercial use without proper authorization.
