CSG2A

The official code implementation for the Condition-Specific Gene-Gene Attention ($CSG^2A$) network from our paper, "Transfer Learning of Condition-Specific Perturbation in Gene Interactions Improves Drug Response Prediction" (accepted at ISMB 2024).

Here, we provide code for pretraining our network on transcriptome data (including the LINCS L1000 dataset) and fine-tuning it on cell viability data (including the GDSC dataset).

Model description

The full model architecture is shown below. The $CSG^2A$ network framework consists of two steps:

Step 1. Pretraining of condition-specific response on LINCS L1000 dataset

Step 2. Fine-tuning of cell viability response on GDSC dataset

(Figure: overall $CSG^2A$ model architecture)

Setup

First, clone this repository and move into the directory.

git clone https://github.com/eugenebang/CSG2A.git
cd CSG2A/

To install the appropriate environment for the $CSG^2A$ network, you need the conda package manager.

After installing conda and adding the conda executable to your PATH, the following command creates a conda environment named csg2a. Setting up the environment takes up to 10 minutes, though this may vary with your Internet connection and package cache status.

conda env create -f environment.yaml && \
conda activate csg2a

Alternatively, if you already have a virtual environment with a PyTorch version appropriate for your hardware (including GPU and CUDA), you can install the necessary packages listed below with pip before running the model.
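For the pip route, a command along these lines would install the pinned versions from the Prerequisites section (note that the conda package `pytorch` is named `torch` on PyPI; adjust the torch install per your CUDA setup):

```shell
pip install torch==2.0.1 rdkit==2022.09.5 numpy==1.24.1 \
            pandas==2.1.1 scipy==1.11.3 tqdm==4.66.1
```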

To check whether $CSG^2A$ network works properly, please refer to the Example codes section below.

Example codes

Sample code to fine-tune the IC50 prediction model and evaluate its performance is provided in finetune_GDSC.ipynb.

  • The file formats for each input file can be found here.

Code for pretraining the $CSG^2A$ network on LINCS L1000, or any other transcriptome-level dataset with dose and time information, is provided in pretrain_LINCS.ipynb.

Pretrained weights

We also provide LINCS L1000-pretrained weights, which can be used for fine-tuning on cell viability drug response prediction tasks.

Note that we provide two versions: one trained on the LINCS L1000 landmark genes (978 in total) and one trained on the LINCS L1000 BING (inferred) genes (10,167 in total). All experimental results in the manuscript are reported using the landmark genes.

  • Landmark pretrained model (approx. 170 MB; link)
  • BING (inferred) pretrained model (approx. 600 MB; link)
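As a sketch of how the downloaded weights might be wired up: the filenames below are hypothetical placeholders (not the actual download names), and loading follows the standard PyTorch `torch.load`/`load_state_dict` pattern used in the notebooks.

```python
# Hypothetical local filenames for the downloaded checkpoints -- replace
# them with the actual paths of the files obtained from the links above.
CHECKPOINTS = {
    "landmark": "csg2a_landmark_pretrained.pt",  # 978 landmark genes, ~170 MB
    "bing": "csg2a_bing_pretrained.pt",          # 10,167 BING genes, ~600 MB
}

def checkpoint_path(gene_set: str) -> str:
    """Map a gene-set name ("landmark" or "bing") to its checkpoint file."""
    try:
        return CHECKPOINTS[gene_set.lower()]
    except KeyError:
        raise ValueError(f"unknown gene set: {gene_set!r}") from None

# Standard PyTorch loading pattern (run inside the csg2a environment):
# import torch
# state_dict = torch.load(checkpoint_path("landmark"), map_location="cpu")
# model.load_state_dict(state_dict)  # `model` built as in finetune_GDSC.ipynb
```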

The pretrained MAT weights for training from scratch can also be obtained from the original authors' repository.

Software requirements

Operating system

$CSG^2A$ network training and evaluation were tested on Linux (Ubuntu 18.04).

Prerequisites

$CSG^2A$ network training and evaluation were tested with the following Python packages and versions.

  • python=3.10
  • pytorch=2.0.1
  • rdkit=2022.09.5
  • numpy=1.24.1
  • pandas=2.1.1
  • scipy=1.11.3
  • tqdm=4.66.1
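A quick way to sanity-check the environment is to verify that each prerequisite can be imported. This is a small stdlib-only sketch (not part of the repository); note that the conda package `pytorch` is imported as `torch`.

```python
import importlib.util

# Import names for the prerequisite packages listed above
# (the conda package "pytorch" is imported as "torch").
REQUIRED = ["torch", "rdkit", "numpy", "pandas", "scipy", "tqdm"]

def missing_packages(names):
    """Return the subset of `names` that cannot be imported."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    absent = missing_packages(REQUIRED)
    if absent:
        print("Missing packages:", ", ".join(absent))
    else:
        print("All prerequisites found.")
```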

License

The source code of $CSG^2A$ is released under the GPL-3.0 license, which allows users to use, modify, and distribute the software freely, even for commercial purposes.

However, any data or content produced using $CSG^2A$ is licensed under CC BY-NC-SA 4.0, which does not permit commercial use without proper authorization.
