GitHub - alvinchangw/DCEN_CHIL2021: Implementation for RNA Alternative Splicing Prediction with Discrete Compositional Energy Network (ACM CHIL 2021)

This is our PyTorch implementation of RNA Alternative Splicing Prediction with Discrete Compositional Energy Network.

Installation

pip install -r requirements.txt

Data: Context Augmented Psi Dataset (CAPD)

Download the CAPD here, unzip it, and place the files in dataset/.

Saved weights

Saved weights of all the models in the paper are available in: saved_models.

Training codes

Train spliceai on the splice site classification and psi regression objectives:

sh train_scripts/train_spliceai_clsreg.sh

This training script saves two weight files, site_aux_pytorch_model.bin and spliceai_pytorch_model.bin, in each checkpoint folder.

Train the spliceosome net on the psi regression objective, starting from the pretrained site_aux_pytorch_model.bin and spliceai_pytorch_model.bin weights from spliceai:

sh train_scripts/train_dcen.sh

Evaluation codes

Evaluate and output pred and label npy files for each patient's gene data. Each output folder is named <tissue_type><patient_id>_chr<chromosome_number>:

python eval_scripts/schedule_eval_dcen.py

The evaluation results computed by a schedule_eval_* script are stored in these folders, each of which covers only one particular tissue_type, patient_id and chromosome_number. Other schedule_eval_* scripts are available for the ablation variants. These scripts run python train_and_infer.py iteratively over all the relevant <tissue_type><patient_id>_chr<chromosome_number> combinations.

`python train_and_infer.py` key arguments for evaluation:

--spliceosome_model_training_type : determine the ablation variants of the model
--eval_patient_list : patient id(s) to evaluate
--eval_chr_list : chromosome number(s) to evaluate
--eval_output_dir_prefix : output directory of evaluation results, pred and label npy files
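
The iteration described above can be sketched in Python as a small command builder. This is only an illustration of looping over patients and chromosomes, not the code of the schedule_eval_* scripts: the function name `build_eval_commands`, the example patient ids, and passing one patient and one chromosome per run are assumptions, and a real run will likely need further arguments (such as model and data paths) that are not listed in this README.

```python
import itertools
import shlex


def build_eval_commands(patients, chromosomes, output_prefix, training_type=None):
    """Return one train_and_infer.py command (as an argv list) per patient/chromosome pair.

    Only flags named in this README are used; the value formats are assumptions.
    """
    commands = []
    for patient, chrom in itertools.product(patients, chromosomes):
        argv = [
            "python", "train_and_infer.py",
            "--eval_patient_list", str(patient),
            "--eval_chr_list", str(chrom),
            "--eval_output_dir_prefix", output_prefix,
        ]
        if training_type is not None:
            # Selects the ablation variant; valid values are defined by the repository.
            argv += ["--spliceosome_model_training_type", training_type]
        commands.append(argv)
    return commands


if __name__ == "__main__":
    # "P1" and "P2" are placeholder patient ids, not ids from the dataset.
    for argv in build_eval_commands(["P1", "P2"], [1, 2], "eval_out"):
        print(shlex.join(argv))
```

Printing the commands instead of executing them makes it easy to check the generated runs before launching them, for example with `subprocess.run(argv, check=True)`.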

To compute evaluation across multiple tissue types, patients or genes, use the aggregate_eval* scripts such as the ones below.

Compute evaluation across multiple patient and gene pred and label npy files:

python aggregate_eval_all_models_alltest.py

Compute evaluation across all patients, using only genes from chromosomes excluded from the training set:

python aggregate_eval_all_models_exchromosomes.py

Compute evaluation across all patients, using only genes longer than the length limit in the training set:

python aggregate_eval_all_models_longgenes.py

Prediction codes

Infer and output the splice site and transcript predictions as a jsonl file, for only the C1orf21 and TTLL7 genes:

sh predict_dcen.sh

Remove --eval_gene_list arg to infer all genes.

`python train_and_infer.py` key arguments for prediction:

--valid_data_dir : directory containing samples to infer probabilities
--eval_gene_list : list of genes to infer, separated by spaces (e.g. "C1orf21 TTLL7" to infer only these two genes). Default: infer all genes in samples
--eval_output_dir_prefix : output directory of prediction results, pred and label npy files
--pred_output_filename : output prediction json file name
--eval_patient_list : patient id(s) to evaluate
--eval_chr_list : chromosome number(s) to evaluate
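
The jsonl prediction file can be read back with the Python standard library. The sketch below is a minimal reader, assuming only that the file holds one JSON value per line; the README does not document the record fields, so the code makes no assumption about them, and the helper names `parse_jsonl` and `read_predictions` are hypothetical.

```python
import json


def parse_jsonl(lines):
    """Parse an iterable of JSON Lines strings into a list of objects, skipping blank lines."""
    records = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as err:
            raise ValueError(f"line {line_number} is not valid JSON: {err}") from err
    return records


def read_predictions(path):
    """Read every record from the file written via --pred_output_filename."""
    with open(path, encoding="utf-8") as handle:
        return parse_jsonl(handle)
```

Inspecting the keys of the first record, for example `read_predictions(path)[0].keys()` if the records are JSON objects, is a quick way to learn the actual structure before writing any downstream analysis.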
