miREx

The miREx GitHub repository provides the necessary code for predicting mRNA expression levels by incorporating miRNA expression data. The data used for this project is at https://zenodo.org/records/10033049.

src directory

To set up the environment for executing the scripts, you can utilize the environment.yml file by running the following command:

   conda env create -f environment.yml

To download data from the genomic data commons portal, you need to install the gdc-client. The binaries for different platforms are here: https://gdc.cancer.gov/access-data/gdc-data-transfer-tool.

The src directory includes these scripts:

gdc_download.py

This script processes the manifests and JSONs to keep samples for which both mRNA and miRNA data are present. It downloads the data with the use of the gdc-client tool.
targetscan.py

This script processes the file downloaded from targetscan, generating the ScoreMatrix.csv file. This file includes genes as rows, miRNAs as columns, and the corresponding efficacy scores of the miRNAs in repressing the respective mRNAs.
create_datasets.py

This script takes the data downloaded from gdc and creates the .h5 files with normalized mRNA and miRNA expression values. If more subtypes are present, there will be an expression value for each subtype and their mean.
miREx_train.py

This script manages the model training. The initial step involves training the baseline Xpresso model using the target mRNA data. For example, to train the model on the lung primary site (with the mean value of LUAD and LUSC subtypes), use the following command:
```
python src/miREx_train.py --datadir path/to/datadir --resultsdir path/to/results/dir --primary_site lung --cancer_subtype mean
```
The --primary_site parameter can be adjusted to train the model on tumors on different primary sites. If different subtypes of that tumour exists, the --cancer_subtype parameter can be used to train on a particular subtype, otherwise the model will be trained using the mean of all subtypes. To train the model using all miRNAs, include the --mir argument:
```
python src/miREx_train.py --datadir path/to/datadir --resultsdir path/to/results/dir --primary_site lung --cancer_subtype mean --mir
```
create_lists.py

This script identifies the top 10 miRNAs with the highest correlation between residuals and targetscan CWCScore. The script miREx_train.py model can then be used with the --mir_type argument to choose which list of miRNAs to use for the training.
```
   python src/miREx_train.py --datadir path/to/datadir --resultsdir path/to/results/dir --primary_site lung --cancer_subtype mean --mir --mir_type corr
```

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
src		src
README.md		README.md
ScoreMatrix.csv		ScoreMatrix.csv
environment.yml		environment.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

miREx

src directory

About

Releases

Packages

Languages

ElenaPianfetti/miREx

Folders and files

Latest commit

History

Repository files navigation

miREx

src directory

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages