diff --git a/mirdip-tissue/sources/README.md b/mirdip-tissue/sources/README.md
index 21068c5..75b235f 100644
--- a/mirdip-tissue/sources/README.md
+++ b/mirdip-tissue/sources/README.md
@@ -1,20 +1,19 @@
-
+=======
 # Post-processing of Contexts for RNAseq data for genes and miRNAs
 Post-processing steps for mRNA as well as miRNA are available in the corresponding folders, `postprocessingMRNA` and `postprocessingMiRNA`. Each script in this folder belongs to one specific dataset, since the dataset is processed separately. All scripts are written in R and therefore you need to install R to run them. Moreover, they are named based on its dataset source, beginning with `dataCleanup` and ends with its source name `dataCleanup.Rmd`.
 To ensure term consistency, we used the Disease Ontology and BRENDA Tissue Ontology to standardize context names. When a term was not present, its relationships were identified through other ontologies in the Ontology Lookup Service (OLS) - namely FMA, NCIT, UBERON, and OBA. Finally, we curated the remaining relationships to map them to terms already included. Contexts corresponding to cell lines and qualifiers outside normal and disease (for example, developmental stage) were not included in this release. Furthermore, all gene expression datasets were post-processed to ensure that all gene symbols were consistent with the HGNC-approved symbols.
 
-## miRNA post-processing
+# miRNA post-processing
 The post-processing steps of miRNA expression values are described in the following:
-1. Ensure that all miRNAs were updated to miRBase v.22 IDs (miRBase homo-sapiens data: mirdip/mirdip-tissue/data/mirbase/mature_homo-sapiens_dataframe.txt)
+1. Ensure that all miRNAs were updated to miRBase v.22 IDs (miRBase homo-sapiens data: `mirdip/mirdip-tissue/data/mirbase/mature_homo-sapiens_dataframe.txt`)
 2. Quintize miRNA expression values to enable a more fine-grained analysis of miRNA abundance.
 Therefore, for each sample, any miRNA with an expression value of zero remained so. The remaining non-zero values were converted to a number between one and five that represented which of the 20th percentiles of non-zero values it corresponds to.
 3. Average its quantile-normalized values per miRNA (biological replicates, for instance).
 4. Ensure context names are standardized.
 5. Convert all miRNA expression values into binary values.
 
-## mRNA post-processing
+# mRNA post-processing
 The post-processing steps of mRNA expression values are described in the following:
-1. Ensure that all gene symbols were consistent with the HGNC-approved symbols.
-2. Average mRNA expression values per gene (biological replicates, for instance).
-3. Ensure context names are standardized.
-4. Convert all expression values into binary values.
+1. Average mRNA expression values per gene (biological replicates, for instance).
+2. Ensure context names are standardized.
+3. Convert all expression values into binary values.
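The quintization and replicate-averaging steps of the miRNA list above can be sketched as follows. This is only an illustration in Python, not the repository's R code: the function names `quintize` and `average_replicates` are hypothetical, and the treatment of ties and bin boundaries (a value's bin is taken from its rank among the sample's non-zero values) is an assumption, since the README does not specify it.

```python
from bisect import bisect_right


def quintize(values):
    """Map one sample's expression values to 0..5.

    Zeros stay 0. Each non-zero value gets the index (1..5) of the
    20-percentile band its rank falls into among the sample's non-zero
    values. The tie/boundary rule is an assumption of this sketch.
    """
    nonzero = sorted(v for v in values if v != 0)
    n = len(nonzero)
    result = []
    for v in values:
        if v == 0:
            result.append(0)
        else:
            rank = bisect_right(nonzero, v)  # count of non-zero values <= v
            # Integer ceiling of 5 * rank / n, avoiding float rounding.
            result.append((5 * rank + n - 1) // n)
    return result


def average_replicates(replicates):
    """Average quintile scores per miRNA across replicate samples.

    `replicates` is a list of equal-length lists that share one miRNA order.
    """
    return [sum(column) / len(column) for column in zip(*replicates)]


if __name__ == "__main__":
    sample = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    print(quintize(sample))  # [0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]

    rep_a = quintize([0, 5, 10, 20])
    rep_b = quintize([0, 0, 8, 8])
    print(average_replicates([rep_a, rep_b]))
```

Each sample is quintized on its own before averaging, following the order of steps 2 and 3. The final binarization step is left out because the README does not state the threshold.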
diff --git a/prediction_update_and_integration/README.md b/prediction_update_and_integration/README.md
index 8b97903..0b22cf0 100644
--- a/prediction_update_and_integration/README.md
+++ b/prediction_update_and_integration/README.md
@@ -39,9 +39,24 @@ The following command run on the results that are placed in the `params.publishD
 # you may construct this environment from the mirbaseconverter.yml file in this repository
 conda activate mirbaseconverter;
 Rscript scripts/mirdip5_run_noisyOR.R \
-    -c `pwd` \
-    -d ./benchmarks_platinum_large_nodups/ \
-    -o mirdip5_noisyor_final.txt
+    -c `pwd` \
+    -d ./benchmarks_platinum_large_nodups/ \
+    -o mirdip5_noisyor_final.txt
+```
+
+# How to run the Nextflow pipeline for genes and miRNA
+
+## How to run the pipeline for genes
+
+```bash
+nextflow run rnaseq --genome GRCh38 --input /samplesheet.csv --star_index false --gene_bed false --aligner star_rsem --outdir /outputdirectory --save_merged_fastq -profile ijcluster
+```
+
+
+## How to run the pipeline for miRNA
+
+```bash
+nextflow run nf-core/smrnaseq -profile ijcluster --input /samplesheet.csv --outdir /outputdirectory --genome GRCh38 --protocol qiaseq --mirtrace_species hsa -r gittak_ac_config
 ```
 
 # How to run the Nextflow pipeline for genes and miRNA