Merge pull request #1 from ijlab/post-processing
Create README.md in mirdip-tissue
dylanht authored Feb 5, 2025
2 parents b37cb59 + f925bd4 commit 0136a49
Showing 2 changed files with 25 additions and 11 deletions.
15 changes: 7 additions & 8 deletions mirdip-tissue/sources/README.md
@@ -1,20 +1,19 @@

# Post-processing of Contexts for RNAseq data for genes and miRNAs

Post-processing steps for mRNA and miRNA are available in the corresponding folders, `postprocessingMRNA` and `postprocessingMiRNA`. Each script handles one specific dataset, since each dataset is processed separately. All scripts are written in R, so R must be installed to run them. Scripts are named after their dataset source: each file name begins with `dataCleanup` and ends with the source name, as in `dataCleanup<datasetname>.Rmd`.

To ensure term consistency, we used the Disease Ontology and the BRENDA Tissue Ontology to standardize context names. When a term was not present in either, its relationships were identified through other ontologies in the Ontology Lookup Service (OLS), namely FMA, NCIT, UBERON, and OBA. Finally, we curated the remaining relationships to map them to terms already included. Contexts corresponding to cell lines, and qualifiers other than normal and disease (for example, developmental stage), were not included in this release. Furthermore, all gene expression datasets were post-processed to ensure that all gene symbols are consistent with the HGNC-approved symbols.
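
As a rough illustration of the naming convention only: the repository's scripts are written in R, and the Python helper below, along with the file names used to exercise it, is hypothetical. Under that assumption, the dataset name can be recovered from a script's file name like this:

```python
import re

# Pattern for the convention described above: dataCleanup<datasetname>.Rmd
# (the helper name `dataset_name` is hypothetical, not part of the repository)
SCRIPT_PATTERN = re.compile(r"^dataCleanup(?P<dataset>.+)\.Rmd$")


def dataset_name(filename):
    """Return the dataset name encoded in a script file name, or None if it does not match."""
    match = SCRIPT_PATTERN.match(filename)
    return match.group("dataset") if match else None


if __name__ == "__main__":
    # "Example" is a placeholder dataset name, not a real dataset in this release.
    print(dataset_name("dataCleanupExample.Rmd"))
    print(dataset_name("README.md"))
```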

## miRNA post-processing
The post-processing steps for miRNA expression values are as follows:
1. Ensure that all miRNAs were updated to miRBase v.22 IDs (miRBase homo-sapiens data: `mirdip/mirdip-tissue/data/mirbase/mature_homo-sapiens_dataframe.txt`)
2. Quintize miRNA expression values to enable a more fine-grained analysis of miRNA abundance. For each sample, any miRNA with an expression value of zero remained zero. Each remaining non-zero value was converted to an integer from one to five indicating which quintile (20-percentile band) of that sample's non-zero values it falls into.
3. Average the quintized values per miRNA (across biological replicates, for instance).
4. Ensure context names are standardized.
5. Convert all miRNA expression values into binary values.
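
Steps 2, 3, and 5 can be sketched as follows. This is an illustrative Python sketch, not the repository's R code. The function names are hypothetical, ties are binned by rank among the sample's non-zero values, and the binarization threshold (any average above zero counts as expressed) is an assumption, since the cut-off is not stated here.

```python
from bisect import bisect_right


def quintize(values):
    """Map one sample's values to 0 (zero stays zero) or 1..5 (quintile among non-zero values)."""
    nonzero = sorted(v for v in values if v != 0)
    n = len(nonzero)
    result = []
    for v in values:
        if v == 0:
            result.append(0)
        else:
            # rank = number of non-zero values <= v; ceil(5 * rank / n) in integer arithmetic
            rank = bisect_right(nonzero, v)
            result.append((5 * rank + n - 1) // n)
    return result


def average_replicates(replicates):
    """Average per-miRNA values across replicates (a list of dicts sharing the same keys)."""
    names = replicates[0].keys()
    return {name: sum(r[name] for r in replicates) / len(replicates) for name in names}


def binarize(averaged, threshold=0.0):
    """ASSUMPTION: a value above `threshold` becomes 1, anything else 0."""
    return {name: int(value > threshold) for name, value in averaged.items()}


if __name__ == "__main__":
    print(quintize([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
    # -> [0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
```

Using integer arithmetic for the bin index avoids floating-point edge cases at the quintile boundaries.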


## mRNA post-processing
The post-processing steps for mRNA expression values are as follows:
1. Ensure that all gene symbols were consistent with the HGNC-approved symbols.
2. Average mRNA expression values per gene (biological replicates, for instance).
3. Ensure context names are standardized.
4. Convert all expression values into binary values.
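
The averaging and context-standardization steps might look like the sketch below. It is written in Python for illustration only (the repository's scripts are in R). The synonym table, function names, and sample records are hypothetical, and grouping the average by gene and standardized context is an assumption about how replicates are identified.

```python
from collections import defaultdict

# Hypothetical synonym table: raw context label -> standardized context name.
# The real mapping is derived from the ontologies described above.
CONTEXT_SYNONYMS = {
    "hepatic tissue": "liver",
    "liver tissue": "liver",
}


def standardize_context(raw):
    """Normalize case and whitespace, then apply the synonym table."""
    key = " ".join(raw.strip().lower().split())
    return CONTEXT_SYNONYMS.get(key, key)


def average_by_gene_and_context(records):
    """Average values per (gene, standardized context); records are (gene, context, value)."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for gene, context, value in records:
        acc = totals[(gene, standardize_context(context))]
        acc[0] += value
        acc[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


if __name__ == "__main__":
    records = [
        ("TP53", "Liver", 2.0),
        ("TP53", "hepatic tissue", 4.0),
        ("TP53", "kidney", 1.0),
    ]
    print(average_by_gene_and_context(records))
```

Standardizing the context label before grouping means replicates recorded under different spellings of the same tissue are averaged together.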
21 changes: 18 additions & 3 deletions prediction_update_and_integration/README.md
@@ -39,9 +39,24 @@ The following command run on the results that are placed in the `params.publishD
```
# you may construct this environment from the mirbaseconverter.yml file in this repository
conda activate mirbaseconverter;
Rscript scripts/mirdip5_run_noisyOR.R \
    -c `pwd` \
    -d ./benchmarks_platinum_large_nodups/ \
    -o mirdip5_noisyor_final.txt
```

# How to run the Nextflow pipeline for genes and miRNA

## How to run the pipeline for genes

```bash
nextflow run rnaseq --genome GRCh38 --input /samplesheet.csv --star_index false --gene_bed false --aligner star_rsem --outdir /outputdirectory --save_merged_fastq -profile ijcluster
```


## How to run the pipeline for miRNA

```bash
nextflow run nf-core/smrnaseq -profile ijcluster --input /samplesheet.csv --outdir /outputdirectory --genome GRCh38 --protocol qiaseq --mirtrace_species hsa -r gittak_ac_config
```
