Create README.md in mirdip-tissue #1

Merged 4 commits on Feb 5, 2025
15 changes: 7 additions & 8 deletions mirdip-tissue/sources/README.md
@@ -1,20 +1,19 @@

=======
# Post-processing of Contexts for RNAseq data for genes and miRNAs

Post-processing steps for mRNA and miRNA are available in the corresponding folders, `postprocessingMRNA` and `postprocessingMiRNA`. Each script in these folders handles one specific dataset, since each dataset is processed separately. All scripts are written in R, so R must be installed to run them. Scripts are named after their dataset source: each file name begins with `dataCleanup` and ends with the source name, as in `dataCleanup<datasetname>.Rmd`. To keep terms consistent, we used the Disease Ontology and the BRENDA Tissue Ontology to standardize context names. When a term was not present in either, its relationships were identified through other ontologies in the Ontology Lookup Service (OLS), namely FMA, NCIT, UBERON, and OBA. Finally, we curated the remaining relationships to map them to terms already included. Contexts corresponding to cell lines, and qualifiers other than normal and disease (for example, developmental stage), were not included in this release. Furthermore, all gene expression datasets were post-processed so that all gene symbols are consistent with the HGNC-approved symbols.

## miRNA post-processing
# miRNA post-processing
The post-processing steps for miRNA expression values are as follows:
1. Ensure that all miRNAs were updated to miRBase v.22 IDs (miRBase homo-sapiens data: mirdip/mirdip-tissue/data/mirbase/mature_homo-sapiens_dataframe.txt)
1. Ensure that all miRNAs were updated to miRBase v.22 IDs (miRBase homo-sapiens data: `mirdip/mirdip-tissue/data/mirbase/mature_homo-sapiens_dataframe.txt`)
2. Quintize miRNA expression values to enable a more fine-grained analysis of miRNA abundance. For each sample, any miRNA with an expression value of zero stays zero. Each remaining non-zero value is converted to an integer from one to five indicating which 20-percentile band (quintile) of that sample's non-zero values it falls into.
3. Average the quintized values per miRNA (across biological replicates, for instance).
4. Ensure context names are standardized.
5. Convert all miRNA expression values into binary values.
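
The quintizing and averaging steps above can be sketched as follows. This is a minimal Python illustration, not the repository's R code: the function names, the dictionary layout, and the expression values are all hypothetical, and the rank-based rule used to assign quintiles (including how ties are handled) is an assumption, since the exact percentile method is not stated here.

```python
from bisect import bisect_right
from collections import defaultdict


def quintize_sample(expr):
    """Map one sample's values to 0 (zero expression) or 1..5 (quintile).

    `expr` maps miRNA ID -> expression value for a single sample.
    Quintiles are computed over that sample's non-zero values only.
    Assumption: empirical rank rule, quintile = ceil(5 * rank / n).
    """
    nonzero = sorted(v for v in expr.values() if v != 0)
    n = len(nonzero)
    result = {}
    for mirna, value in expr.items():
        if value == 0:
            result[mirna] = 0  # zeros stay zero
        else:
            rank = bisect_right(nonzero, value)  # non-zero values <= value
            result[mirna] = (5 * rank + n - 1) // n  # integer ceil, in 1..5
    return result


def average_by_context(quintized, context_of):
    """Average quintized values per miRNA over samples sharing a context.

    `quintized` maps sample -> {miRNA: quintile};
    `context_of` maps sample -> standardized context name.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for sample, values in quintized.items():
        ctx = context_of[sample]
        for mirna, q in values.items():
            sums[ctx][mirna] += q
            counts[ctx][mirna] += 1
    return {
        ctx: {m: sums[ctx][m] / counts[ctx][m] for m in sums[ctx]}
        for ctx in sums
    }


# Made-up values for two hypothetical replicates of one context.
samples = {
    "liver_rep1": {"hsa-miR-21-5p": 120.0, "hsa-miR-122-5p": 900.0, "hsa-let-7a-5p": 0.0},
    "liver_rep2": {"hsa-miR-21-5p": 80.0, "hsa-miR-122-5p": 700.0, "hsa-let-7a-5p": 5.0},
}
context_of = {"liver_rep1": "liver", "liver_rep2": "liver"}

quintized = {name: quintize_sample(expr) for name, expr in samples.items()}
averaged = average_by_context(quintized, context_of)
```

Quintizing within each sample before averaging keeps replicates with different sequencing depths on a comparable 0 to 5 scale.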


## mRNA post-processing
# mRNA post-processing
The post-processing steps for mRNA expression values are as follows:
1. Ensure that all gene symbols were consistent with the HGNC-approved symbols.
2. Average mRNA expression values per gene (biological replicates, for instance).
3. Ensure context names are standardized.
4. Convert all expression values into binary values.
1. Average mRNA expression values per gene (biological replicates, for instance).
2. Ensure context names are standardized.
3. Convert all expression values into binary values.
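
As a rough illustration of the averaging and binarization steps above, the Python sketch below averages replicate values per gene and then converts them to 0/1. It is not the repository's R implementation: the function names and values are hypothetical, and the cutoff (`threshold`, defaulting to "any expression above zero counts as expressed") is purely an assumption, since the binarization rule is not specified here.

```python
from statistics import mean


def average_per_gene(replicates):
    """Average expression per gene across replicate samples.

    `replicates` is a list of dicts mapping gene symbol -> expression value.
    A gene missing from a replicate is simply not counted for that replicate.
    """
    values_by_gene = {}
    for rep in replicates:
        for gene, value in rep.items():
            values_by_gene.setdefault(gene, []).append(value)
    return {gene: mean(vals) for gene, vals in values_by_gene.items()}


def binarize(expr, threshold=0.0):
    """Return 1 where the value exceeds `threshold`, else 0.

    Assumption: the threshold is hypothetical; the actual cutoff used
    by the post-processing scripts is not documented in this README.
    """
    return {gene: int(value > threshold) for gene, value in expr.items()}


# Made-up values for two hypothetical replicates of one context.
replicates = [
    {"TP53": 2.0, "ALB": 0.0},
    {"TP53": 4.0, "ALB": 0.0},
]
averaged = average_per_gene(replicates)
binary = binarize(averaged)
```

Keeping the cutoff as a parameter makes it easy to swap in whatever rule the dataset-specific scripts actually apply.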
21 changes: 18 additions & 3 deletions prediction_update_and_integration/README.md
@@ -39,9 +39,24 @@ The following command run on the results that are placed in the `params.publishD
# you may construct this environment from the mirbaseconverter.yml file in this repository
conda activate mirbaseconverter;
Rscript scripts/mirdip5_run_noisyOR.R \
-c `pwd` \
-d ./benchmarks_platinum_large_nodups/ \
-o mirdip5_noisyor_final.txt
```

# How to run the Nextflow pipeline for genes and miRNA

## How to run the pipeline for genes

```bash
nextflow run rnaseq --genome GRCh38 --input /samplesheet.csv --star_index false --gene_bed false --aligner star_rsem --outdir /outputdirectory --save_merged_fastq -profile ijcluster
```


## How to run the pipeline for miRNA

```bash
nextflow run nf-core/smrnaseq -profile ijcluster --input /samplesheet.csv --outdir /outputdirectory --genome GRCh38 --protocol qiaseq --mirtrace_species hsa -r gittak_ac_config
```

# How to run the Nextflow pipeline for genes and miRNA