A Snakemake workflow for TotalRNA analysis from the Department of Environmental Science of Aarhus University.
conda activate snakemake
git clone https://github.com/AU-ENVS-Bioinformatics/TotalRNA-Snakemake
cd TotalRNA-Snakemake
snakemake -c1 skip_rename # or snakemake -n rename
snakemake -c100 --use-conda --keep-going
This pipeline manages large-scale TotalRNA meta-transcriptomic data for taxonomic analysis of SSU reads and for mRNA analysis. The steps involved are:
- Trimming reads using Trim Galore.
- Filtering SSU and LSU reads using SortMeRNA and SILVA.
- Reconstructing ribosomal genes using MetaRib.
- Checking the quality of the ribosomal assembly using QUAST.
- Mapping reads to the RNA contigs using BWA and samtools.
- Classifying reads taxonomically using BLAST, SILVA and CREST.
- Assembling non-rRNA reads (Trinity) and filtering noncoding RNA using the RFam database.
- Mapping reads to the mRNA contigs using BWA and samtools.
- Functional (best-hit) and taxonomic (LCA) annotation of mRNA contigs using DIAMOND and AnnoTree, which includes KEGG, Pfam and TIGRFAM annotations for over 30,000 bacterial and 1,600 archaeal genomes.
Check the Wiki of the project for more information.
It is best to pre-install Mamba before starting. All other dependencies will be installed automatically when running the pipeline for the first time.
conda activate base
mamba create -c conda-forge -c bioconda -n snakemake snakemake
Activate the conda environment:
conda activate snakemake
Clone this git repository to the location where you want to run your analysis.
git clone https://github.com/AU-ENVS-Bioinformatics/TotalRNA-Snakemake TotalRNA-Snakemake-Project
cd TotalRNA-Snakemake-Project
Copy or symlink the raw FASTQ files into the reads directory. See reads/README.md for more information. The next step renames those files and creates symlinks to them in the results/renamed directory. To skip renaming, either copy your files directly into results/renamed, or run snakemake -c1 skip_rename to symlink your files without renaming them. Otherwise, preview the renaming with a dry run and then execute it:
snakemake -n rename
snakemake -c1 rename
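The copy-or-symlink step that precedes the rename can be sketched as a small shell helper. This is only an illustration: the function name link_fastq, the .fastq.gz suffix and any source path passed to it are assumptions, not part of the pipeline, and reads/README.md remains the reference for the expected file layout.

```shell
# link_fastq SRC_DIR [DEST_DIR]
# Symlink every *.fastq.gz file found in SRC_DIR into DEST_DIR (default: reads).
# Hypothetical helper for illustration; the file suffix is an assumption.
link_fastq() {
    src_dir="$1"
    dest_dir="${2:-reads}"
    mkdir -p "$dest_dir"
    for f in "$src_dir"/*.fastq.gz; do
        # If the glob matched nothing, the pattern is left as-is; skip it.
        [ -e "$f" ] || continue
        # Link to an absolute path so the symlink resolves from any directory.
        abs_path="$(cd "$(dirname "$f")" && pwd)/$(basename "$f")"
        ln -sf "$abs_path" "$dest_dir/"
    done
}
```

Calling link_fastq /path/to/raw_data (a placeholder path) from the project root would populate reads/ with symlinks instead of copies, which avoids duplicating large sequencing files.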
Check that all your samples are in results/renamed:
ls results/renamed/
Confirm that the pipeline will behave as expected with a dry run. If it does not, check the configuration file.
snakemake -n --use-conda
Finally, run the whole pipeline. A useful flag to add is --keep-going, which prevents the pipeline from stopping when a job fails (independent jobs keep running). If you are running this in a shared environment, you can keep all the conda environments in a shared location by adding --conda-prefix /path/to/shared/conda/envs.
snakemake -c100 --use-conda --keep-going
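If the same flags are used on every run, they can be stored in a Snakemake profile instead of being retyped. The sketch below is one possible setup, assuming the installed Snakemake version supports profiles; the directory name profiles/shared is hypothetical, and the conda prefix is the same placeholder path used above.

```shell
# Create a profile directory inside the project (the name is an assumption).
mkdir -p profiles/shared

# Profile keys mirror the long command-line options without the leading dashes.
cat > profiles/shared/config.yaml <<'EOF'
cores: 100
use-conda: true
keep-going: true
conda-prefix: /path/to/shared/conda/envs
EOF

# Show the resulting profile for a quick visual check.
cat profiles/shared/config.yaml
```

The pipeline could then be started with snakemake --profile profiles/shared, which should be equivalent to passing the four options on the command line.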
Consider re-running the AnnoTree notebook (notebook/annotree.ipynb) interactively with custom parameters.
Please find more information in the Wiki of the project.