
Method description

Charlotte Soneson edited this page Sep 27, 2019 · 6 revisions

This page outlines the steps included in the provided workflow. If you use the workflow for your analyses, please cite the original publications of the included tools and report the version numbers of the software you are using.

FastQC (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) is used for quality control of the raw reads. Reads are then trimmed with Trim Galore! (https://www.bioinformatics.babraham.ac.uk/projects/trim_galore/), using a quality cutoff of 20 and a minimum read length of 20 bp.

Salmon (Patro et al., 2017) is used both to build a quasi-mapping index of the transcriptome and to estimate transcript abundances, correcting for sequence-specific and GC content biases. The estimated abundances and the feature annotation are imported into R with the tximeta package (Love et al., 2019), which provides a wrapper around tximport (Soneson et al., 2016). In parallel, reads are aligned to the genome with STAR (Dobin et al., 2013), and bigWig files are created from the alignments for visualization in genome browsers.

Differential gene expression is tested with the quasi-likelihood framework of edgeR (Robinson et al., 2010; Lun et al., 2016), accounting for differences in the average length of expressed transcripts between samples (Soneson et al., 2016). Gene set analysis is performed with the camera function (Wu and Smyth, 2012) from the limma package (Ritchie et al., 2015), using gene sets from MSigDB (http://software.broadinstitute.org/gsea/msigdb) accessed via the msigdbr package (https://cran.r-project.org/web/packages/msigdbr/index.html). Differential transcript usage is analyzed with DRIMSeq (Nowicka and Robinson, 2016).

Finally, MultiQC (Ewels et al., 2016) summarizes the output of FastQC, Trim Galore!, Salmon and STAR. A SummarizedExperiment object containing gene-level quantifications, sample and feature annotations, and differential expression results is exported; it can be used for further downstream analysis or explored interactively with packages such as iSEE (Rue-Albrecht et al., 2018).
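
The trimming step (quality cutoff 20, minimum length 20 bp) can be illustrated with a short Python sketch. It assumes Phred+33 quality encoding and the BWA-style 3' trimming rule documented by cutadapt, which Trim Galore! runs internally. Adapter removal is left out, and the function names are hypothetical, not part of either tool.

```python
from typing import Optional, Tuple

QUALITY_CUTOFF = 20  # quality cutoff used in the workflow
MIN_LENGTH = 20      # minimum read length (bp) used in the workflow
PHRED_OFFSET = 33    # assumption: Sanger / Illumina 1.8+ encoding


def quality_trim_index(qualities: str, cutoff: int = QUALITY_CUTOFF,
                       offset: int = PHRED_OFFSET) -> int:
    """Return the index at which to cut the read (keep qualities[:index]).

    Walk from the 3' end, accumulating (cutoff - q); the cut position is the
    one that maximizes this running sum, and the walk stops once it turns
    negative (BWA-style trimming).
    """
    running_sum = 0
    best_sum = 0
    cut_index = len(qualities)
    for i in range(len(qualities) - 1, -1, -1):
        q = ord(qualities[i]) - offset
        running_sum += cutoff - q
        if running_sum < 0:
            break
        if running_sum > best_sum:
            best_sum = running_sum
            cut_index = i
    return cut_index


def trim_read(sequence: str, qualities: str,
              min_length: int = MIN_LENGTH) -> Optional[Tuple[str, str]]:
    """Quality-trim a read; return None if it ends up shorter than min_length."""
    end = quality_trim_index(qualities)
    seq, qual = sequence[:end], qualities[:end]
    if len(seq) < min_length:
        return None  # read is discarded
    return seq, qual
```

For example, a 30 bp read whose last five bases have quality 2 (`#`) is cut back to 25 bp and kept, while a 22 bp read with seven low-quality 3' bases drops to 15 bp and is discarded by the length filter.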
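
The gene-level summarization performed by tximport, which also yields the average transcript lengths used later by edgeR, can be sketched as follows. This is a simplified single-sample illustration. Plain dictionaries stand in for Salmon's per-transcript NumReads, TPM and EffectiveLength values, and `tx2gene` is a hypothetical transcript-to-gene mapping. Gene counts are the sums of transcript counts, and the gene length is the TPM-weighted mean of the transcripts' effective lengths.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def summarize_to_gene(counts: Dict[str, float], tpm: Dict[str, float],
                      eff_length: Dict[str, float],
                      tx2gene: Dict[str, str]) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (gene counts, abundance-weighted average gene lengths) for one sample."""
    gene_counts: Dict[str, float] = defaultdict(float)
    weighted_len: Dict[str, float] = defaultdict(float)
    tpm_sum: Dict[str, float] = defaultdict(float)
    lengths_seen: Dict[str, List[float]] = defaultdict(list)

    for tx, gene in tx2gene.items():
        gene_counts[gene] += counts[tx]
        weighted_len[gene] += tpm[tx] * eff_length[tx]
        tpm_sum[gene] += tpm[tx]
        lengths_seen[gene].append(eff_length[tx])

    gene_length: Dict[str, float] = {}
    for gene in gene_counts:
        if tpm_sum[gene] > 0:
            # Average length of the transcripts that are actually expressed
            gene_length[gene] = weighted_len[gene] / tpm_sum[gene]
        else:
            # Sketch-only fallback for unexpressed genes: unweighted mean.
            # tximport may handle this case differently.
            gene_length[gene] = sum(lengths_seen[gene]) / len(lengths_seen[gene])
    return dict(gene_counts), gene_length
```

If a gene's expression shifts between a short and a long isoform across samples, its average length changes even when its counts do not. This is the source of the length bias that the edgeR step accounts for.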
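
To account for differences in average transcript length, the average lengths can be turned into a matrix of offsets for the count model. The sketch below follows our reading of the recipe in the tximport vignette, with one simplification: TMM normalization factors are fixed at 1, so effective library sizes are plain column sums of the length-corrected counts. Inputs are genes x samples nested lists, and `length_offsets` is a hypothetical name.

```python
import math
from typing import List

Matrix = List[List[float]]  # rows = genes, columns = samples


def length_offsets(counts: Matrix, avg_length: Matrix) -> Matrix:
    """Return log-scale offsets combining average lengths and library sizes."""
    n_genes, n_samples = len(counts), len(counts[0])

    # 1. Divide each gene's lengths by their geometric mean across samples,
    #    so only between-sample differences in average length remain.
    norm = []
    for row in avg_length:
        geo_mean = math.exp(sum(math.log(x) for x in row) / n_samples)
        norm.append([x / geo_mean for x in row])

    # 2. Effective library sizes from length-corrected counts.
    #    Assumption: TMM normalization factors are fixed at 1 in this sketch.
    eff_lib = [sum(counts[g][j] / norm[g][j] for g in range(n_genes))
               for j in range(n_samples)]

    # 3. Offset = log(normalized length x effective library size).
    return [[math.log(norm[g][j] * eff_lib[j]) for j in range(n_samples)]
            for g in range(n_genes)]
```

A gene whose average length is higher in one sample receives a proportionally larger offset there, so counts attributable to longer expressed transcripts are not interpreted as higher expression.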
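
The key idea behind camera is a competitive test (genes in a set versus all other genes) whose variance accounts for correlation between genes in the set. The sketch below is a simplified Python illustration, not limma's implementation. It takes per-gene statistics that are assumed to be precomputed (camera derives its own from the linear model fit). It uses a fixed inter-gene correlation of 0.01, which we believe matches camera's current `inter.gene.cor` default. It inflates the variance of the in-set mean by the factor 1 + (m - 1)ρ described by Wu and Smyth (2012). Converting the statistic to a p-value is omitted, and `competitive_set_t` is a hypothetical name.

```python
import math
from statistics import mean, variance
from typing import Sequence


def competitive_set_t(stats: Sequence[float], in_set: Sequence[bool],
                      inter_gene_cor: float = 0.01) -> float:
    """Two-sample t-like statistic for set vs. non-set genes with a VIF.

    Requires at least two genes inside and two genes outside the set.
    """
    in_vals = [s for s, flag in zip(stats, in_set) if flag]
    out_vals = [s for s, flag in zip(stats, in_set) if not flag]
    m, n_out = len(in_vals), len(out_vals)

    # Variance inflation factor for the mean of m correlated statistics
    vif = 1 + (m - 1) * inter_gene_cor

    # Pooled within-group variance of the gene-level statistics
    pooled = ((m - 1) * variance(in_vals)
              + (n_out - 1) * variance(out_vals)) / (m + n_out - 2)

    se = math.sqrt(pooled * (vif / m + 1 / n_out))
    return (mean(in_vals) - mean(out_vals)) / se
```

With a higher assumed correlation, the same difference in means yields a smaller statistic. This is how the approach guards against the inflated significance that ignoring inter-gene correlation would produce.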
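
DRIMSeq models the transcript counts of each gene with a Dirichlet-multinomial distribution, parameterized by transcript proportions and a precision parameter. It tests for differential transcript usage by comparing model fits with a likelihood ratio test. The log-likelihood at the core of that comparison can be written in a few lines of Python. This illustrates the distribution only, assuming proportions that are positive and sum to 1. It is not DRIMSeq's code, and `dm_loglik` is a hypothetical name.

```python
import math
from typing import Sequence


def dm_loglik(y: Sequence[int], props: Sequence[float], precision: float) -> float:
    """Dirichlet-multinomial log-probability of transcript counts y for one gene."""
    n = sum(y)
    alpha = [precision * p for p in props]  # Dirichlet parameters, summing to precision
    # Multinomial coefficient: log(n! / prod(y_k!))
    ll = math.lgamma(n + 1) - sum(math.lgamma(k + 1) for k in y)
    # Normalizing term of the Dirichlet-multinomial
    ll += math.lgamma(precision) - math.lgamma(n + precision)
    # Per-transcript terms
    ll += sum(math.lgamma(k + a) - math.lgamma(a) for k, a in zip(y, alpha))
    return ll
```

As the precision grows, the distribution approaches a multinomial. Smaller values allow extra between-sample variability in transcript proportions, which is typical of biological replicates.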

References

Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras T: STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29(1):15-21 (2013).

Ewels P, Magnusson M, Lundin S, Käller M: MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32(19):3047-3048 (2016).

Love MI, Soneson C, Hickey PF, Johnson LK, Pierce NT, Shepherd L, Morgan M, Patro R: Tximeta: reference sequence checksums for provenance identification in RNA-seq. bioRxiv doi:10.1101/777888 (2019).

Lun AT, Chen Y, Smyth GK: It's DE-licious: A Recipe for Differential Expression Analyses of RNA-seq Experiments Using Quasi-Likelihood Methods in edgeR. Methods Mol Biol 1418:391-416 (2016).

Nowicka M, Robinson MD: DRIMSeq: a Dirichlet-multinomial framework for multivariate count outcomes in genomics. F1000Research 5:1356 (2016).

Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C: Salmon provides fast and bias-aware quantification of transcript expression. Nature Methods 14:417-419 (2017).

Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, Smyth GK: limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Research 43(7):e47 (2015).

Robinson MD, McCarthy DJ, Smyth GK: edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26(1):139-140 (2010).

Rue-Albrecht K, Marini F, Soneson C, Lun ATL: iSEE: Interactive SummarizedExperiment Explorer. F1000Research 7:741 (2018).

Soneson C, Love MI, Robinson MD: Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Research 4:1521 (2016).

Wu D, Smyth GK: Camera: a competitive gene set test accounting for inter-gene correlation. Nucleic Acids Research 40(17):e133 (2012).