-
Notifications
You must be signed in to change notification settings - Fork 30
Vignette #2: Visualizing results
If ShortStack is asked to annotate MIRNA loci (--known_miRNAs
and/or --dn_mirna
) and one or more loci pass all the criteria, then there will be a directory called "strucVis" in the main output directory. Inside are PDF image files for each locus that passed all MIRNA criteria. The PDF images show the predicted secondary structure of a hypothetical precursor RNA. It is important to note that the start and stop of this precursor are NOT EXPERIMENTALLY DETERMINED. Instead, the precursor RNA is simply extended ~20 nucleotides on either end of the microRNA / miRNA* duplex.
The secondary structure is annotated with colors the correspond to small RNA-seq alignment depth coverage (raw reads, NOT reads per million) at each nucleotide position. This helps to visually show the microRNA/microRNA* pattern of read accumulation.
strucVis is a standalone tool that is called by ShortStack to create the colorful MIRNA visualizations. Users can easily make their own strucVis
calls of any genomic region so long as they already have a bam-formmated alignment file of their small RNA-seq reads. The strucVis
executable will be in the user's environment along with a proper installation of ShortStack using conda / Bioconda. Here's an example of a custom strucVis
call:
strucVis -b ExampleShortStackRun/merged_alignments.bam -g Arabidopsis_thalianaTAIR10.fa -c 5:7740583-7740720 -s minus -p custom_strucVis_file -n exampleLocus
A text-based representation of the locus goes to STDOUT, and the PDF image is in the working directory (here: "custom_strucVis_file.pdf"). See the strucVis README for details.
ShortStack's output includes files that are directly useful for loading onto genome browsers. Additionally, a helper tool called ShortTracks is installed with ShortStack. ShortTracks
converts BAM files of aligned small RNA-seq date into bigwig formatted browser tracks. Importantly, ShortTracks can create bigwigs that address the unique visualization needs of small RNA seq data: Coverage by small RNA length and by genomic strand.
JBrowse2 desktop version is my preferred genome browser for small RNA-seq data. This is because it has the capability of displaying "multi-wiggle" quantitative tracks. This allows visualization of small RNA-seq coverage, read-length, AND genomic strand on the same browser track. All three of these parameters are important for assessing small RNA data.
First create bigwig formatted tracks of small RNA coverage by size and strand. There are two ways to do this:
- During the ShortStack run itself, use the option
--make_bigwigs
. This tellsShortStack
to useShortTracks
to create the desired bigwig files after alignments. The--make_bigwigs
option has no effect unlessShortStack
is actually aligning reads. - Use
ShortTracks
to make the bigwig tracks manually, using an existing BAM file.ShortTracks
will be installed in the same environment along withShortStack
. Here's how to make bigwig files by read length and strand, using an existingShortStack
-created bam file:
ShortTracks --mode readlength --stranded --bamfile ExampleShortStackRun/merged_alignments.bam
This creates 8 bigwig files separated by RNA length and genomic strand. Sizes are 21nts, 22nts, 23 & 24nts, and others. Strands are plus (p) and minus (m). Importantly these bigwig files are quantified in reads per million mapped (RPMM), with the mapped denominator being all aligned reads regardless of length or strand. This means that they can be quantitatively compared.
To load Jbrowse-Desktop, first obtain and install the application per instructions. Load the genomic file used as the reference. Start a new session, then use the JBrowse Available Tracks dialog to add browser tracks. You can add the following using the Default add track workflow (Add a track --> Default add track workflow):
-
Results.gff3
, showing the ShortStack-derived clusters from your run. -
known_miRNAs.gff3
, showing the aligned locations of each sequence provided in the--known_miRNAs
file of the ShortStack run. - Protein-coding mRNAs (gff or bed): This is not made by ShortStack, you'll need to obtain that from another source for your genome of interest.
To load the bigwigs as a multi-bigwig track, use the "Add a track" tab and select the "Multi-wiggle Track" selector. Use "Choose files from your computer" and select the 8 relevant files. In my example those files would be:
merged_alignments_21_m.bw
merged_alignments_21_p.bw
merged_alignments_22_m.bw
merged_alignments_22_p.bw
merged_alignments_23-24_m.bw
merged_alignments_23-24_p.bw
merged_alignments_other_m.bw
merged_alignments_other_p.bw
Once the track is initially loaded, adjust the settings to set "renderer type" to either "xyplot" or "multiline"
Then, adjust the color of each track such that the negative and positive strand tracks for each size have the same color. In track settings, select "Edit colors/arrangement". Here's a suggested color scheme:
- 21 plus and minus: blue
- 22 plus and minus: mediumseagreen
- 23-24 plus and minus: tomato
- other : gray
Your final multi-wiggle track should look like:
The IGV genome browser is popular and easy to use. Here I use the Desktop version. Because it doesn't support multi-wiggle tracks we should NOT use the strand- and length-specific tracks described above. Instead, use ShortTracks to create a "simple" bigwig track. This track aggregates total small RNA coverage regardless of RNA length or genomic strand. Like before, this single bigwig track is scaled in units of reads per million mapped (RPMM).
ShortTracks --mode simple --bamfile ExampleShortStackRun/merged_alignments.bam
In IGV, use "Genomes --> Load genome from file" to load the genome FASTA file used for your analysis.
You can then add tracks using "File --> Load from file". Tracks you likely want are:
-
Results.gff3
, showing the ShortStack-derived clusters from your run. -
known_miRNAs.gff3
, showing the aligned locations of each sequence provided in the--known_miRNAs
file of the ShortStack run. - Protein-coding mRNAs (gff or bed): This is not made by ShortStack, you'll need to obtain that from another source for your genome of interest.
- The simple bigwig track. In my example, this is called
merged_alignments.bw
.
Your view will look something like this:
As of version 4.1.0 ShortStack's alignments use "condensed reads". Within a FASTQ file, multiple reads with the exact same sequence are "condensed" into a single FASTA entry. The number of original reads is noted in the FASTA header. The condensed reads are aligned to the genome. If multi-mapped, ShortStack allocated the reads using custom tags in the BAM format. See the main README for details. The result is that a single "alignment" line in the BAM file may represent more than one read. Genome browsers have no way of knowing this. Thus, any "coverage" plots produced from bam files by genome browsers will be misleading. An alignment that represents 10,000 reads, for example, would only show a coverage value of 1. Strongly suggest not loading ShortStack's bam files onto genome browsers for this reason. Instead, using the bigwig files (see above) for accurate visualization of sRNA-seq read coverage.