Vignette #2: Visualizing results

microRNA annotations

If ShortStack is asked to annotate MIRNA loci (--known_miRNAs and/or --dn_mirna) and one or more loci pass all the criteria, then there will be a directory called "strucVis" in the main output directory. Inside are PDF image files for each locus that passed all MIRNA criteria. The PDF images show the predicted secondary structure of a hypothetical precursor RNA. It is important to note that the start and stop of this precursor are NOT EXPERIMENTALLY DETERMINED. Instead, the precursor RNA is simply extended ~20 nucleotides on either end of the microRNA / miRNA* duplex.

example_strucVis

The secondary structure is annotated with colors the correspond to small RNA-seq alignment depth coverage (raw reads, NOT reads per million) at each nucleotide position. This helps to visually show the microRNA/microRNA* pattern of read accumulation.

Custom MIRNA visualization with strucVis

strucVis is a standalone tool that is called by ShortStack to create the colorful MIRNA visualizations. Users can easily make their own strucVis calls of any genomic region so long as they already have a bam-formmated alignment file of their small RNA-seq reads. The strucVis executable will be in the user's environment along with a proper installation of ShortStack using conda / Bioconda. Here's an example of a custom strucVis call:

strucVis -b ExampleShortStackRun/merged_alignments.bam -g Arabidopsis_thalianaTAIR10.fa -c 5:7740583-7740720 -s minus -p custom_strucVis_file -n exampleLocus

A text-based representation of the locus goes to STDOUT, and the PDF image is in the working directory (here: "custom_strucVis_file.pdf"). See the strucVis README for details.

Genome browsers

ShortStack's output includes files that are directly useful for loading onto genome browsers. Additionally, a helper tool called ShortTracks is installed with ShortStack. ShortTracks converts BAM files of aligned small RNA-seq date into bigwig formatted browser tracks. Importantly, ShortTracks can create bigwigs that address the unique visualization needs of small RNA seq data: Coverage by small RNA length and by genomic strand.

Jbrowse2 - Desktop version (preferred)

JBrowse2 desktop version is my preferred genome browser for small RNA-seq data. This is because it has the capability of displaying "multi-wiggle" quantitative tracks. This allows visualization of small RNA-seq coverage, read-length, AND genomic strand on the same browser track. All three of these parameters are important for assessing small RNA data.

First create bigwig formatted tracks of small RNA coverage by size and strand. There are two ways to do this:

During the ShortStack run itself, use the option --make_bigwigs. This tells ShortStack to use ShortTracks to create the desired bigwig files after alignments. The --make_bigwigs option has no effect unless ShortStack is actually aligning reads.
Use ShortTracks to make the bigwig tracks manually, using an existing BAM file. ShortTracks will be installed in the same environment along with ShortStack. Here's how to make bigwig files by read length and strand, using an existing ShortStack-created bam file:

ShortTracks --mode readlength --stranded --bamfile ExampleShortStackRun/merged_alignments.bam

This creates 8 bigwig files separated by RNA length and genomic strand. Sizes are 21nts, 22nts, 23 & 24nts, and others. Strands are plus (p) and minus (m). Importantly these bigwig files are quantified in reads per million mapped (RPMM), with the mapped denominator being all aligned reads regardless of length or strand. This means that they can be quantitatively compared.

To load Jbrowse-Desktop, first obtain and install the application per instructions. Load the genomic file used as the reference. Start a new session, then use the JBrowse Available Tracks dialog to add browser tracks. You can add the following using the Default add track workflow (Add a track --> Default add track workflow):

Results.gff3 , showing the ShortStack-derived clusters from your run.
known_miRNAs.gff3, showing the aligned locations of each sequence provided in the --known_miRNAs file of the ShortStack run.
Protein-coding mRNAs (gff or bed): This is not made by ShortStack, you'll need to obtain that from another source for your genome of interest.

To load the bigwigs as a multi-bigwig track, use the "Add a track" tab and select the "Multi-wiggle Track" selector. Use "Choose files from your computer" and select the 8 relevant files. In my example those files would be:

merged_alignments_21_m.bw
merged_alignments_21_p.bw
merged_alignments_22_m.bw
merged_alignments_22_p.bw
merged_alignments_23-24_m.bw
merged_alignments_23-24_p.bw
merged_alignments_other_m.bw
merged_alignments_other_p.bw

Once the track is initially loaded, adjust the settings to set "renderer type" to either "xyplot" or "multiline"

xyplot

Then, adjust the color of each track such that the negative and positive strand tracks for each size have the same color. In track settings, select "Edit colors/arrangement". Here's a suggested color scheme:

21 plus and minus: blue
22 plus and minus: mediumseagreen
23-24 plus and minus: tomato
other : gray

color_multi_wiggle

Your final multi-wiggle track should look like:

final_multiwig_track

IGV Desktop

The IGV genome browser is popular and easy to use. Here I use the Desktop version. Because it doesn't support multi-wiggle tracks we should NOT use the strand- and length-specific tracks described above. Instead, use ShortTracks to create a "simple" bigwig track. This track aggregates total small RNA coverage regardless of RNA length or genomic strand. Like before, this single bigwig track is scaled in units of reads per million mapped (RPMM).

ShortTracks --mode simple --bamfile ExampleShortStackRun/merged_alignments.bam

In IGV, use "Genomes --> Load genome from file" to load the genome FASTA file used for your analysis.

You can then add tracks using "File --> Load from file". Tracks you likely want are:

Results.gff3 , showing the ShortStack-derived clusters from your run.
known_miRNAs.gff3, showing the aligned locations of each sequence provided in the --known_miRNAs file of the ShortStack run.
Protein-coding mRNAs (gff or bed): This is not made by ShortStack, you'll need to obtain that from another source for your genome of interest.
The simple bigwig track. In my example, this is called merged_alignments.bw.

Your view will look something like this:

igv_screenshot

CAUTION using bam files on browsers

As of version 4.1.0 ShortStack's alignments use "condensed reads". Within a FASTQ file, multiple reads with the exact same sequence are "condensed" into a single FASTA entry. The number of original reads is noted in the FASTA header. The condensed reads are aligned to the genome. If multi-mapped, ShortStack allocated the reads using custom tags in the BAM format. See the main README for details. The result is that a single "alignment" line in the BAM file may represent more than one read. Genome browsers have no way of knowing this. Thus, any "coverage" plots produced from bam files by genome browsers will be misleading. An alignment that represents 10,000 reads, for example, would only show a coverage value of 1. Strongly suggest not loading ShortStack's bam files onto genome browsers for this reason. Instead, using the bigwig files (see above) for accurate visualization of sRNA-seq read coverage.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly