Tips for finishing genomes

Ryan Wick edited this page Sep 6, 2017 · 5 revisions

Ideally, a Unicycler hybrid assembly will result in a completed bacterial genome all by itself. But if it doesn't, then the genome might need 'manual completion'. This page contains some tips and tricks to help you along.

Check for completion

In a completed assembly, each chromosome/plasmid in the genome is represented by a single contig. How do you tell if a Unicycler assembly is complete? Unicycler's output/log might help. In the 'Bridged assembly graph' section towards the end of Unicycler's pipeline, it will summarise the graph components:

Component   Segments   Links   Length      N50         Longest segment   Status  
    total          7       7   5,676,472   5,583,468         5,583,468           
        1          1       1   5,583,468   5,583,468         5,583,468   complete
        2          1       1      71,104      71,104            71,104   complete
        3          1       1       6,657       6,657             6,657   complete
        4          1       1       5,783       5,783             5,783   complete
        5          1       1       3,514       3,514             3,514   complete
        6          1       1       3,223       3,223             3,223   complete
        7          1       1       2,723       2,723             2,723   complete

Unicycler considers a component complete if it is circular: one segment and one link, where the link joins the end of the segment back to its own start. This criterion doesn't apply if your bacterial genome has linear chromosomes/plasmids, in which case a complete component would be a single segment with no links.
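The 'one segment and one link' rule can also be checked directly on the graph file. The following is a minimal sketch, not Unicycler's own code: it assumes a GFA v1 file in which segments are `S` lines and each link appears once as an `L` line, and the function name `summarise_components` is made up for illustration.

```python
from collections import defaultdict


def summarise_components(gfa_text):
    """Group GFA segments into connected components and count segments/links.

    Assumption: GFA v1, tab-separated, with one 'L' line per link.
    """
    parent = {}

    def find(name):
        # Union-find lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    links = []
    for line in gfa_text.splitlines():
        fields = line.split("\t")
        if fields[0] == "S" and len(fields) >= 2:
            parent.setdefault(fields[1], fields[1])
        elif fields[0] == "L" and len(fields) >= 5:
            # L <from> <from_orient> <to> <to_orient> <overlap>
            links.append((fields[1], fields[3]))

    # Segments joined by a link belong to the same component.
    for start, end in links:
        parent.setdefault(start, start)
        parent.setdefault(end, end)
        parent[find(start)] = find(end)

    counts = defaultdict(lambda: {"segments": 0, "links": 0})
    for name in parent:
        counts[find(name)]["segments"] += 1
    for start, _ in links:
        counts[find(start)]["links"] += 1

    summary = []
    for component in counts.values():
        # Circular in the sense used above: one segment, one link.
        component["circular"] = (
            component["segments"] == 1 and component["links"] == 1
        )
        summary.append(component)
    return summary
```

Calling `summarise_components(open("assembly.gfa").read())` would return one dictionary per component. Any component where `circular` is false deserves a closer look, unless you expect a linear replicon.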

You could also view the assembly graph (assembly.gfa) in Bandage and check that each contig is circular. If every contig appears as its own closed loop, then Unicycler completed the assembly on its own!

But what if it's not complete? The Unicycler log might have something like this:

Component   Segments   Links   Length      N50         Longest segment   Status    
    total         23      29   5,819,363   5,242,094         5,242,094             
        1          1       1   5,242,094   5,242,094         5,242,094     complete
        2          1       1     252,269     252,269           252,269     complete
        3          1       1     130,933     130,933           130,933     complete
        4          1       1     110,494     110,494           110,494     complete
        5          1       1      69,826      69,826            69,826     complete
        6          1       1       5,783       5,783             5,783     complete
        7         17      23       7,964       1,023             3,382   incomplete

and the Bandage graph might show a tangle of short, interconnected segments alongside the closed loops. Yuck! This genome needs some manual completion...
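If you have many assemblies to check, the status column of the log table can be read programmatically. This is a rough sketch that assumes the table layout shown above (seven whitespace-separated fields per component row, with the status last). The function names are hypothetical and the log format may differ between Unicycler versions.

```python
def parse_component_table(log_text):
    """Extract per-component rows from a 'Bridged assembly graph' table.

    Assumption: rows look like
    '7   17   23   7,964   1,023   3,382   incomplete'.
    The header and the 'total' row are skipped.
    """
    rows = []
    for line in log_text.splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue
        if not all(f.isdigit() for f in fields[:3]):
            continue
        if fields[6] not in ("complete", "incomplete"):
            continue
        rows.append({
            "component": int(fields[0]),
            "segments": int(fields[1]),
            "links": int(fields[2]),
            "length": int(fields[3].replace(",", "")),
            "status": fields[6],
        })
    return rows


def incomplete_components(log_text):
    """Return only the rows whose status is not 'complete'."""
    return [
        row for row in parse_component_table(log_text)
        if row["status"] != "complete"
    ]
```

For the second table above, `incomplete_components` would return a single row: component 7, with 17 segments and 23 links.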

Manual completion

There are many reasons why Unicycler might fail to complete a hybrid assembly, so there is no single easy method for manual completion. You'll need to rely on detective work and bioinformatics know-how. Some general methods which may help are:

  • Using Bandage to visualise the assembly graphs from various stages of the Unicycler pipeline.
  • Gathering long reads for incomplete regions of the assembly (see Read extraction) and BLASTing them to the graphs.
  • Aligning short and/or long reads to the assembly and examining the alignments in IGV or Artemis.
  • Using other assemblers (e.g. Canu) on the reads and comparing the results to Unicycler's assembly.

Helpful software:

  • Bandage to view/edit assembly graphs
  • minimap2 to quickly align long reads
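As a starting point for aligning long reads to the assembly, here is a small sketch that builds a minimap2 command line. The file names `assembly.fasta` and `long_reads.fastq.gz` are placeholders, the helper `build_minimap2_command` is hypothetical, and the preset should match your data (`map-ont` for Oxford Nanopore reads, `map-pb` for PacBio reads).

```python
import shlex


def build_minimap2_command(reference, reads, preset="map-ont", threads=4):
    """Return the argument list for aligning long reads with minimap2."""
    return [
        "minimap2",
        "-a",                # write SAM instead of the default PAF
        "-x", preset,        # map-ont (Nanopore) or map-pb (PacBio)
        "-t", str(threads),  # number of threads
        reference,
        reads,
    ]


# Placeholder file names; substitute your own assembly and reads.
cmd = build_minimap2_command("assembly.fasta", "long_reads.fastq.gz")
print(shlex.join(cmd))
```

minimap2 writes its alignments to standard output, so to run the command you could pass `cmd` to `subprocess.run` with `stdout` redirected to a SAM file.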

To get you going, here are some real-world examples of assemblies which failed to complete and how I went about manual completion: