skani cookbook

Jim Shaw edited this page Oct 12, 2023 · 9 revisions

skani cookbook - common use cases

This cookbook presents some examples of common use cases for skani and how to set parameters.

This is not a definitive guide, but it may be a helpful starting point for further investigation. See the basic or advanced guides for full documentation.

Searching bacterial/archaeal genomes against a large database

skani sketch -t 10 -l list_of_genome_names.txt -o database
skani search genomes_in_a_folder/* -d database > results.tsv

Important points

  • skani's defaults usually work fine for bacterial/archaeal genomes
  • the -l option takes a text file that lists one genome file path per line
  • search uses less memory and is fast for querying a few genomes
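
One way to build the list file passed to -l is sketched below. The helper name make_genome_list and the file extensions it matches are assumptions for illustration and are not part of skani; adjust them to your data.

```shell
# make_genome_list DIR OUT: write one FASTA path per line to OUT.
# The extensions below are assumptions; change them to match your files.
make_genome_list() {
  find "$1" -type f \( -name '*.fa' -o -name '*.fna' -o -name '*.fasta' \) | sort > "$2"
}

# Example usage (hypothetical folder name):
# make_genome_list genomes_in_a_folder list_of_genome_names.txt
# skani sketch -t 10 -l list_of_genome_names.txt -o database
```

Sorting the paths keeps the list reproducible between runs, which makes it easier to compare databases built at different times.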

All-to-all bacterial/archaeal species-level (>95% ANI) comparison for dereplication

skani triangle -s 93 my_genome_folder/* -t (threads) -E > results.tsv 

# OR

skani triangle -s 93 my_genome_folder/* -t (threads) -E --medium > results.tsv 

Important points

  • triangle sets better defaults than dist for all-to-all comparison
  • -s 93 means skani performs the ANI computation only if the ANI is approximately > 93%, which speeds up computation. Screening slightly below 95% ensures that genomes close to 95% ANI still get compared. -s 95 is probably okay too.
  • -E outputs results in TSV format instead of matrix format.
  • --medium may give slightly more accurate results for very fragmented genomes at the cost of speed, but the difference is usually small.
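
For dereplication, the pairs at or above the species threshold can be pulled out of results.tsv afterwards. The sketch below assumes the TSV has a header row containing a column named ANI; check the first line of your own output before relying on it. The function name filter_ani is hypothetical.

```shell
# filter_ani MIN FILE: keep the header and every row whose ANI column is >= MIN.
# Assumes a tab-separated file with a header column literally named "ANI".
filter_ani() {
  awk -F'\t' -v min="$1" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "ANI") col = i; print; next }
    col && ($col + 0) >= (min + 0)
  ' "$2"
}

# Example usage:
# filter_ani 95 results.tsv > species_level_pairs.tsv
```

Looking the column up by name rather than by position keeps the filter working if the column order differs from what you expect.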

All-to-all comparison with lots of small contigs (e.g. viruses, plasmids)

skani triangle viruses.fna -i -m 200 --slow (OR --medium) -t (threads) -E --faster-small -s 90 > results.tsv

Important points

  • -i treats each contig within the FASTA file as a separate sequence for comparison
  • -m 200 sets marker k-mers to be sampled at roughly 1 in every 200 bases. We want each genome to have ~20 marker k-mers for good screening. For large contigs, set this higher. For small contigs, set this lower.
  • small genomes may benefit from the --slow or --medium options. These set -c to a smaller value, which gives better AFs (aligned fractions) and sometimes (but not always!) better ANIs.
  • --faster-small makes skani faster by using more aggressive ANI filtering. This helps for large data sets and is optional on small ones.
  • -s 90 sets skani to screen comparisons for only approximately > 90% ANI. Feel free to set this higher or lower. Do not expect filtering to be accurate for small genomes and < 85% ANI.
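
The ~20 marker k-mers rule of thumb for -m can be turned into a quick calculation: divide a typical contig length by the number of markers you want. This is only a sketch derived from the guideline above, not an official skani formula, and suggest_m is a hypothetical helper.

```shell
# suggest_m LENGTH [TARGET]: marker spacing so that a contig of LENGTH bases
# carries about TARGET marker k-mers (default 20, per the rule of thumb).
suggest_m() {
  length=$1
  target=${2:-20}
  m=$(( length / target ))
  # Never suggest a spacing below 1.
  if [ "$m" -lt 1 ]; then m=1; fi
  echo "$m"
}

suggest_m 4000    # prints 200, matching the -m 200 used above
```

Using the shortest contigs you care about as LENGTH is the conservative choice, since longer contigs will then carry more than the target number of markers.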