skani cookbook
Jim Shaw edited this page Oct 12, 2023 · 9 revisions
This cookbook presents some examples of common use cases for skani and how to set parameters.
This is not a definitive guide but may be helpful for further investigation. See the basic or advanced guides for documentation.
skani sketch -t 10 -l list_of_genome_names.txt -o database
skani search genomes_in_a_folder/* -d database > results.tsv
Important points
- skani's defaults usually work fine for bacterial/archaeal genomes
- the `-l` option takes a text file that lists one genome file per line
- `search` uses less memory and is fast for querying a few genomes
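A minimal shell sketch of the sketch-then-search workflow, assuming the reference genomes live in a single directory and end in `.fa`. The function names and the extension are illustrative assumptions, not part of skani. It writes the one-path-per-line file that `-l` expects, then runs the same two commands shown above.

```shell
# Write one genome path per line, the format that -l expects.
# $1 = directory of reference genomes (hypothetical), $2 = output list file
make_genome_list() {
    find "$1" -type f -name '*.fa' | sort > "$2"
}

# Sketch the listed genomes into a database, then search queries against it.
# $1 = directory of reference genomes (hypothetical)
build_and_search() {
    make_genome_list "$1" list_of_genome_names.txt
    skani sketch -t 10 -l list_of_genome_names.txt -o database
    skani search genomes_in_a_folder/* -d database > results.tsv
}
```

Calling `build_and_search reference_genomes` would then run the whole workflow. Here `reference_genomes` is a placeholder directory name.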
skani triangle -s 93 my_genome_folder/* -t (threads) -E > results.tsv
# OR
skani triangle -s 93 my_genome_folder/* -t (threads) -E --medium > results.tsv
Important points
- `triangle` sets better defaults than `dist` for all-to-all comparison.
- `-s 93` means skani performs ANI computation only for pairs whose ANI is approximately > 93%, speeding up computation. This ensures genomes with close to 95% ANI get compared. Even `-s 95` is probably okay too.
- `-E` outputs results in a tsv format instead of a matrix format.
- `--medium` may give slightly more accurate results for very fragmented genomes at the cost of speed, but the difference is usually small.
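The tsv output from `-E` can be post-processed with standard tools. The sketch below assumes `results.tsv` is tab-separated and that its header row contains a column named `ANI`; check the header of your own file first. The function name and the threshold of 95 are illustrative.

```shell
# Print the header plus every row whose ANI column is >= the threshold.
# Assumes a tab-separated file whose header row names one column "ANI".
# $1 = results file, $2 = minimum ANI (e.g. 95)
filter_by_ani() {
    awk -F '\t' -v min="$2" '
        NR == 1 {
            # Locate the ANI column by name rather than by fixed position.
            for (i = 1; i <= NF; i++) if ($i == "ANI") col = i
            print
            next
        }
        col && $col + 0 >= min + 0
    ' "$1"
}
```

For example, `filter_by_ani results.tsv 95 > species_level.tsv` keeps only pairs at or above 95% ANI. If no `ANI` column is found, only the header is printed.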
skani triangle viruses.fna -i -m 200 --slow (OR --medium) -t (threads) -E --faster-small -s 90 > results.tsv
Important points
- `-i` uses contigs within the fasta file for comparison.
- `-m 200` sets marker k-mers to appear at roughly 1 in every 200 bases. We want genomes to have ~20 marker k-mers for good screening. Large contigs -> set this higher. Small contigs -> set this lower.
- small genomes may benefit from the `--slow` or `--medium` options. These set `-c` to be smaller and give better AFs (aligned fractions), and sometimes (but not always!) better ANIs.
- `--faster-small` makes skani faster by using more aggressive ANI filtering. This helps for large data sets and is optional on small ones.
- `-s 90` sets skani to screen comparisons for only approximately > 90% ANI. Feel free to set this higher or lower. Do not expect filtering to be accurate for small genomes and < 85% ANI.
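The rule of thumb for `-m` reduces to quick arithmetic: if markers are sampled about once every m bases and roughly 20 markers per genome are wanted, then m is about the typical contig length divided by 20. The helper below only sketches that arithmetic. The function name and the floor of 1 are assumptions, not part of skani.

```shell
# Suggest a value for -m so that a contig carries about 20 marker k-mers.
# $1 = typical contig length in bases
suggest_m() {
    m=$(( $1 / 20 ))
    # Keep the result at 1 or above for very short inputs (illustrative floor).
    if [ "$m" -lt 1 ]; then
        m=1
    fi
    echo "$m"
}

suggest_m 4000   # prints 200
```

A typical contig length of 4000 bases gives 200, matching the `-m 200` in the command above. Treat the result as a starting point and adjust it for your data.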