Genotyping
This page describes how to use mykrobe for genotyping: calling samples at a set of known variants and/or identifying sequences of interest. Note that mykrobe does not discover new variants.
The usual use case for mykrobe is AMR and/or lineage calling. In that case there is a panel of variants, where each variant defines a lineage or is associated with resistance to a drug. Mykrobe genotypes a sample at all of the variants, then links the genotype information to make AMR and/or lineage calls. However, mykrobe can also be used only to genotype variants. In this mode it runs the genotyping module and reports the results, without going further and predicting drug resistance or lineage.
There are two stages:
- generate probe sequences from the variants using mykrobe variants make-probes
- run mykrobe predict on each sample, using the probes as input.
The use of make-probes is described in detail in the custom panels help page. Please see there for instructions.
In short, you will need to make a FASTA of probe sequences from your variants of interest. The probes can be made in any or all of three ways:
- from DNA variations (see probes from reference coordinates)
- from amino acid changes (see probes from gene coordinates)
- from presence/absence sequences
If you use more than one method to make probes, then cat each of the resulting FASTA files together to make one single FASTA file of all the probes.
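As a minimal sketch of the concatenation step described above, the commands below combine three per-method probe files into one. The file names dna_probes.fasta, aa_probes.fasta and presence_probes.fasta are hypothetical, and the records written by printf are toy placeholders so that the example runs on its own; they are not real make-probes output.

```shell
# Toy placeholder records so the example is self-contained.
# In practice these three files would come from your own make-probes runs.
printf '>placeholder_dna_probe\nACGTACGTAC\n' > dna_probes.fasta
printf '>placeholder_aa_probe\nTTGACCGGTA\n' > aa_probes.fasta
printf '>placeholder_presence_probe\nGGCATTCAGA\n' > presence_probes.fasta

# Concatenate all per-method files into the single FASTA passed to mykrobe predict.
cat dna_probes.fasta aa_probes.fasta presence_probes.fasta > probes.fasta

# Sanity check: count the FASTA header lines in the combined file.
grep -c '^>' probes.fasta
```

The header count of the combined file should equal the sum of the counts of the input files. If an input file does not end with a newline, cat will join its last sequence line to the next header, so it is worth checking that each file ends cleanly before concatenating.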
Suppose you have a file of probes called probes.fasta. Mykrobe can be used to genotype all the probes using the following command (replace sample_name with your sample name, and reads.fastq with your reads file).
mykrobe predict --sample sample_name \
--species custom \
--custom_probe_set_path probes.fasta \
--seq reads.fastq \
--format json \
-o out.json
The results will be written to out.json. Note that you need to specify --format json in order to get the genotype information, which is too detailed to be shown in the default output.
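Since mykrobe predict is run once per sample, the command above can be wrapped in a loop. The following is a sketch only: the function name genotype_all, the assumption that each reads file is named <sample>.fastq inside one directory, and the MYKROBE variable used for a dry run are all illustrative choices, not part of mykrobe. The mykrobe options are exactly those shown in the command above.

```shell
# Sketch: genotype every <sample>.fastq in a directory against one probe set.
# Arguments (hypothetical layout): probes FASTA, reads directory, output directory.
genotype_all() {
    probes="$1"
    reads_dir="$2"
    out_dir="$3"
    mkdir -p "$out_dir"
    for reads in "$reads_dir"/*.fastq; do
        # Skip the unexpanded pattern when no reads files are present.
        [ -e "$reads" ] || continue
        # Use the file name without its extension as the sample name.
        sample=$(basename "$reads" .fastq)
        # MYKROBE defaults to the real program; set MYKROBE=echo to print
        # the arguments of each call instead of running it.
        ${MYKROBE:-mykrobe} predict --sample "$sample" \
            --species custom \
            --custom_probe_set_path "$probes" \
            --seq "$reads" \
            --format json \
            -o "$out_dir/$sample.json"
    done
}

# Example call (requires mykrobe to be installed):
#   genotype_all probes.fasta reads results
```

With this layout, each sample gets its own JSON file in the output directory, named after the sample. Setting MYKROBE=echo before calling the function gives a quick way to review the arguments before committing to a long run.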
The output JSON has a section variant_calls, containing all the genotype calls. The format of the calls is described in the JSON output help page.