Genotyping

Michael Hall edited this page Apr 16, 2021 · 2 revisions

This page describes how to use mykrobe for genotyping: calling samples against a set of known variants and/or identifying sequences of interest. Note that mykrobe does not discover new variants.

Overview

The usual use case for mykrobe is antimicrobial resistance (AMR) and/or lineage calling. Here we have a panel of variants, each of which defines a lineage or is associated with resistance to a drug. Mykrobe genotypes a sample at every variant in the panel, then combines the genotype information to make AMR and/or lineage calls. However, mykrobe can also be used purely for genotyping: it runs only the genotyping module and reports the results, without going on to predict drug resistance or lineage.

There are two stages:

  1. generate probe sequences from the variants using mykrobe variants make-probes
  2. run mykrobe predict on each sample, using the probes as input.

Make probes

The use of make-probes is described in detail in the custom panels help page; please see that page for full instructions.

In short, you will need to make a FASTA of probe sequences from your variants of interest. The probes can be made in any or all of three ways:

  1. from DNA variations (see probes from reference coordinates)
  2. from amino acid changes (see probes from gene coordinates)
  3. from presence/absence sequences

If you use more than one method to make probes, concatenate the resulting FASTA files (for example with cat) into a single FASTA file containing all the probes.
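As a minimal sketch of that concatenation step, the snippet below merges per-method probe files into one FASTA and counts the records as a sanity check. The function name combine_probes and the file names in the example call are hypothetical; they are not part of mykrobe.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of mykrobe): concatenate probe FASTA files.
# Usage: combine_probes OUTPUT.fasta INPUT1.fasta [INPUT2.fasta ...]
combine_probes() {
  local out="$1"
  shift
  if [ "$#" -eq 0 ]; then
    echo "combine_probes: no input FASTA files given" >&2
    return 1
  fi
  local f
  for f in "$@"; do
    # Refuse to continue if an input is missing or empty.
    if [ ! -s "$f" ]; then
      echo "combine_probes: missing or empty file: $f" >&2
      return 1
    fi
  done
  # Plain cat assumes every input file ends with a newline.
  cat "$@" > "$out"
  # Each FASTA record starts with '>', so this counts the probes written.
  echo "wrote $(grep -c '^>' "$out") probe records to $out"
}

# Example call (file names are placeholders):
# combine_probes probes.fasta dna_probes.fasta aa_probes.fasta presence_probes.fasta
```

Checking the record count against the number of probes you expected is a cheap way to spot a file that was left out.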

Run mykrobe

Suppose you have a file of probes called probes.fasta. Mykrobe can genotype all the probes with the following command (replace sample_name with your sample name, and reads.fastq with your reads file).

mykrobe predict --sample sample_name \
  --species custom \
  --custom_probe_set_path probes.fasta \
  --seq reads.fastq \
  --format json \
  -o out.json 

The results will be written to out.json. Note that you need to specify --format json in order to get the genotype information, which is too detailed to be shown in the default output.
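Because predict is run once per sample, a loop can apply the same command to a directory of read files. The sketch below is only an illustration under stated assumptions: the function name genotype_all, the one-FASTQ-per-sample layout, the output file naming, and the DRY_RUN switch are all hypothetical; only the mykrobe flags are taken from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of mykrobe): genotype every *.fastq in a directory.
# Usage: genotype_all READS_DIR PROBES.fasta OUT_DIR
# Set DRY_RUN=1 to print the commands instead of running them.
genotype_all() {
  local reads_dir="$1" probes="$2" out_dir="$3"
  local run="" reads sample
  if [ "${DRY_RUN:-0}" = "1" ]; then run="echo"; fi
  mkdir -p "$out_dir"
  for reads in "$reads_dir"/*.fastq; do
    # If nothing matches, the glob stays literal; skip it.
    [ -e "$reads" ] || continue
    # Assumption: the sample name is the file name without .fastq.
    sample=$(basename "$reads" .fastq)
    $run mykrobe predict --sample "$sample" \
      --species custom \
      --custom_probe_set_path "$probes" \
      --seq "$reads" \
      --format json \
      -o "$out_dir/$sample.json" || {
        echo "genotype_all: mykrobe failed for $sample" >&2
        return 1
      }
  done
}

# Example call (directory names are placeholders):
# DRY_RUN=1 genotype_all reads probes.fasta results
```

Running it once with DRY_RUN=1 shows the exact commands before any compute time is spent.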

The output JSON has a section variant_calls, containing all the genotype calls. The format of the calls is described in the JSON output help page.
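For a quick look at which entries appear in variant_calls, the section can be pulled out with a JSON tool. The sketch below assumes jq is installed and that variant_calls is a JSON object keyed by call; it searches the whole document for any object holding a variant_calls key, so it does not rely on where that section is nested. The function name list_variant_calls is hypothetical, and the meaning of each call's fields is left to the JSON output help page.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of mykrobe): list the keys under variant_calls.
# Assumes jq is available and variant_calls is a JSON object.
# Usage: list_variant_calls out.json
list_variant_calls() {
  # '..' walks every value; keep objects that contain a variant_calls key,
  # then print that section's keys one per line (jq returns them sorted).
  jq -r '.. | objects | select(has("variant_calls")) | .variant_calls | keys[]' "$1"
}

# Example call:
# list_variant_calls out.json
```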
