
AMR prediction output

Michael Hall edited this page Apr 20, 2022 · 4 revisions

This page describes the output of mykrobe predict.

Output options

By default, output is printed to the terminal in comma-separated (CSV) text format, which can be loaded into Excel or any other spreadsheet program. Alternatively, results can be reported in JSON format, which contains more detailed information and is ideal for scripting, but is harder for a person to read than a spreadsheet.

The output format is controlled with the option --format: use --format json, --format csv, or --format json_and_csv. To send output to a file instead of to the terminal, use --out filename. See the AMR prediction examples section for complete command line examples.

CSV file

This has the following columns:

| Column Name | Description |
|---|---|
| sample | Sample name, as given when running mykrobe predict |
| drug | Name of drug |
| susceptibility | Resistance call: "R" = resistant, "S" = susceptible, "r" = minority resistance detected |
| variants | Name of the variant that caused the resistance call. See notes 1 and 2 below for details |
| genes | Name of the gene identified that caused the resistance call |
| mykrobe_version | Version of mykrobe |
| files | Names of the input reads files |
| probe_sets | Names of the probe sets used for AMR/lineage calls |
| genotype_model | Name of the genotyping model used to make ref/alt/heterozygous calls |
| kmer_size | k-mer length used in probes |
| phylo_group | High-level taxonomic identification, for example "Mycobacterium_tuberculosis_complex" |
| species | Species identified in the sample |
| lineage | Lineage identified in the sample. Mixed samples will have more than one entry |
| phylo_group_per_covg | Percent of phylo_group probe that has any coverage |
| species_per_covg | Percent of species probe that has any coverage |
| lineage_per_covg | Percent of lineage probe that has any coverage (see note 3 below) |
| phylo_group_depth | Average depth across phylo_group probe |
| species_depth | Average depth across species probe |
| lineage_depth | Average depth across lineage probe (see note 3 below) |
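
Because the CSV has one row per drug, a short script can pull out the resistance calls. The following is a minimal Python sketch, assuming the header row uses the column names listed above. The sample text and the helper name `resistant_drugs` are invented for illustration and are not part of mykrobe.

```python
import csv
import io

# Hypothetical excerpt of mykrobe CSV output: only three of the columns
# are shown, and the values are invented for illustration.
EXAMPLE_CSV = """sample,drug,susceptibility
sample1,Ofloxacin,S
sample1,Streptomycin,R
sample1,Isoniazid,r
"""


def resistant_drugs(handle, include_minor=True):
    """Return (sample, drug, call) tuples for rows called "R" (and "r" if requested)."""
    wanted = {"R", "r"} if include_minor else {"R"}
    reader = csv.DictReader(handle)
    return [
        (row["sample"], row["drug"], row["susceptibility"])
        for row in reader
        if row["susceptibility"] in wanted
    ]


calls = resistant_drugs(io.StringIO(EXAMPLE_CSV))
for sample, drug, call in calls:
    print(f"{sample}\t{drug}\t{call}")
```

For a real run, pass an open file handle for the CSV written by mykrobe instead of the `io.StringIO` wrapper.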

Notes:

  1. Entries in the variants column usually have the format <gene>_<amino acid change>-<dna change>:<ref depth>:<alt depth>:<genotype confidence>. For example, rpsL_K43R-AAG781686AGG:0:513:3132 means a K to R amino acid change at position 43 in the rpsL gene, which is AAG to AGG at position 781686 in the genome, with 0 depth on the reference allele, 513 depth on the alternative allele, and a genotype confidence of 3132.
  2. Variants that are upstream of a gene have a negative position, and are of the form <gene>_<dna change>:<ref depth>:<alt depth>:<genotype confidence> (amino acid change is not applicable in this case). For example, pncA_T-12C means a T to C nucleotide change 12bp before the start of gene pncA.
  3. The lineage_per_covg and lineage_depth columns are likely to say "NA", even when a lineage has been identified. This happens because a lineage called using a hierarchical scheme (lineage1, lineage1.1, and so on) has no single probe for which to report information. Instead, genotype calls across the lineage tree are used to determine the final lineage. The details can be found in the JSON output (see examples below), but are too complex for the CSV output.
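
The two entry formats described in notes 1 and 2 can be split apart with a small parser. This is a sketch only: it assumes gene names contain no underscore and that each entry holds a single variant. The class `VariantEntry`, the function `parse_variant`, and the depth values in the upstream example are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class VariantEntry(NamedTuple):
    gene: str
    amino_acid_change: Optional[str]  # None for variants upstream of a gene
    dna_change: str
    ref_depth: int
    alt_depth: int
    genotype_conf: float


# Coding variants look like "K43R-AAG781686AGG": amino acid change, a hyphen,
# then the DNA change. Upstream variants such as "T-12C" do not match this.
_CODING = re.compile(r"^(?P<aa>[A-Za-z*]+\d+[A-Za-z*]+)-(?P<dna>.+)$")


def parse_variant(entry: str) -> VariantEntry:
    """Split one entry from the variants column into its parts."""
    # The last three colon-separated fields are ref depth, alt depth, confidence.
    name, ref_depth, alt_depth, conf = entry.rsplit(":", 3)
    # Assumption: the gene name is everything before the first underscore.
    gene, _, change = name.partition("_")
    match = _CODING.match(change)
    if match:
        aa_change, dna_change = match.group("aa"), match.group("dna")
    else:
        aa_change, dna_change = None, change
    return VariantEntry(gene, aa_change, dna_change,
                        int(ref_depth), int(alt_depth), float(conf))


coding = parse_variant("rpsL_K43R-AAG781686AGG:0:513:3132")
# Depths and confidence below are invented; only "pncA_T-12C" is from the text.
upstream = parse_variant("pncA_T-12C:0:40:250")
print(coding)
print(upstream)
```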

JSON file

The general format of the JSON is:

{
  "sample_name": {
    "susceptibility": { ... AMR call information ... },
    "phylogenetics": { ... species/lineage information ...},
    "kmer": <integer>,
    "probe_sets": [ ... list of probe files ...],
    "files": [ ... list of input reads files ... ],
    "version": { ... version information ...},
    "genotype_model": "name of genotype model"
  }
}
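
As a rough illustration of that layout, the sketch below loads a JSON document and walks the top-level sample names. The JSON text is a hypothetical, heavily trimmed stand-in (the sample name, file name, and model name are invented); real output contains far more detail.

```python
import json

# Hypothetical, heavily trimmed JSON in the layout shown above.
EXAMPLE_JSON = """
{
  "sample1": {
    "susceptibility": {"Ofloxacin": {"predict": "S"}},
    "phylogenetics": {},
    "kmer": 21,
    "probe_sets": [],
    "files": ["reads.fq.gz"],
    "version": {},
    "genotype_model": "example_model"
  }
}
"""

results = json.loads(EXAMPLE_JSON)

# The top-level keys are sample names; each maps to that sample's results.
for sample_name, sample in results.items():
    drugs = sorted(sample["susceptibility"])
    print(sample_name, sample["kmer"], sample["files"], drugs)
```

To read a file written with --out, `json.load` on the open file handle can replace `json.loads`.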

Susceptibility dictionary example:

{
  "Ofloxacin": {"predict": "S"},
  "Streptomycin": {
    "predict": "R",
    "called_by": {
      "rpsL_K43R-AAG781686AGG": {
        "variant": null,
        "genotype": [1, 1],
        "genotype_likelihoods": [-3150.49, -99999999, -18.41],
        "info": {
          "coverage": {
            "reference": {
              "percent_coverage": 0.0,
              "median_depth": 0,
              "min_non_zero_depth": 0,
              "kmer_count": 0,
              "klen": 21
            },
            "alternate": {
              "percent_coverage": 100.0,
              "median_depth": 26,
              "min_non_zero_depth": 24,
              "kmer_count": 513,
              "klen": 20
            }
          },
          "expected_depths": [25],
          "contamination_depths": [],
          "filter": [],
          "conf": 3132
        },
        "_cls": "Call.VariantCall"
      }
    }
  }
}

In that example, the sample is called as susceptible to ofloxacin and resistant to streptomycin. Details of the variant call that triggered the streptomycin call are provided. The important pieces of information are: the reference allele had zero coverage, but the alternate allele had 100% coverage at a median depth of 26. The genotype call (1/1) was homozygous for the alternate allele (0/1 would indicate a heterozygous call).
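
The fields highlighted above can be collected into one flat row per variant call. The sketch below assumes the structure shown in the example and only handles entries whose "_cls" is "Call.VariantCall"; other call types may exist with a different layout and are skipped. The function names and the trimmed example data are for illustration only.

```python
def zygosity(genotype):
    """Describe a two-allele genotype: 0 = reference allele, 1 = alternate."""
    alleles = sorted(genotype)
    if alleles == [1, 1]:
        return "homozygous alternate"
    if alleles == [0, 1]:
        return "heterozygous"
    if alleles == [0, 0]:
        return "homozygous reference"
    return "unrecognised"


def summarise_susceptibility(susceptibility):
    """Return one dict per variant call that triggered a resistance call."""
    rows = []
    for drug, call in susceptibility.items():
        # Susceptible drugs have no "called_by" entry.
        for name, details in call.get("called_by", {}).items():
            if details.get("_cls") != "Call.VariantCall":
                continue  # assumption: other call types need separate handling
            info = details["info"]
            coverage = info["coverage"]
            rows.append({
                "drug": drug,
                "predict": call["predict"],
                "variant": name,
                "zygosity": zygosity(details["genotype"]),
                "ref_median_depth": coverage["reference"]["median_depth"],
                "alt_median_depth": coverage["alternate"]["median_depth"],
                "conf": info["conf"],
            })
    return rows


# Trimmed copy of the susceptibility example above.
example = {
    "Ofloxacin": {"predict": "S"},
    "Streptomycin": {
        "predict": "R",
        "called_by": {
            "rpsL_K43R-AAG781686AGG": {
                "genotype": [1, 1],
                "info": {
                    "coverage": {
                        "reference": {"percent_coverage": 0.0, "median_depth": 0},
                        "alternate": {"percent_coverage": 100.0, "median_depth": 26},
                    },
                    "conf": 3132,
                },
                "_cls": "Call.VariantCall",
            }
        },
    },
}

rows = summarise_susceptibility(example)
for row in rows:
    print(row)
```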

Phylogenetics dictionary example, where species is tuberculosis:

"phylo_group": {
  "Mycobacterium_tuberculosis_complex": {
    "percent_coverage": 99.544,
    "median_depth": 61.0
  }
},
"sub_complex": {
  "Unknown": {
    "percent_coverage": -1,
    "median_depth": -1
  }
},
"species": {
  "Mycobacterium_tuberculosis": {
    "percent_coverage": 98.328,
    "median_depth": 54.0
  }
},
"lineage": {
  "lineage": ["lineage2.2.9"],
  "calls_summary": {
    "lineage2.2.9": {
      "good_nodes": 3,
      "lineage_depth": 3,
      "genotypes": {
        "lineage2": 1,
        "lineage2.2": 1,
        "lineage2.2.9": 1
      }
    }
  },
  "calls": {
    "lineage2.2.9": {
      "lineage2": {
        "G497491A": { ... call info in same format as above call example ...}
      },
      "lineage2.2": {
        "G2505085A": { ... call info in same format as above call example ...}
      },
      "lineage2.2.9": {
        "G4086T": { ... call info in same format as above call example ...}
      }
    }
  }
}

In that example, the lineage was identified as lineage2.2.9 (the entry "lineage": ["lineage2.2.9"]). Had the sample been mixed, the list would have more than one entry, e.g. ["lineage2.2.9", "lineage4.10"]. The calls_summary section shows a high-level summary of the evidence for this lineage. The name 2.2.9 spans three levels of the lineage tree, as shown by "lineage_depth": 3. All three nodes in the tree were called as agreeing with the lineage, shown by the entry "good_nodes": 3, and the genotype calls for those three nodes were alleles agreeing with the lineage calls (in the dictionary "genotypes", all values are 1). Finally, the calls dictionary contains all the details of the genotype calls (omitted here because they are in the same format as above for resistance calls).
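
The checks described in that paragraph can be expressed in a few lines. This sketch assumes the "lineage" section has the shape shown in the example; the function name `summarise_lineage` and the notion of "fully supported" (every node good and every genotype equal to 1) are labels chosen here, not mykrobe terminology.

```python
def summarise_lineage(lineage_section):
    """Return (lineage, is_mixed, fully_supported) for each called lineage."""
    called = lineage_section.get("lineage", [])
    summary = lineage_section.get("calls_summary", {})
    is_mixed = len(called) > 1  # more than one entry indicates a mixed sample
    report = []
    for name in called:
        nodes = summary.get(name, {})
        fully_supported = (
            "good_nodes" in nodes
            and nodes["good_nodes"] == nodes.get("lineage_depth")
            and all(g == 1 for g in nodes.get("genotypes", {}).values())
        )
        report.append((name, is_mixed, fully_supported))
    return report


# The "lineage" section from the example above, without the "calls" details.
example_lineage = {
    "lineage": ["lineage2.2.9"],
    "calls_summary": {
        "lineage2.2.9": {
            "good_nodes": 3,
            "lineage_depth": 3,
            "genotypes": {"lineage2": 1, "lineage2.2": 1, "lineage2.2.9": 1},
        }
    },
}

report = summarise_lineage(example_lineage)
print(report)
```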

MTB example where species is not tuberculosis:

"phylo_group": {
    "Mycobacterium_tuberculosis_complex": {
        "percent_coverage": 99.686,
        "median_depth": 97.0
    }
},
"sub_complex": {
    "Unknown": {
        "percent_coverage": -1,
        "median_depth": -1
    }
},
"species": {
    "Mycobacterium_africanum": {
        "percent_coverage": 59.351,
        "median_depth": 169
    }
},
"lineage": {
    "Unknown": {
        "percent_coverage": -1,
        "median_depth": -1
    }
}
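
Scripts that read the phylogenetics dictionary need to cope with both shapes shown above: a called lineage, or an "Unknown" placeholder with values of -1. The sketch below is one way to do that. The helper names are hypothetical, and choosing the entry with the highest percent_coverage when a section has several entries is an assumption made here, not documented mykrobe behaviour.

```python
def best_call(section):
    """Return (name, percent_coverage, median_depth) for a phylo_group,
    sub_complex or species section, or None if the only entry is "Unknown"."""
    known = {name: stats for name, stats in section.items() if name != "Unknown"}
    if not known:
        return None
    # Assumption: with several entries, prefer the highest percent coverage.
    name = max(known, key=lambda n: known[n]["percent_coverage"])
    return name, known[name]["percent_coverage"], known[name]["median_depth"]


def called_lineages(lineage_section):
    """Return the list of called lineages, or an empty list if none was called."""
    # A called lineage is stored under the "lineage" key; otherwise the section
    # holds only an "Unknown" placeholder, as in the example above.
    return lineage_section.get("lineage", [])


# Copy of the example above.
phylogenetics = {
    "phylo_group": {
        "Mycobacterium_tuberculosis_complex": {"percent_coverage": 99.686, "median_depth": 97.0}
    },
    "sub_complex": {"Unknown": {"percent_coverage": -1, "median_depth": -1}},
    "species": {
        "Mycobacterium_africanum": {"percent_coverage": 59.351, "median_depth": 169}
    },
    "lineage": {"Unknown": {"percent_coverage": -1, "median_depth": -1}},
}

species = best_call(phylogenetics["species"])
sub_complex = best_call(phylogenetics["sub_complex"])
lineages = called_lineages(phylogenetics["lineage"])
print(species, sub_complex, lineages)
```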