AMR prediction output
This page describes the output of `mykrobe predict`.
By default, output is printed to the terminal in comma-separated (CSV) text format, which can be loaded directly into Excel (or your favourite spreadsheet program). Alternatively, results can be reported in JSON format, which contains more detailed information and is ideal for scripting, but is harder for a person to read than a spreadsheet.
The output format is controlled with the option `--format`: use `--format json`, `--format csv`, or `--format json_and_csv` to choose the output format. To send output to a file instead of to the terminal, use `--out filename`. See the AMR prediction examples section for complete command line examples.
The CSV output has the following columns:
Column Name | Description |
---|---|
sample | Sample name, as given when running `mykrobe predict` |
drug | Name of drug |
susceptibility | Resistance call: "R"=resistant, "S"=susceptible, "r"=minority resistance detected |
variants | Name of variant that caused resistance call. See note 1 below for details |
genes | Name of gene identified that caused resistance call |
mykrobe_version | Version of mykrobe |
files | Name of input reads files |
probe_sets | Name of probe sets used for AMR/lineage calls |
genotype_model | Name of the genotyping model used to make ref/alt/heterozygous calls |
kmer_size | kmer length used in probes |
phylo_group | High level taxonomic identification, for example "Mycobacterium_tuberculosis_complex" |
species | Species identified in the sample |
lineage | Lineage identified in the sample. Mixed samples will have more than one entry |
phylo_group_per_covg | Percent of phylo_group probe that has any coverage |
species_per_covg | Percent of species probe that has any coverage |
lineage_per_covg | Percent of lineage probe that has any coverage (see note 2 below) |
phylo_group_depth | Average depth across phylo_group probe |
species_depth | Average depth across species probe |
lineage_depth | Average depth across lineage probe (see note 2 below) |
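
As a minimal sketch of using these columns in a script, the Python snippet below reads CSV text with the standard `csv` module and collects the drugs called "R" or "r" for each sample. The example text is made up and abbreviated (four of the columns above, with illustrative values), the function name `resistant_drugs` is hypothetical, and the sketch assumes one row per sample and drug.

```python
import csv
import io

# Made-up, abbreviated example: real output has all of the columns listed above.
EXAMPLE_CSV = '''"sample","drug","susceptibility","variants"
"sample1","Ofloxacin","S",""
"sample1","Streptomycin","R","rpsL_K43R-AAG781686AGG:0:513:3132"
'''


def resistant_drugs(csv_text):
    """Map each sample name to the drugs called "R" or "r"."""
    calls = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["susceptibility"] in ("R", "r"):
            calls.setdefault(row["sample"], []).append(row["drug"])
    return calls


print(resistant_drugs(EXAMPLE_CSV))  # {'sample1': ['Streptomycin']}
```

For real results, pass the contents of the file written with `--out` in place of `EXAMPLE_CSV`.
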
Notes:
1. The `variants` column contains entries usually in the format `<gene>_<amino acid change>-<dna change>:<ref depth>:<alt depth>:<genotype confidence>`. For example, `rpsL_K43R-AAG781686AGG:0:513:3132` means a K to R amino acid change at position 43 in the rpsL gene, which is AAG to AGG at position 781686 in the genome, with 0 reads supporting the reference, a depth of 513 on the alternate allele, and a genotype confidence of 3132. Variants that are upstream of a gene have a negative position, and are of the form `<gene>_<dna change>:<ref depth>:<alt depth>:<genotype confidence>` (amino acid change is not applicable in this case). For example, `pncA_T-12C` means a `T` to `C` nucleotide change 12bp before the start of gene `pncA`.
2. The `lineage_per_covg` and `lineage_depth` columns are likely to say "NA", even when a lineage has been identified. This is because when the lineage is called using a hierarchical scheme, such as lineage1, lineage1.1 ... etc, there is no single probe for which to report information. Instead, genotype calls across the lineage tree are used to determine the final lineage. The details can be found in the JSON output (see examples below), but are too complex for the CSV output.
The general format of the JSON is:

    {
      "sample_name": {
        "susceptibility": { ... AMR call information ... },
        "phylogenetics": { ... species/lineage information ... },
        "kmer": <integer>,
        "probe_sets": [ ... list of probe files ... ],
        "files": [ ... list of input reads files ... ],
        "version": { ... version information ... },
        "genotype_model": "name of genotype model"
      }
    }

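
Because the top level is keyed by sample name, a script would typically loop over the samples. The sketch below loads a skeleton with the standard `json` module; every value in `EXAMPLE_JSON` is a placeholder rather than real output, and `summarise` is a hypothetical helper.

```python
import json

# Illustrative skeleton only: the values are placeholders, not real output.
EXAMPLE_JSON = """
{
  "sample1": {
    "susceptibility": {"Ofloxacin": {"predict": "S"}},
    "phylogenetics": {},
    "kmer": 21,
    "probe_sets": [],
    "files": ["reads.fq"],
    "version": {},
    "genotype_model": "placeholder"
  }
}
"""


def summarise(results):
    """Yield (sample name, kmer size, number of drugs reported) per sample."""
    for sample_name, sample in results.items():
        yield sample_name, sample["kmer"], len(sample["susceptibility"])


for line in summarise(json.loads(EXAMPLE_JSON)):
    print(line)  # ('sample1', 21, 1)
```
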
Susceptibility dictionary example:

    "Ofloxacin": {"predict": "S"},
    "Streptomycin": {
      "predict": "R",
      "called_by": {
        "rpsL_K43R-AAG781686AGG": {
          "variant": null,
          "genotype": [1, 1],
          "genotype_likelihoods": [-3150.49, -99999999, -18.41],
          "info": {
            "coverage": {
              "reference": {
                "percent_coverage": 0.0,
                "median_depth": 0,
                "min_non_zero_depth": 0,
                "kmer_count": 0,
                "klen": 21
              },
              "alternate": {
                "percent_coverage": 100.0,
                "median_depth": 26,
                "min_non_zero_depth": 24,
                "kmer_count": 513,
                "klen": 20
              }
            },
            "expected_depths": [25],
            "contamination_depths": [],
            "filter": [],
            "conf": 3132
          },
          "_cls": "Call.VariantCall"
        }
      }
    }

In that example, the sample is called as susceptible to ofloxacin and resistant to streptomycin. Details of the variant call that triggered the streptomycin call are provided. The important pieces of information are: the reference allele had zero coverage, but the alternate allele had 100% coverage at a median depth of 26. The genotype call (1/1) was homozygous for the alternate allele (0/1 would indicate a heterozygous call).
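
The fields highlighted in that explanation can be pulled out programmatically. The sketch below walks a susceptibility dictionary and reports, for each resistance call, the variant name, genotype, alternate-allele median depth and confidence. `resistance_evidence` is a hypothetical helper, the input is a trimmed copy of the example above, and treating "r" as a possible value of `predict` is an assumption carried over from the CSV column description.

```python
def resistance_evidence(susceptibility):
    """List (drug, variant, genotype, alt median depth, conf) per resistance call."""
    evidence = []
    for drug, call in susceptibility.items():
        if call["predict"] not in ("R", "r"):
            continue
        for name, details in call.get("called_by", {}).items():
            alt = details["info"]["coverage"]["alternate"]
            genotype = "/".join(str(allele) for allele in details["genotype"])
            evidence.append(
                (drug, name, genotype, alt["median_depth"], details["info"]["conf"])
            )
    return evidence


# Trimmed copy of the example above, keeping only the fields used here.
susceptibility = {
    "Ofloxacin": {"predict": "S"},
    "Streptomycin": {
        "predict": "R",
        "called_by": {
            "rpsL_K43R-AAG781686AGG": {
                "genotype": [1, 1],
                "info": {
                    "coverage": {"alternate": {"percent_coverage": 100.0, "median_depth": 26}},
                    "conf": 3132,
                },
            }
        },
    },
}

print(resistance_evidence(susceptibility))
# [('Streptomycin', 'rpsL_K43R-AAG781686AGG', '1/1', 26, 3132)]
```
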
Phylogenetics dictionary example, where species is tuberculosis:

    "phylo_group": {
      "Mycobacterium_tuberculosis_complex": {
        "percent_coverage": 99.544,
        "median_depth": 61.0
      }
    },
    "sub_complex": {
      "Unknown": {
        "percent_coverage": -1,
        "median_depth": -1
      }
    },
    "species": {
      "Mycobacterium_tuberculosis": {
        "percent_coverage": 98.328,
        "median_depth": 54.0
      }
    },
    "lineage": {
      "lineage": ["lineage2.2.9"],
      "calls_summary": {
        "lineage2.2.9": {
          "good_nodes": 3,
          "lineage_depth": 3,
          "genotypes": {
            "lineage2": 1,
            "lineage2.2": 1,
            "lineage2.2.9": 1
          }
        }
      },
      "calls": {
        "lineage2.2.9": {
          "lineage2": {
            "G497491A": { ... call info in same format as above call example ... }
          },
          "lineage2.2": {
            "G2505085A": { ... call info in same format as above call example ... }
          },
          "lineage2.2.9": {
            "G4086T": { ... call info in same format as above call example ... }
          }
        }
      }
    }

In that example, the lineage was identified as `lineage2.2.9` (the entry `"lineage": ["lineage2.2.9"]`). Had the sample been mixed, the list would have more than one entry, e.g. `["lineage2.2.9", "lineage4.10"]`. The `calls_summary` section shows a high-level summary of the evidence for this lineage. `2.2.9` means three levels in the lineage tree, as shown by `"lineage_depth": 3`. All three nodes in the tree were called as agreeing with the lineage, shown by the entry `"good_nodes": 3`. Next, the genotype calls for those three nodes were alleles agreeing with the lineage calls (in the dictionary `"genotypes"`, all values are `1`). Finally, there is a `calls` dictionary that contains all the details of the genotype calls (details are omitted here because they are in the same format as above for resistance calls).
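
The checks described in that walkthrough can be written as a short Python sketch. `fully_supported_lineages` is a hypothetical helper, and treating a lineage as fully supported when `good_nodes` equals `lineage_depth` and every genotype is `1` is an interpretation of the text above, not a rule taken from mykrobe itself.

```python
def fully_supported_lineages(lineage_section):
    """Return called lineages whose every tree node agrees with the call."""
    supported = []
    for name in lineage_section["lineage"]:
        summary = lineage_section["calls_summary"][name]
        all_nodes_good = summary["good_nodes"] == summary["lineage_depth"]
        all_alt = all(g == 1 for g in summary["genotypes"].values())
        if all_nodes_good and all_alt:
            supported.append(name)
    return supported


# Values copied from the example above (the "calls" details are omitted).
lineage_section = {
    "lineage": ["lineage2.2.9"],
    "calls_summary": {
        "lineage2.2.9": {
            "good_nodes": 3,
            "lineage_depth": 3,
            "genotypes": {"lineage2": 1, "lineage2.2": 1, "lineage2.2.9": 1},
        }
    },
}

print(fully_supported_lineages(lineage_section))  # ['lineage2.2.9']
print(len(lineage_section["lineage"]) > 1)        # False: not a mixed sample
```
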
MTB example where species is not tuberculosis:

    "phylo_group": {
      "Mycobacterium_tuberculosis_complex": {
        "percent_coverage": 99.686,
        "median_depth": 97.0
      }
    },
    "sub_complex": {
      "Unknown": {
        "percent_coverage": -1,
        "median_depth": -1
      }
    },
    "species": {
      "Mycobacterium_africanum": {
        "percent_coverage": 59.351,
        "median_depth": 169
      }
    },
    "lineage": {
      "Unknown": {
        "percent_coverage": -1,
        "median_depth": -1
      }
    }

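
Note that the `lineage` entry has a different shape in the two examples: a `"lineage"` list when a lineage is called, and an `"Unknown"` placeholder with values of `-1` otherwise. A script reading the JSON may want to handle both. The sketch below shows one way, based only on the two examples on this page; both function names are hypothetical.

```python
def called_lineages(phylogenetics):
    """Return the list of called lineages, or [] when the lineage is unknown."""
    return phylogenetics.get("lineage", {}).get("lineage", [])


def reported(section):
    """Drop "Unknown" placeholders, whose values are -1 in the examples above."""
    return {name: stats for name, stats in section.items() if name != "Unknown"}


# Values copied from the example above.
africanum = {
    "phylo_group": {
        "Mycobacterium_tuberculosis_complex": {"percent_coverage": 99.686, "median_depth": 97.0}
    },
    "sub_complex": {"Unknown": {"percent_coverage": -1, "median_depth": -1}},
    "species": {"Mycobacterium_africanum": {"percent_coverage": 59.351, "median_depth": 169}},
    "lineage": {"Unknown": {"percent_coverage": -1, "median_depth": -1}},
}

print(called_lineages(africanum))            # []
print(list(reported(africanum["species"])))  # ['Mycobacterium_africanum']
print(reported(africanum["sub_complex"]))    # {}
```
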