
2). Overview

Duncan Berger edited this page Nov 11, 2024 · 1 revision


Simplified schematic overview



The pipeline will perform the following steps:

1). Read decontamination and quality control

  • Assess pre-QC read quality using FastQC.
  • Remove adaptors from reads using fastp.
  • Identify host-contaminant reads using Kraken2 (against a host reference database) and BWA-MEM2 (against a host reference genome).
  • Assess post-QC read quality using FastQC.
  • Merge pre- and post-QC read metrics into a summary report using MultiQC.
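How the two host calls in this step might be combined can be sketched as below. This is a hypothetical illustration, not the pipeline's implementation. It assumes Kraken2 was run against a host-only database, so every read it classifies ("C" in column 1 of its standard per-read output) counts as host. It also assumes the IDs of reads that BWA-MEM2 mapped to the host genome were already extracted, and that a read flagged by either tool is removed. The function names are made up for this example.

```python
# Hypothetical sketch, not the pipeline's implementation.
# Assumptions: Kraken2 was run against a host-only database, so every read it
# classifies ("C" in column 1 of its standard output) counts as host; the IDs
# of reads BWA-MEM2 mapped to the host genome were extracted beforehand; a
# read flagged by either tool is treated as host (union of both calls).


def kraken2_host_ids(kraken_lines):
    """Return IDs of reads that Kraken2 classified (column 1 == "C")."""
    host = set()
    for line in kraken_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2 and fields[0] == "C":
            host.add(fields[1])
    return host


def split_host_reads(read_ids, kraken_lines, bwa_host_ids):
    """Partition reads into (kept, removed) using the union of both host calls."""
    host = kraken2_host_ids(kraken_lines) | set(bwa_host_ids)
    kept = [r for r in read_ids if r not in host]
    removed = [r for r in read_ids if r in host]
    return kept, removed


if __name__ == "__main__":
    kraken_output = [
        "C\tread1\t9606\t150\t9606:116",
        "U\tread2\t0\t150\t0:116",
        "U\tread3\t0\t150\t0:116",
    ]
    kept, removed = split_host_reads(["read1", "read2", "read3"], kraken_output, {"read3"})
    print("kept:", kept)        # kept: ['read2']
    print("removed:", removed)  # removed: ['read1', 'read3']
```

Taking the union is the conservative choice. A read is dropped if either method calls it host, which trades a little non-host signal for fewer host reads leaking into assembly.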

2). Read-based taxonomic annotation

3). Assembly

  • Assemble post-QC, host-decontaminated reads using Megahit.

4). Assembly binning

  • Classify contigs into classes: archaea, bacteria, prokarya, eukarya, organelle (mitochondria, plastid) or unknown using Tiara.
  • Perform metagenomic binning (grouping contigs into individual metagenome-assembled genomes, MAGs) using SemiBin2, COMEBin and Metabat2.
  • Generate consensus bins from the SemiBin2 and COMEBin outputs using DAS Tool.
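DAS Tool takes one tab-separated contig-to-bin table per binner, with one `contig_id<TAB>bin_id` row for each binned contig. The sketch below shows how a binner's output could be turned into such a table. It is an illustration under assumptions, not the pipeline's code. The function names, bin names and contig IDs are hypothetical, and contig IDs are taken from FASTA headers up to the first whitespace.

```python
# Hypothetical sketch: build the tab-separated contig-to-bin table that DAS Tool
# takes for each binner (one "contig_id<TAB>bin_id" row per binned contig).
# Function names, bin names and contig IDs are made up for illustration.


def fasta_ids(fasta_text):
    """Return sequence IDs: header text after '>' up to the first whitespace."""
    ids = []
    for line in fasta_text.splitlines():
        if line.startswith(">") and line[1:].strip():
            ids.append(line[1:].split()[0])
    return ids


def contig2bin_rows(bins):
    """bins maps bin_id -> contig IDs; returns 'contig<TAB>bin' rows."""
    rows = []
    owner = {}
    for bin_id in sorted(bins):
        for contig in bins[bin_id]:
            # Within one binner's output a contig should sit in a single bin.
            if contig in owner:
                raise ValueError(f"{contig} is in both {owner[contig]} and {bin_id}")
            owner[contig] = bin_id
            rows.append(f"{contig}\t{bin_id}")
    return rows


if __name__ == "__main__":
    semibin2_bins = {
        "bin.1": fasta_ids(">k141_1 len=5000\nACGT\n>k141_7\nGGCC\n"),
        "bin.2": ["k141_3"],
    }
    print("\n".join(contig2bin_rows(semibin2_bins)))
```

One such table per binner would then be passed to DAS Tool, which scores the candidate bins and selects a non-redundant consensus set.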

5). Contig analysis

  • Identify viral, proviral and plasmid sequences across all contigs using geNomad.
  • Identify the closest taxonomic hit for each contig using skani.
  • Calculate contig length and GC content using SeqKit.
  • Calculate read depth (coverage) using Samtools.
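The three per-contig metrics above (length, GC content and mean read depth) can be illustrated with plain Python. This is a sketch of what the numbers mean, not SeqKit's or Samtools' code. It assumes GC content is (G + C) divided by contig length, so N bases lower it, and real tools may treat ambiguous bases differently. It also assumes depth lines follow the `samtools depth` layout of contig, position and depth separated by tabs. The function names are hypothetical.

```python
# Illustrative sketch of per-contig metrics, not SeqKit's or Samtools' code.
# Assumptions: GC is (G + C) / contig length, so N bases lower it (tools may
# handle ambiguous bases differently); depth lines follow `samtools depth`
# output (contig<TAB>position<TAB>depth). Function names are hypothetical.
from collections import defaultdict


def parse_fasta(text):
    """Return {contig_id: sequence}; the ID is the header up to the first whitespace."""
    seqs, name, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def contig_metrics(fasta_text, depth_lines):
    """Return {contig_id: (length, gc_percent, mean_depth)}."""
    depth_sum = defaultdict(int)
    for line in depth_lines:
        contig, _pos, depth = line.rstrip("\n").split("\t")
        depth_sum[contig] += int(depth)
    metrics = {}
    for contig, seq in parse_fasta(fasta_text).items():
        length = len(seq)
        gc = 100.0 * sum(seq.upper().count(b) for b in "GC") / length if length else 0.0
        # Divide by the full length: positions that `samtools depth` omits
        # (zero-depth sites, unless -a is given) count as zero coverage.
        mean_depth = depth_sum[contig] / length if length else 0.0
        metrics[contig] = (length, gc, mean_depth)
    return metrics


if __name__ == "__main__":
    fasta = ">c1\nGGCC\nAT\n>c2 sample=A\nATAT\n"
    depth = ["c1\t1\t10", "c1\t2\t20", "c2\t1\t4"]
    for contig, (length, gc, dp) in contig_metrics(fasta, depth).items():
        print(f"{contig}\tlen={length}\tGC={gc:.1f}%\tdepth={dp:.2f}")
```

Dividing the summed depth by the contig length, rather than by the number of reported positions, keeps the mean correct whether or not zero-depth positions appear in the depth output.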

6). Bin quality control

  • Assess the quality of genome bins using CheckM.
  • Assign taxonomic classifications to each MAG using GTDB-Tk.
  • Calculate assembly statistics using
  • Merge bin QC and contig QC results into summary reports.
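To illustrate the kind of assembly statistics this step refers to, the sketch below computes contig count, total length, largest contig and N50 from a list of contig lengths. The exact tool and set of metrics used by the pipeline are assumptions here. The function names are hypothetical, and N50 follows the standard definition: the length L such that contigs of length L or longer cover at least half of the total assembly.

```python
# Sketch of common assembly statistics (contig count, total length, largest
# contig, N50). The pipeline's actual tool and metric set are assumed, not
# known; function names are hypothetical. N50 uses the standard definition.


def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


def assembly_stats(lengths):
    """Summarise a list of contig (or bin contig) lengths."""
    return {
        "contigs": len(lengths),
        "total_length": sum(lengths),
        "largest": max(lengths, default=0),
        "n50": n50(lengths),
    }


if __name__ == "__main__":
    # Total 280 bp; 100 + 80 = 180 >= 140, so N50 is 80.
    print(assembly_stats([50, 100, 20, 80, 30]))
    # {'contigs': 5, 'total_length': 280, 'largest': 100, 'n50': 80}
```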

8). Typing

Subset MAGs of interest (target species) based on their taxonomic classification and pass each one to the corresponding subworkflow (run per MAG).
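The subsetting step can be sketched as a small routing function over GTDB-style lineage strings (the format GTDB-Tk reports, e.g. `d__Bacteria;...;g__Salmonella;s__Salmonella enterica`). This is hypothetical throughout. The rules, the subworkflow names, and the choice to send every bacterial MAG to the generic "bacteria" subworkflow are assumptions for illustration, not the pipeline's actual selection logic. The example lineages are abbreviated to domain, genus and species.

```python
# Hypothetical routing sketch: the rules and subworkflow names below are
# illustrative assumptions, not the pipeline's actual selection logic.
# Input is a GTDB-style lineage string ("d__...;g__...;s__...").

RULES = [
    # (rank code, name prefix, hypothetical subworkflow name)
    ("s", "Listeria monocytogenes", "listeria_monocytogenes"),
    ("g", "Salmonella", "salmonella"),
    ("g", "Escherichia", "ecoli_shigella"),
    ("g", "Shigella", "ecoli_shigella"),
]


def parse_lineage(lineage):
    """Turn 'd__X;...;s__Y z' into {'d': 'X', ..., 's': 'Y z'}, skipping empty ranks."""
    ranks = {}
    for part in lineage.split(";"):
        code, sep, name = part.strip().partition("__")
        if sep and name:
            ranks[code] = name
    return ranks


def route_mag(lineage):
    """Return the (hypothetical) subworkflows a MAG would be passed to."""
    ranks = parse_lineage(lineage)
    # Assumption: every bacterial MAG also goes to the generic bacteria subworkflow.
    targets = ["bacteria"] if ranks.get("d") == "Bacteria" else []
    for code, prefix, subworkflow in RULES:
        if ranks.get(code, "").startswith(prefix) and subworkflow not in targets:
            targets.append(subworkflow)
    return targets


if __name__ == "__main__":
    mags = {  # abbreviated lineages (domain, genus, species only)
        "bin.1": "d__Bacteria;g__Salmonella;s__Salmonella enterica",
        "bin.2": "d__Bacteria;g__Listeria;s__Listeria monocytogenes",
        "bin.3": "d__Archaea;g__;s__",
    }
    for mag, lineage in mags.items():
        print(mag, route_mag(lineage))
```

Prefix matching keeps the rule table short. A production version would need to follow the naming conventions of the taxonomy release actually used.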

8a). Bacteria
8b). Listeria monocytogenes
  • Perform in silico serogroup typing prediction using LisSero.
8c). Salmonella
  • Perform in silico Salmonella serotyping using SeqSero2.
  • Predict serovar, antigen gene and cgMLST alleles using SISTR.
  • Identify antimicrobial resistance genes and lineages of S. Typhi and S. Paratyphi B using Mykrobe.
8d). Escherichia coli / Shigella spp.
  • Differentiate Shigella from enteroinvasive Escherichia coli (EIEC) and identify serotype using ShigEIFinder.
  • Determine Shigella serotype using ShigaTyper.
  • Determine E. coli serotype using ECTyper.
  • Identify the serotype of Shiga toxin-producing E. coli (STEC) using STECFinder.
  • Identify antimicrobial resistance genes and lineages of Shigella sonnei using Mykrobe.

9). Antimicrobial resistance

  • Identify AMR genes (incl. point mutations) and virulence/stress resistance genes using AMRFinderPlus.
  • Identify AMR and virulence genes across multiple databases using ABRicate.
  • Identify AMR genes (incl. point mutations) using RGI.
  • Identify AMR genes using ResFinder.
  • Identify AMR conferring point mutations using PointFinder.
  • Merge the AMR results into a summary report.
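One simple shape for the merged AMR summary is a gene-by-tool presence/absence table. The sketch below is hypothetical and is not the pipeline's report format. It assumes each tool's hits have already been reduced to a list of gene names. Because different tools and databases can name the same gene differently (for instance `blaTEM-1` in one and `TEM-1` in another), a real merge would need a name-harmonisation step; this sketch compares names verbatim, which makes that problem visible.

```python
# Hypothetical sketch of an AMR summary: a gene-by-tool presence/absence table.
# Assumption: each tool's hits were already reduced to a list of gene names.
# Tools/databases may name the same gene differently (e.g. "blaTEM-1" vs
# "TEM-1"), so a real merge needs harmonisation; names are compared verbatim.


def amr_presence_table(hits):
    """hits maps tool -> gene names; returns TSV text with 1/0 per tool."""
    tools = sorted(hits)
    found = {tool: set(hits[tool]) for tool in tools}
    genes = sorted(set().union(*found.values()))
    lines = ["gene\t" + "\t".join(tools)]
    for gene in genes:
        cells = ["1" if gene in found[tool] else "0" for tool in tools]
        lines.append(gene + "\t" + "\t".join(cells))
    return "\n".join(lines)


if __name__ == "__main__":
    example_hits = {  # made-up hits for one MAG
        "AMRFinderPlus": ["blaTEM-1", "tet(A)"],
        "ResFinder": ["blaTEM-1"],
        "RGI": ["TEM-1"],
    }
    print(amr_presence_table(example_hits))
```

In this example, `TEM-1` and `blaTEM-1` land on separate rows even though they likely describe the same gene, which is exactly the case a harmonisation step would resolve.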

Finally, all major results are merged into a single summary report, which includes the per-bin typing information produced in step 8).