Benchmarking variant callers on simulated shotgun metagenomic data. Implements a bioinformatics pipeline spanning read synthesis, alignment, and variant calling.
Figure 1. Workflow diagram of the variant caller benchmarking process. First, selected RefSeq genomes were used to simulate a metagenome, and random mutations were added to those genomes to create a "gold standard" dataset. The number of genomes and the number of reads were then varied to evaluate the variant callers under a range of sample conditions.
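As a rough illustration of how calls might be compared against the "gold standard" set, the sketch below scores a caller's output by precision, recall, and F1. The choice of metrics, the function name `score_calls`, and the `(genome, position, ref, alt)` tuple layout are all assumptions made for this example; the metrics the workflow actually reports are not stated here.

```python
def score_calls(truth, calls):
    """Compare called variants against a gold-standard set.

    Both arguments are iterables of (genome, position, ref, alt) tuples.
    This tuple layout is hypothetical, chosen only for illustration.
    """
    truth, calls = set(truth), set(calls)
    tp = len(truth & calls)   # injected SNPs that the caller found
    fp = len(calls - truth)   # calls that were never injected
    fn = len(truth - calls)   # injected SNPs that the caller missed

    # Guard against division by zero when a set is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}
```

Matching on exact tuples is the simplest possible rule. A real comparison would first need to parse each caller's output into this form.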
- Directory containing synthetic reads, variant caller output, and benchmarks
- All output of Benchmarking Workflow goes here
Production-stage scripts that form the core Benchmarking workflow.
- Script performing directory setup, SNP generation, read synthesis, and alignment
- Script running variant callers and benchmarking on data generated by Genesis.sh
- Core workflow broken up into individual scripts (read generation, alignment, individual variant callers, etc.)
- Python code in Genesis.sh used to "inject" SNPs into FASTA files
- Creates a log of input SNPs and genome locations
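The SNP injection and logging described above could look roughly like the following minimal sketch. It is not the code from Genesis.sh: the function name `inject_snps`, its parameters, and the `(position, ref, alt)` log layout are hypothetical, and it works on a plain sequence string rather than parsing a FASTA file.

```python
import random


def inject_snps(sequence, n_snps, seed=0):
    """Substitute n_snps random bases and return (mutated_sequence, log).

    The log is a list of (position, ref, alt) tuples with 1-based
    positions. Names and log format are assumptions for illustration.
    """
    rng = random.Random(seed)  # fixed seed keeps the gold standard reproducible
    bases = "ACGT"
    seq = list(sequence.upper())  # note: this discards soft-masking (lowercase)

    # Only unambiguous bases are eligible, so N and other IUPAC codes are skipped.
    candidates = [i for i, base in enumerate(seq) if base in bases]
    if n_snps > len(candidates):
        raise ValueError("more SNPs requested than mutable positions")

    log = []
    for pos in sorted(rng.sample(candidates, n_snps)):
        ref = seq[pos]
        # Pick an alternate base that differs from the reference base.
        alt = rng.choice([b for b in bases if b != ref])
        seq[pos] = alt
        log.append((pos + 1, ref, alt))
    return "".join(seq), log
```

Calling `inject_snps("ACGTACGTAC", 3, seed=1)` returns the mutated sequence together with three log entries. Those entries can be written out as the record of input SNPs and their genome locations.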