Comparison of sequencing platforms with a control dataset.
This repository contains scripts to compare different sequencing platforms using a dataset that contains known compositions of synthetic SARS-CoV-2 viral control mixtures. The dataset is described more fully in our paper A Gold Standard Dataset and Evaluation of Methods for Lineage Abundance Estimation from Wastewater.
Brand | Platform | Primer scheme |
---|---|---|
ONT | PromethION | ARTIC V4.1 |
Illumina | NextSeq 2000 | ARTIC V3 |
Complete Genomics | G99 | ARTIC V3 |
In the platforms/ directory, you will find subdirectories for each platform, each containing (locally) the BAMs generated for each dataset along with the scripts used to organize them in preparation for downstream analysis. These are in directories like platforms/<platform>/<background>/alignments
The actual alignment files are too large to share here but instructions for how to download them can be found in the README.md files for each platform.
The plots/ directory contains any scripts (mostly jupyter notebooks) used to generate the plots used in understanding the similarities and differences between platforms.
All scripts (and relative paths) used here are expected to be run directly from within the root of this repository. Any jupyter notebook is run from the directory where it is stored.
Filtered alignments were . These files were originally produced using variations of our covid-analysis pipeline currently found on our cluster at /projects/enviro_lab/scripts/covid-analysis
. The exact nextflow workflow script within that directory can be determined by the table below:
analysis name | platform | paired? | aligner | script |
---|---|---|---|---|
ont | PromethION | No | minimap2 | analyzeReads.nf |
cg | G99 | No | minimap2 | analyzeReads_cg.nf |
illumina_ss | MiSeq2000 | No | minimap2 | analyzeReads_illumina.nf |
illumina_minimap2 | MiSeq2000 | Yes | minimap2 | analyzeReads_illumina_ds.nf |
illumina_bwa | MiSeq2000 | Yes | bwa | analyzeReads_illumina_ds.nf |
Those BAMs were then linked in thier appropriate locations by running arrange_files.sh. Note that this script uses absolute paths specific to our directory structure and may only act as a guide.
Running Freyja (necessary for plots produced in plots/abundance_analysis)
We're comparing all the results using freyja's aggregated output. This can be created by running run_freyja_multi.sh. This works best by running it in two steps:
- Uncomment the code block marked
(FIRST RUN)
within run_freyja_multi.sh. - Comment out the section used before and uncomment the code block marked
(SECOND RUN)
.
Running mosdepth (necessary for plots produced in plots/coverage_analysis)
To get coverage details for each sample, run get_coverage_multi.sh. This requires the conda env conda/env-mosdepth or the mosdepth executable to be in your path.
We don't yet have a citation for this repository, but that will be forthcoming. In the meantime, check out our deconvolution paper to learn more about our general process. What we did there just on ONT has been expanded here to the other sequencing platforms.