Named after the beautiful Grandeur Peak
Image Credit: ryancornia
Location: 40.707, -111.76, 8,299 ft (2,421 m) summit
More information about the trail leading up to this landmark can be found at https://utah.com/hiking/grandeur-peak
Grandeur is a Nextflow workflow developed by @erinyoung at the Utah Public Health Laboratory. "Grandeur" is intended to be a species-agnostic sequencing analysis workflow for quality control and assurance (QC) and serotyping of paired-end Illumina sequencing data in a local public health laboratory.
"Grandeur" is meant to augment CDC's PHOENIX Nextflow workflow, and running it alongside PHOENIX is the officially recommended usage. In principle, the contigs generated by PHOENIX undergo additional quality metric and serotyping steps, with a heavy emphasis on fastANI and AMRFinderPlus.
"Grandeur" can also be run as a standalone workflow that takes paired-end Illumina reads, removes adapters with fastp and PhiX with bbduk, and creates contigs through de novo assembly of the reads with spades.
"Grandeur" is also a workflow of the staphb-toolkit.
The default workflow takes fastq files, runs them through QC, serotyping, and related steps, and creates contig files.
# using singularity, starting from a directory of paired-end reads
nextflow run UPHL-BioNGS/Grandeur -profile singularity --reads <path to reads>
# using docker, starting from a directory of fasta files
nextflow run UPHL-BioNGS/Grandeur -profile docker --fastas <path to fastas>
- params.sample_sheet / --sample_sheet : specify sample sheet with sample id, forward reads in fastq.gz format, and reverse reads in fastq.gz format
- params.outdir / --outdir : specify directory where results are saved (basic result patterns are grandeur/analysis/sample*)
- params.reads / --reads : specify directory with paired-end files
- params.fastas / --fastas : specify directory with fasta files
- params.kraken2_db / --kraken2_db : specify directory of kraken2 database
- params.blast_db / --blast_db : specify directory of blast database (must accompany value for params.blast_db_type)
- params.mash_db / --mash_db : specify reference file for mash
- params.current_datasets / --current_datasets : set to false to avoid downloading genomes from NCBI
- params.iqtree2_outgroup / --iqtree2_outgroup : set outgroup for iqtree2
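As a rough illustration of the `--sample_sheet` option, the sketch below builds a sample sheet from a directory of paired-end reads. The file naming pattern (`<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz`), the header row `sample,fastq_1,fastq_2`, and the helper name `make_sample_sheet` are assumptions made for this example, not something this README specifies; check the wiki for the exact format Grandeur expects.

```shell
#!/usr/bin/env bash
# Sketch only: build a sample sheet from a directory of paired-end reads.
# ASSUMPTIONS (not from this README): reads are named <sample>_R1.fastq.gz and
# <sample>_R2.fastq.gz, and the header row is "sample,fastq_1,fastq_2".

# make_sample_sheet is a hypothetical helper, not part of Grandeur.
make_sample_sheet() {
    local reads_dir="$1"   # directory holding the fastq.gz files
    local sheet="$2"       # path of the sample sheet to write
    local r1 r2 sample

    echo "sample,fastq_1,fastq_2" > "$sheet"
    for r1 in "$reads_dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue                  # glob matched nothing
        r2="${r1%_R1.fastq.gz}_R2.fastq.gz"       # expected mate file
        if [ ! -e "$r2" ]; then
            echo "skipping $r1: no matching R2 file" >&2
            continue
        fi
        sample="$(basename "$r1" _R1.fastq.gz)"   # sample id from file name
        echo "${sample},${r1},${r2}" >> "$sheet"
    done
}

# Example usage (paths are placeholders):
#   make_sample_sheet "$PWD/reads" sample_sheet.csv
#   nextflow run UPHL-BioNGS/Grandeur -profile singularity \
#       --sample_sheet sample_sheet.csv --outdir grandeur
```

The paths written to the sheet are exactly as given in the first argument, so passing an absolute directory keeps the sheet usable from any working directory. Samples with a missing reverse read are skipped with a warning rather than written as incomplete rows.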
The README got too long, so it's been moved to a wiki. There are several covered topics including:
- Installation
- Usage
- Subworkflow explanations
- User supplied reference files and databases (optional)
- FAQ
Please submit any issues and problems to the repository's issues page (or find us on Slack).
Grandeur wouldn't be possible without the following tools:
- amrfinderplus - identification of genes associated with antimicrobial resistance
- bbduk - removal of PhiX
- blastn - read identification with blobtools
- blobtools - contamination
- circulocov - coverage determination
- datasets - downloads genomes from NCBI
- drprg - TB AMR predictions
- elgato - Legionella pneumophila Sequence Based Typing (SBT)
- emmtyper - Group A Strep "emm" typing
- fastani - species evaluator
- fastp - cleaning reads
- fastqc - fastq file QC
- heatcluster - visualizes SNP matrix from SNP dists
- iqtree2 - phylogenetic tree creation - used after core genome alignment
- kleborate - Klebsiella serotyping
- kraken2 - contamination
- mash - species identifier
- mashtree - tree based on mash distances (not impacted by size of core genome)
- mlst - identification of MLST subtype
- multiqc - summarizes QC efforts
- mykrobe - Mycobacterium subtyping
- panaroo - core genome alignment - optional (set with params.msa = true)
- pbptyper - Penicillin Binding Protein (PBP) typer for Streptococcus pneumoniae assemblies
- phytreeviz - basic tree visualization
- plasmidfinder - MLST typing for plasmids
- prokka - gene annotation - used for core genome alignment
  - will be replaced with bakta in a future release
- quast - contig QC
- seqsero2 - Salmonella serotyping
- serotypefinder - E. coli serotyping
- shigatyper - Shigella serotyping
- snp-dists - SNP matrix - used after core genome alignment
- spades - de novo assembly
The expected tools are split into multiple processes. Each process has its own wiki page that we encourage users to view.