Named after the beautiful Donut Falls
Location: 40.630°N 111.655°W, Elevation: 7,942 ft (2,421 m), Hiking level: easy
(Image credit: User submitted photos at alltrails.com)
More information about the trail leading up to this landmark can be found at utah.com/hiking/donut-falls
Donut Falls is a Nextflow workflow developed by @erinyoung at the Utah Public Health Laboratory for long-read nanopore sequencing of microbial isolates. It is built to work on Linux-based operating systems. Additional config options are needed for cloud batch usage.
Donut Falls is also included in the staphb-toolkit.
We made a wiki, please read it!
nextflow run UPHL-BioNGS/Donut_Falls -profile <singularity or docker> --sample_sheet <sample_sheet.csv>
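The profile in the command above depends on which container runtime is installed. A minimal sketch of choosing it automatically, assuming the runtime is discoverable on PATH; the function name pick_profile is hypothetical and is not part of Donut Falls:

```shell
# Hypothetical helper: print the -profile value matching an installed runtime.
# Singularity is checked first only because it appears first in the usage line.
pick_profile() {
  if command -v singularity >/dev/null 2>&1; then
    echo singularity
  elif command -v docker >/dev/null 2>&1; then
    echo docker
  else
    echo "neither singularity nor docker found on PATH" >&2
    return 1
  fi
}

# Possible usage (not executed here):
#   nextflow run UPHL-BioNGS/Donut_Falls -profile "$(pick_profile)" --sample_sheet sample_sheet.csv
```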
The sample sheet is a CSV file with one row per sample, giving the name of the sample and the corresponding nanopore fastq.gz file under the headers sample and fastq. When Illumina fastq files are available for polishing or hybrid assembly, they are added to the end of each row under the column headers fastq_1 and fastq_2.
Option 1 : just nanopore reads
sample,fastq
test,long_reads_low_depth.fastq.gz
Option 2 : nanopore reads and at least one sample has Illumina paired-end fastq files
sample,fastq,fastq_1,fastq_2
sample1,sample1.fastq.gz,sample1_R1.fastq.gz,sample1_R2.fastq.gz
sample2,sample2.fastq.gz,,
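It can help to sanity-check a sample sheet before launching a run. The sketch below is a hypothetical helper (check_sample_sheet is not part of Donut Falls) and assumes the header must match one of the two layouts shown above exactly; the workflow itself may be more lenient.

```shell
# Hypothetical helper: returns 0 if the sheet looks like Option 1 or Option 2.
check_sample_sheet() {
  awk -F',' '
    BEGIN { bad = 0 }
    { sub(/\r$/, "") }                      # tolerate Windows line endings
    NR == 1 {
      if ($0 != "sample,fastq" && $0 != "sample,fastq,fastq_1,fastq_2") {
        print "unexpected header: " $0 > "/dev/stderr"
        bad = 1
      }
      ncol = NF
      next
    }
    /^[[:space:]]*$/ { next }               # skip blank lines
    {
      if ($1 == "" || $2 == "") {
        print "line " NR ": sample and fastq are both required" > "/dev/stderr"
        bad = 1
      }
      if (NF != ncol) {
        print "line " NR ": expected " ncol " columns, found " NF > "/dev/stderr"
        bad = 1
      }
    }
    END { exit bad }
  ' "$1"
}
```

A row such as sample2,sample2.fastq.gz,, passes because the two trailing commas keep the column count at four while leaving the Illumina fields empty.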
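There are currently three options for assembly: flye (the default), raven, and unicycler (hybrid assembly, which requires both nanopore and Illumina reads).
These are specified with the assembler parameter. If Illumina reads are found, then flye and raven assemblies will be polished with those reads.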
Note: more than one assembler can be chosen (e.g. params.assembler = 'flye,raven'). This will run the input files on each assembler listed. Listing an assembler more than once will not create additional assemblies with that tool (e.g. params.assembler = 'flye,flye,flye' will still only run the input files through flye once).
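The effect of repeated entries can be illustrated with a short shell sketch. This only mimics the behavior described above; it is not how the workflow parses the parameter internally, and the function name unique_assemblers is hypothetical.

```shell
# Hypothetical illustration: collapse a comma-separated assembler list to its
# unique entries, keeping the order of first appearance.
unique_assemblers() {
  printf '%s\n' "$1" | tr ',' '\n' | awk 'NF && !seen[$0]++' | paste -sd, -
}

unique_assemblers 'flye,flye,flye'        # prints: flye
unique_assemblers 'unicycler,flye,raven'  # prints: unicycler,flye,raven
```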
Although not used for anything else, the sequencing summary file can be read in and put through nanoplot to visualize the quality of a sequencing run. This is an optional file and can be set with 'params.sequencing_summary'.
nextflow run UPHL-BioNGS/Donut_Falls -profile singularity --sequencing_summary <sequencing summary file>
- WARNING : Does not work with older versions of the summary file.
# nanopore assembly with flye followed by polishing if illumina files are supplied
nextflow run UPHL-BioNGS/Donut_Falls -profile singularity --sample_sheet sample_sheet.csv
# or specifying the assembler explicitly
nextflow run UPHL-BioNGS/Donut_Falls -profile singularity --sample_sheet sample_sheet.csv --assembler flye
# hybrid assembly with unicycler where both nanopore and illumina files are required
nextflow run UPHL-BioNGS/Donut_Falls -profile singularity --sample_sheet sample_sheet.csv --assembler unicycler
# assembling with all three assemblers
# specifying the results to be stored in 'donut_falls_test_results' instead of 'donut_falls'
# using docker instead of singularity
nextflow run UPHL-BioNGS/Donut_Falls -profile docker --sample_sheet sample_sheet.csv --assembler unicycler,flye,raven --outdir donut_falls_test_results
# using some test files (requires internet connection)
nextflow run UPHL-BioNGS/Donut_Falls -profile docker --sample_sheet sample_sheet.csv --test
# same as above
nextflow run UPHL-BioNGS/Donut_Falls -profile docker,test --sample_sheet sample_sheet.csv
Donut Falls would not be possible without
- bandage : visualize gfa files
- busco : assessment of assembly quality
- bwa : aligning reads for polypolish
- circulocov : read depth per contig
- dnaapler : rotation
- fastp : cleaning illumina reads (default values) and nanopore reads (minimum length = 1,000 & minimum Q = 12)
- flye : de novo assembly (default assembler)
- gfastats : assessment of assembly
- medaka : polishing with nanopore reads
- multiqc : amalgamation of results
- nanoplot : fastq file QC visualization
- polypolish : reduces sequencing artefacts through polishing with Illumina reads
- pypolca : reduces sequencing artefacts through polishing with Illumina reads
- rasusa : subsampling nanopore reads to 150X depth
- raven : de novo assembly option (params.assembler = 'raven')
- unicycler : hybrid assembly option (params.assembler = 'unicycler')
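To give a feel for what subsampling to 150X means in practice, the arithmetic can be sketched in shell. The 150X figure comes from the list above; the 5,000,000-base genome size is an assumed example value, and target_bases is a hypothetical helper rather than anything rasusa or Donut Falls provides.

```shell
# Hypothetical helper: number of bases to keep for a given genome size and depth.
# $1 = estimated genome size in bases (assumed value supplied by the user)
# $2 = target depth, defaulting to the 150X mentioned above
target_bases() {
  echo $(( $1 * ${2:-150} ))
}

target_bases 5000000   # prints: 750000000
```

So for an isolate with a genome of roughly 5 Mb, about 750 Mb of nanopore reads would remain after subsampling.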