- Understand workflow modularisation
- Complete an NGS variant calling pipeline
- Nextflow DSL2 pipelines can be separated into modules and included in other scripts
- This makes source code easier to maintain and share between pipelines
- Take a look at main.nf: the processes `DOWNLOAD_REF` and `DOWNLOAD_FASTQS` are included from the source file `modules/download.nf`
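In DSL2, an `include` statement names the processes to import and the file that defines them. Module paths that start with `./` are resolved relative to the including script, and the `.nf` extension may be omitted. The lines below are a sketch of what the include statements in main.nf probably look like, not a copy of them; the alias `INDEX_REF_AGAIN` is hypothetical and only shows how one module can be reused under a second name.

```nextflow
// main.nf (sketch): import processes defined in other files
include { DOWNLOAD_REF; DOWNLOAD_FASTQS } from './modules/download'
include { INDEX_REF } from './modules/index_ref'

// Hypothetical alias: the same process imported under another name, e.g. to call it twice
include { INDEX_REF as INDEX_REF_AGAIN } from './modules/index_ref'
```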
- Look at nextflow.config
- This pipeline has been configured to run on the Slurm queue
- The Singularity `cacheDir` is set so that you can use pre-downloaded container images; otherwise, Nextflow would download them automatically
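Settings like these usually take roughly the following form in nextflow.config. This is a sketch rather than the file's actual contents: the queue name and cache path are hypothetical placeholders. With the `slurm` executor, each task is submitted as a Slurm job, and its `cpus`, `memory` and `time` directives become the job's resource requests.

```nextflow
// nextflow.config (sketch): the real file in module_4 may differ
process {
    executor = 'slurm'        // submit every task as a Slurm job
    // queue = 'regular'      // hypothetical partition name, only if your cluster needs one
}

singularity {
    enabled    = true
    autoMounts = true
    cacheDir   = '/path/to/singularity/cache'   // hypothetical shared image cache
}
```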
- The pipeline in main.nf is incomplete
- We will complete the "TODO" sections of each process, one at a time, until the pipeline is finished
- When complete, the pipeline will:
  - Download paired-end Illumina sequencing of 20 SARS-CoV-2 isolates collected in Victoria during 2020 from the SRA
  - Align the reads to the SARS-CoV-2 reference genome
  - Call variants using BCFtools
  - Create a plot of the variants, visualising changes in viral genotype over time
- Run main.nf, which downloads the sequencing data:
```bash
nextflow run ~/wehi-nextflow-training/module_4/main.nf
```
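- Complete process `INDEX_REF` in `modules/index_ref.nf`
- Uncomment the corresponding lines in main.nf referencing `INDEX_REF`
- Run main.nf again, adding `-resume` so that tasks that already completed (such as the downloads) are taken from the cache instead of being re-run:
```bash
nextflow run ~/wehi-nextflow-training/module_4/main.nf -resume
```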
Solution
```nextflow
process INDEX_REF {
    cpus 1
    memory '1 GB'
    time '1 h'
    module 'bwa'
    module 'samtools'

    input:
    path(ref_fasta_gz)    // gzipped reference FASTA

    output:
    // decompressed FASTA plus the samtools .fai and bwa index files
    tuple path('ref.fasta'), path("ref.fasta.*")

    script:
    """
    gzip -cd $ref_fasta_gz > ref.fasta
    samtools faidx ref.fasta
    bwa index ref.fasta
    """
}
```
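`INDEX_REF` emits a single tuple: the decompressed FASTA and every file matching `ref.fasta.*`, which covers the `.fai` index from `samtools faidx` and the `.amb`, `.ann`, `.bwt`, `.pac` and `.sa` files from `bwa index`. A sketch for checking that output, assuming `DOWNLOAD_REF` takes no inputs and emits the gzipped reference (the commented lines in main.nf may call it differently):

```nextflow
// Inside the workflow block of main.nf (sketch). Assumes DOWNLOAD_REF() takes no
// inputs and emits the gzipped reference FASTA; the real main.nf may differ.
ref = INDEX_REF(DOWNLOAD_REF())

// Print the FASTA name and the names of the index files captured by "ref.fasta.*"
ref.map { fasta, indices -> "${fasta.name}: ${indices*.name}" }.view()
```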
- Complete process `BWA_MEM_ALIGN` in `modules/bwa_mem_align.nf`
- Uncomment the corresponding lines in main.nf referencing `BWA_MEM_ALIGN`
- Run main.nf:
```bash
nextflow run ~/wehi-nextflow-training/module_4/main.nf -resume
```
Solution
```nextflow
process BWA_MEM_ALIGN {
    cpus 2
    memory '2 GB'
    time '2 h'
    module 'bwa'
    module 'samtools'
    tag "$sample"

    input:
    tuple val(sample), path(fastq1), path(fastq2)   // one paired-end sample
    tuple path(ref_fasta), path(ref_indices)        // indexed reference from INDEX_REF

    output:
    tuple val(sample), path(bam)

    script:
    bam = sample + '.bam'
    // \\t in the Groovy string becomes \t in the shell, which bwa expands to a tab
    """
    bwa mem -M -t $task.cpus -R '@RG\\tID:$sample\\tSM:$sample' $ref_fasta $fastq1 $fastq2 | samtools view -b > $bam
    """
}
```
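`BWA_MEM_ALIGN` combines a per-sample input with a reference that every sample needs. Nextflow starts one task per item of a queue channel, whereas a value channel can be read by any number of tasks, so the reference must reach the process as a value channel. A process invoked only with value channels emits value channels, and the `.first()` operator turns a queue channel into a value channel if needed. The self-contained demo below uses a hypothetical process, `SAMPLE_WITH_REF`, in place of `BWA_MEM_ALIGN`; `Channel.of` and `Channel.value` are standard channel factories.

```nextflow
// demo.nf (sketch): why the shared reference input should be a value channel.
// SAMPLE_WITH_REF is a hypothetical stand-in for BWA_MEM_ALIGN.
process SAMPLE_WITH_REF {
    input:
    val sample
    val ref

    output:
    stdout

    script:
    """
    echo "$sample aligned against $ref"
    """
}

workflow {
    samples = Channel.of('S1', 'S2', 'S3')   // queue channel: each item is consumed once
    ref     = Channel.value('ref.fasta')      // value channel: can be read by every task

    // Three tasks run, one per sample. With Channel.of('ref.fasta') instead,
    // only one task would run, because that queue channel holds a single item.
    SAMPLE_WITH_REF(samples, ref).view()
}
```

Running it with `nextflow run demo.nf` should print one line per sample, in no guaranteed order.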
- Complete process `SAMTOOLS_SORT` in `modules/samtools_sort.nf`
- Uncomment the corresponding lines in main.nf referencing `SAMTOOLS_SORT`
- Run main.nf:
```bash
nextflow run ~/wehi-nextflow-training/module_4/main.nf -resume
```
Solution
```nextflow
process SAMTOOLS_SORT {
    cpus 2
    memory '2 GB'
    time '1 h'
    module 'samtools'
    tag "$sample"

    input:
    tuple val(sample), path(input_bam)

    output:
    // the sorted BAM travels together with its .bai index
    tuple val(sample), path(sorted_bam), path(bam_index)

    script:
    sorted_bam = sample + '.sorted.bam'
    bam_index = sorted_bam + '.bai'
    """
    samtools sort --threads $task.cpus $input_bam > $sorted_bam
    samtools index $sorted_bam
    """
}
```
- Complete process `BCFTOOLS_CALL` in `modules/bcftools_call.nf`
- Uncomment the corresponding lines in main.nf referencing `BCFTOOLS_CALL`
- Run main.nf:
```bash
nextflow run ~/wehi-nextflow-training/module_4/main.nf -resume
```
Solution
```nextflow
process BCFTOOLS_CALL {
    cpus 2
    memory '2 GB'
    time '1 h'
    container "quay.io/biocontainers/bcftools:1.16--hfe4b78e_1"
    tag "$sample"

    input:
    tuple val(sample), path(sorted_bam), path(bam_index)
    tuple path(ref_fasta), path(ref_indices)

    output:
    path bcf

    script:
    bcf = sample + '.bcf'
    // -Ou: uncompressed BCF between the commands; -mv: multiallelic caller, variant sites only
    """
    bcftools mpileup -Ou -f $ref_fasta $sorted_bam | bcftools call -mv -Ob -o $bcf
    """
}
```
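Each process so far emits a tuple with the same shape as the input tuple the next process declares, so outputs can be passed along directly, and the same indexed reference can be given to both `BWA_MEM_ALIGN` and `BCFTOOLS_CALL`. A sketch of that chain, where `reads` and `ref` are hypothetical names for the per-sample FASTQ channel and the `INDEX_REF` output (the commented lines in main.nf may name them differently):

```nextflow
// Inside the workflow block of main.nf (sketch); `reads` and `ref` are hypothetical names
aligned = BWA_MEM_ALIGN(reads, ref)    // emits [sample, sample.bam]
sorted  = SAMTOOLS_SORT(aligned)       // emits [sample, sample.sorted.bam, sample.sorted.bam.bai]
bcfs    = BCFTOOLS_CALL(sorted, ref)   // emits sample.bcf, one file per sample
```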
- Complete process `BCFTOOLS_MERGE` in `modules/bcftools_merge.nf`
- Uncomment the corresponding lines in main.nf referencing `BCFTOOLS_MERGE`
- Run main.nf:
```bash
nextflow run ~/wehi-nextflow-training/module_4/main.nf -resume
```
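Solution
```nextflow
process BCFTOOLS_MERGE {
    cpus 2
    memory '2 GB'
    time '1 h'
    container "quay.io/biocontainers/bcftools:1.16--hfe4b78e_1"

    input:
    path(bcfs)    // all per-sample BCF files, staged together in one task

    output:
    path merged_vcf

    script:
    merged_vcf = 'merged.vcf.gz'
    // --no-index: merge without indexes; --missing-to-ref: treat missing genotypes as 0/0
    """
    bcftools merge --threads $task.cpus --no-index --missing-to-ref -Oz $bcfs > $merged_vcf
    """
}
```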
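`BCFTOOLS_CALL` emits one BCF per sample, but `BCFTOOLS_MERGE` should run once over all of them. The `collect` operator gathers every item of a channel into a single list, so the call in main.nf is probably something like `BCFTOOLS_MERGE(BCFTOOLS_CALL.out.collect())`; check the commented lines for the exact form. Without `collect`, one merge task would run per BCF. The self-contained sketch below uses placeholder strings instead of real files.

```nextflow
// collect_demo.nf (sketch): gather per-sample items into one emission,
// the shape that BCFTOOLS_MERGE's path(bcfs) input expects
workflow {
    Channel.of('S1.bcf', 'S2.bcf', 'S3.bcf')
        .collect()
        .view()   // prints a single list: [S1.bcf, S2.bcf, S3.bcf]
}
```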
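- Scripts placed in the pipeline's bin directory can be called by name from a process script, because Nextflow adds bin to the PATH of each task (the scripts must be executable)
- Take a look at bin/plot_variants.R, which is used by the process `PLOT_VARIANTS` in `modules/plot_variants.nf`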
- Complete process `PLOT_VARIANTS` in `modules/plot_variants.nf`
- Uncomment the corresponding lines in main.nf referencing `PLOT_VARIANTS`
- Uncomment the `workflow.onComplete {...}` section in main.nf
- Run main.nf and look at the output plot:
```bash
nextflow run ~/wehi-nextflow-training/module_4/main.nf -resume
```
Solution
```nextflow
process PLOT_VARIANTS {
    cpus 1
    memory '2 GB'
    time '1 h'
    container 'library://jemunro/training/tidyverse-pheatmap'
    publishDir "results", mode: 'copy'    // copy plot.png into results/

    input:
    path(vcf)
    path(metadata)

    output:
    path(plot)

    script:
    plot = 'plot.png'
    // plot_variants.R is found on the PATH because it lives in the pipeline's bin/ directory
    """
    plot_variants.R $vcf $metadata $plot
    """
}
```
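A `workflow.onComplete` handler runs once, after the whole pipeline has finished, which makes it a convenient place to report where the results went. The section in main.nf may print different messages; this sketch only uses the standard `workflow.success`, `workflow.duration`, `workflow.launchDir` and `workflow.errorMessage` properties. A relative `publishDir` such as `"results"` is resolved against the launch directory.

```nextflow
// main.nf (sketch): report the outcome once the pipeline has finished
workflow.onComplete {
    if (workflow.success) {
        log.info "Done in ${workflow.duration}. Plot: ${workflow.launchDir}/results/plot.png"
    } else {
        log.info "Pipeline failed: ${workflow.errorMessage}"
    }
}
```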