Module 4: NGS Variant Calling Pipeline

Learning Objectives

Understand workflow modularisation
Complete an NGS variant calling pipeline

Modularisation

Nextflow DSL2 pipelines can be split into modules, which can then be included in other scripts
This makes source code easier to maintain and to share between pipelines
Take a look at main.nf: the processes DOWNLOAD_REF and DOWNLOAD_FASTQS are included from the source file modules/download.nf
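
As a rough illustration of what such an include looks like, here is a minimal sketch using the standard DSL2 include syntax. The exact statements in main.nf may differ, and the alias DOWNLOAD_REF_ALT is a hypothetical name used only to show the 'as' form.

```nextflow
// Sketch of a module include in main.nf.
// The path is resolved relative to the including script; the '.nf' extension may be omitted.
include { DOWNLOAD_REF; DOWNLOAD_FASTQS } from './modules/download'

// A process can also be included under another name with 'as',
// for example to use the same process twice in one workflow.
// DOWNLOAD_REF_ALT is a hypothetical alias, not part of this pipeline.
include { DOWNLOAD_REF as DOWNLOAD_REF_ALT } from './modules/download'
```

Once included, a process is called from a workflow block like a function, exactly as if it had been defined in main.nf itself.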

Configuration

Look at nextflow.config
This pipeline has been configured to run its processes as Slurm jobs
The Singularity cacheDir is set so that pre-downloaded container images are used; without them, Nextflow would download the images automatically
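
The settings described above might look something like the following sketch. The option names (process.executor, singularity.enabled, singularity.cacheDir) are standard Nextflow configuration settings, but the values are assumptions: the cache path in particular is a placeholder, and the real nextflow.config may set further options.

```nextflow
// nextflow.config (sketch; values are illustrative assumptions)
process {
    // Submit each task as a Slurm job instead of running it locally
    executor = 'slurm'
}

singularity {
    enabled  = true
    // Hypothetical path: the directory holding the pre-downloaded images
    cacheDir = '/path/to/singularity/cache'
}
```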

Exercise 4

The pipeline in main.nf is incomplete.
We will work through the "TODO" sections of each process in turn until the pipeline is complete
When complete, the pipeline will:
- Download paired-end Illumina sequencing data from the SRA for 20 Victorian SARS-CoV-2 isolates collected across 2020
- Align the reads to the SARS-CoV-2 reference genome
- Call variants using BCFtools
- Create a plot of variants, visualising the changes in viral genotype over time
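
The steps above could be wired together roughly as in the sketch below. It is not the workflow from main.nf: the workflow name CALL_VARIANTS and the three input channel names are hypothetical, and the shapes of the download outputs are assumed from the process inputs shown in the solutions further down.

```nextflow
// Sketch only: a named sub-workflow connecting the processes completed in the exercises.
workflow CALL_VARIANTS {
    take:
    ref_fasta_gz   // assumed: gzipped reference FASTA emitted by DOWNLOAD_REF
    fastqs         // assumed: tuples of (sample, fastq1, fastq2) from DOWNLOAD_FASTQS
    metadata       // assumed: sample metadata file read by the plotting script

    main:
    INDEX_REF(ref_fasta_gz)
    // first() yields a value channel, so the one reference can be reused for every sample
    // (Nextflow may warn that this is redundant if the output is already a value channel)
    ref = INDEX_REF.out.first()

    BWA_MEM_ALIGN(fastqs, ref)
    SAMTOOLS_SORT(BWA_MEM_ALIGN.out)
    BCFTOOLS_CALL(SAMTOOLS_SORT.out, ref)

    // collect() gathers the per-sample BCFs into one list, so the merge runs once
    BCFTOOLS_MERGE(BCFTOOLS_CALL.out.collect())
    PLOT_VARIANTS(BCFTOOLS_MERGE.out, metadata)
}
```

The two points worth noting are the value channel for the reference, without which only the first sample would be aligned, and collect() before the merge, without which the merge would run once per sample.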

Exercise 4.1

Run main.nf to download the sequencing data

nextflow run ~/wehi-nextflow-training/module_4/main.nf

Exercise 4.2

Complete process INDEX_REF in modules/index_ref.nf
Uncomment the corresponding lines in main.nf referencing INDEX_REF

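Run main.nf with -resume, so that tasks completed in earlier runs (such as the downloads) are reused from the cache instead of being rerun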

nextflow run ~/wehi-nextflow-training/module_4/main.nf -resume

Solution

process INDEX_REF {
    cpus 1
    memory '1 GB'
    time '1 h'
    module 'bwa'
    module 'samtools'

    input:
    path(ref_fasta_gz)

    output:
    tuple path('ref.fasta'), path("ref.fasta.*")

    script:
    """
    gzip -cd $ref_fasta_gz > ref.fasta
    samtools faidx ref.fasta
    bwa index ref.fasta
    """
}

Exercise 4.3

Complete process BWA_MEM_ALIGN in modules/bwa_mem_align.nf
Uncomment the corresponding lines in main.nf referencing BWA_MEM_ALIGN

Run main.nf

nextflow run ~/wehi-nextflow-training/module_4/main.nf -resume

Solution

process BWA_MEM_ALIGN {
    cpus 2
    memory '2 GB'
    time '2 h'
    module 'bwa'
    module 'samtools'
    tag "$sample"

    input:
    tuple val(sample), path(fastq1), path(fastq2)
    tuple path(ref_fasta), path(ref_indices)

    output:
    tuple val(sample), path(bam)

    script:
    bam = sample + '.bam'
    // \\t reaches the shell as a literal \t, which bwa expands to a tab in the read group header
    """
    bwa mem -M -t $task.cpus -R '@RG\\tID:$sample\\tSM:$sample' $ref_fasta $fastq1 $fastq2 |
        samtools view -b > $bam
    """
}

Exercise 4.4

Complete process SAMTOOLS_SORT in modules/samtools_sort.nf
Uncomment the corresponding lines in main.nf referencing SAMTOOLS_SORT

Run main.nf

nextflow run ~/wehi-nextflow-training/module_4/main.nf -resume

Solution

process SAMTOOLS_SORT {
    cpus 2
    memory '2 GB'
    time '1 h'
    module 'samtools'
    tag "$sample"

    input:
    tuple val(sample), path(input_bam)

    output:
    tuple val(sample), path(sorted_bam), path(bam_index)

    script:
    sorted_bam = sample + '.sorted.bam'
    bam_index = sorted_bam + '.bai'
    """
    samtools sort --threads $task.cpus $input_bam > $sorted_bam
    samtools index $sorted_bam
    """
}

Exercise 4.5

Complete process BCFTOOLS_CALL in modules/bcftools_call.nf
Uncomment the corresponding lines in main.nf referencing BCFTOOLS_CALL

Run main.nf

nextflow run ~/wehi-nextflow-training/module_4/main.nf -resume

Solution

process BCFTOOLS_CALL {
    cpus 2
    memory '2 GB'
    time '1 h'
    container "quay.io/biocontainers/bcftools:1.16--hfe4b78e_1"
    tag "$sample"

    input:
    tuple val(sample), path(sorted_bam), path(bam_index)
    tuple path(ref_fasta), path(ref_indices)

    output:
    path bcf

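    script:
    bcf = sample + '.bcf'
    // mpileup streams uncompressed BCF (-Ou) into call, which uses the multiallelic
    // caller (-m), keeps variant sites only (-v) and writes compressed BCF (-Ob)
    """
    bcftools mpileup -Ou -f $ref_fasta $sorted_bam | bcftools call -mv -Ob -o $bcf
    """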
}

Exercise 4.6

Complete process BCFTOOLS_MERGE in modules/bcftools_merge.nf
Uncomment the corresponding lines in main.nf referencing BCFTOOLS_MERGE

Run main.nf

nextflow run ~/wehi-nextflow-training/module_4/main.nf -resume

Solution

process BCFTOOLS_MERGE {
    cpus 2
    memory '2 GB'
    time '1 h'
    container "quay.io/biocontainers/bcftools:1.16--hfe4b78e_1"

    input:
    path(bcfs)

    output:
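    path merged_vcf

    script:
    merged_vcf = 'merged.vcf.gz'
    // --no-index allows merging unindexed BCFs; --missing-to-ref treats genotypes
    // absent from a sample as reference (0/0); -Oz writes bgzip-compressed VCF
    """
    bcftools merge --threads $task.cpus --no-index --missing-to-ref -Oz $bcfs > $merged_vcf
    """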
}

Project scripts

Nextflow adds the project's bin directory to the PATH of each task, so executable scripts placed there can be called by name from a process
Take a look at bin/plot_variants.R, which is used by the process PLOT_VARIANTS in modules/plot_variants.nf

Exercise 4.7

Complete process PLOT_VARIANTS in modules/plot_variants.nf
Uncomment the corresponding lines in main.nf referencing PLOT_VARIANTS
Uncomment the workflow.onComplete {...} section in main.nf
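
For orientation, a completion handler generally looks something like the sketch below. It uses the standard workflow.onComplete hook together with the workflow.success and workflow.duration properties; the message text is an assumption, and the handler in main.nf may do something different.

```nextflow
workflow.onComplete {
    // Runs once after the last task has finished, whether or not the run succeeded
    println "Pipeline ${ workflow.success ? 'completed successfully' : 'failed' } in ${workflow.duration}"
}
```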

Run main.nf and look at the output plot

nextflow run ~/wehi-nextflow-training/module_4/main.nf -resume

Solution

process PLOT_VARIANTS {
    cpus 1
    memory '2 GB'
    time '1 h'
    container 'library://jemunro/training/tidyverse-pheatmap'
    publishDir "results", mode: 'copy'

    input:
    path(vcf)
    path(metadata)

    output:
    path(plot)

    script:
    plot = 'plot.png'
    """
    plot_variants.R $vcf $metadata $plot
    """
}
