Germline DNA‐Seq Pipeline

sprokopec edited this page Nov 30, 2023 · 1 revision

Overview

How to run on H4H:

module load perl

perl /path/to/pughlab_dnaseq_germline_pipeline.pl \
-t /path/to/germline_pipeline_config.yaml \
-d /path/to/germline_data_config.yaml \
--preprocessing \
--qc \
--variant_calling \
--summarize \
--create_report \
-c slurm \
--remove \
--dry-run    # optional
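
When the pipeline is launched repeatedly, it can help to assemble the command in a small wrapper so that the optional --dry-run flag is easy to toggle. The sketch below is one possible way to do this and is not part of the pipeline itself: the function name build_pipeline_cmd and its yes/no argument are hypothetical, the paths are the same placeholders used above, and the command is only printed, never executed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of the pipeline): prints the run command
# on a single line. Pass "yes" to append the optional --dry-run flag.
build_pipeline_cmd() {
    local dry_run="${1:-no}"
    # Placeholder paths, copied from the example invocation above.
    local cmd=(
        perl /path/to/pughlab_dnaseq_germline_pipeline.pl
        -t /path/to/germline_pipeline_config.yaml
        -d /path/to/germline_data_config.yaml
        --preprocessing --qc --variant_calling
        --summarize --create_report
        -c slurm
        --remove
    )
    if [ "$dry_run" = "yes" ]; then
        cmd+=(--dry-run)
    fi
    printf '%s ' "${cmd[@]}"
    printf '\n'
}

# Print the command with the dry-run flag for review.
build_pipeline_cmd yes
```

Keeping the arguments in an array avoids quoting problems if a real path contains spaces; the same array could be executed directly once the placeholders are replaced.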

Directory Structure

PROJECT
├── logs
├── TOOL1
│   ├── PATIENT-01
│   │   ├── SAMPLE-01-A
│   │   └── SAMPLE-01-B
│   └── PATIENT-02
│       └── SAMPLE-02-A
├── TOOL2
│   ├── PATIENT-01
│   │   ├── SAMPLE-01-A
│   │   └── SAMPLE-01-B
│   └── PATIENT-02
│       └── SAMPLE-02-A
└── TOOL3
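
The layout above can be inspected with standard shell tools. The following is a minimal sketch, assuming the TOOL/PATIENT/SAMPLE nesting shown in the tree; the function name list_samples and the scratch project created with mktemp are purely illustrative and are not produced by the pipeline.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Illustrative helper: print every PROJECT/TOOL/PATIENT/SAMPLE directory
# as "TOOL PATIENT SAMPLE", one per line, sorted.
list_samples() {
    local project="$1"
    # Three levels below the project root is the sample level;
    # anything under logs/ is skipped.
    find "$project" -mindepth 3 -maxdepth 3 -type d ! -path "$project/logs/*" \
        | sed "s|^$project/||" \
        | tr '/' ' ' \
        | sort
}

# Build a scratch project that mirrors the example tree (placeholder names).
project="$(mktemp -d)"
mkdir -p "$project"/logs \
         "$project"/TOOL1/PATIENT-01/SAMPLE-01-A \
         "$project"/TOOL1/PATIENT-01/SAMPLE-01-B \
         "$project"/TOOL1/PATIENT-02/SAMPLE-02-A \
         "$project"/TOOL2/PATIENT-01/SAMPLE-01-A \
         "$project"/TOOL2/PATIENT-01/SAMPLE-01-B \
         "$project"/TOOL2/PATIENT-02/SAMPLE-02-A \
         "$project"/TOOL3

list_samples "$project"
```

On a real project directory, a listing like this gives a quick view of which samples have an output directory under each tool.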

Pipeline Stages

  • Preprocessing: The preprocessing step will run FastQC, BWA-MEM alignment, and GATK's indel realignment and BQSR functions. This step expects the data config to list and describe FASTQ files.

  • QC: The QC step will run various Picard functions (sequencing artefacts, insert size, alignment summary, etc.) and GATK's DepthOfCoverage, and will obtain an estimate of callable bases. This step expects the data config to list GATK-processed BAM files.

  • Variant-Calling: The variant-calling step will run all of the variant-calling tools (SNV/INDEL/CNV/SV) requested in the pipeline tool config YAML file. This step expects the data config to list GATK-processed BAM files.

  • Germline SNV/INDEL detection:

    • HaplotypeCaller (per-BAM)
    • GenotypeGVCFs + VQSR (per-cohort)
    • CPSR (per patient)
  • Germline CNV detection:

    • GATK's gCNV pipeline (cohort-level with outputs for each sample)
    • ERDS gCNV pipeline (uses output from HaplotypeCaller; use with caution, as it has not been thoroughly validated)
    • Delly germline SVs (extracts copy-number from the DEL/DUP calls)
  • Germline SV detection:

    • Delly germline SVs (includes DEL/DUP/INV/TRA/INS)
    • Manta germline SVs
    • SViCT (targeted panel only)
    • MAVIS (combines and validates calls from the above tools)
  • Other SNV/INDEL detection tools will be run only if requested, in order to produce a tool-specific panel of normals:

    • MuTect (v1; will be run in artefact detection mode)
    • MuTect (v2; will be run in artefact detection mode)
    • Strelka (will be run in germline mode)
    • VarScan (will be run in tumour-only mode)
    • VarDict (will be run in tumour-only mode)