patidarr/ngs_pipeline

Requires Snakemake >= 3.8.0

Introduction

This is a Snakemake implementation of the KhanLab NGS pipeline.

Installation

The easiest way to get this pipeline is to clone the repository.

git clone https://github.com/patidarr/ngs_pipeline.git

This pipeline is available on the NIH Biowulf cluster; contact me if you would like to do a test run. Data from this pipeline can be ported directly into OncoGenomics-DB, an application for visualizing NGS data that is available to NIH users.

Requirements

mutt
GNU Parallel
SLURM or PBS for resource management
Bioinformatics tools listed in the config files

Following R Packages

Conventions

  • Sample names cannot have "/" or "." in them
  • Fastq files end in ".fastq.gz"
  • Fastq files are stored in DATA_DIR (Set as Environment Variable)
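The conventions above can be checked before a run. The following is a minimal sketch, not part of the pipeline itself: the helper names `is_valid_sample_name` and `find_fastqs` are hypothetical, and the recursive search under DATA_DIR is an assumption, since the directory layout is not described here.

```python
import os
from pathlib import Path


def is_valid_sample_name(name: str) -> bool:
    """Return True if the name is non-empty and contains neither "/" nor "."."""
    return bool(name) and "/" not in name and "." not in name


def find_fastqs(data_dir: str) -> list:
    """Return sorted paths of all files under data_dir ending in ".fastq.gz".

    Assumption: files may sit in subdirectories, so the search is recursive.
    """
    return sorted(str(p) for p in Path(data_dir).rglob("*.fastq.gz"))


def fastqs_from_env() -> list:
    """List Fastq files under the directory named by the DATA_DIR variable."""
    data_dir = os.environ.get("DATA_DIR")
    if data_dir is None:
        raise RuntimeError("DATA_DIR environment variable is not set")
    return find_fastqs(data_dir)
```

With DATA_DIR exported in the shell, `fastqs_from_env()` lists the files that follow the naming convention; anything ending in, say, ".fq.gz" would be skipped.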

DNASeq:

  • QC
  • BWA, Novoalign
  • Broad Standard Practices on bwa bam
  • Haplotype Caller, Platypus, Bam2MPG, MuTect, Strelka
  • snpEff, Annovar, SIFT, pph2, Custom Annotation
  • Coverage Plot, Circos Plot, Hotspot Coverage Box Plot
  • Create input format for oncogenomics database (Patient Level)
  • Make Actionable Classification for Germline and Somatic Mutations
  • Copy number based on a simple tumor/normal (T/N) log ratio (normal coverage >= 30), corrected for total number of reads
  • Copy number, tumor purity using sequenza
  • LRR adjusted to center
  • Contamination using conpair
  • HLA Typing
  • Neoantigen Prediction
    • pVAC-Seq methods: NNalign, NetMHC, NetMHCIIpan, NetMHCcons, NetMHCpan, PickPocket, SMM, SMMPMBEC, SMMalign
    • Epitope lengths: 8, 9, 10, 11
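The simple T/N log ratio and the centering step listed above can be illustrated as follows. This is a sketch under stated assumptions rather than the pipeline's own code: it assumes per-region coverage values, a log2 scale, normalization by dividing each coverage by its library's total read count, and centering by subtracting the median. The function names are hypothetical.

```python
import math
from statistics import median


def log_ratios(tumor_cov, normal_cov, tumor_total, normal_total, min_normal_cov=30):
    """Per-region log2(T/N) after scaling each coverage by its total read count.

    Regions with normal coverage below min_normal_cov (or zero tumor coverage)
    are reported as None rather than given a ratio.
    """
    out = []
    for t, n in zip(tumor_cov, normal_cov):
        if n < min_normal_cov or t <= 0:
            out.append(None)
            continue
        out.append(math.log2((t / tumor_total) / (n / normal_total)))
    return out


def center(lrr):
    """Shift log ratios so that their median is zero (assumed centering rule)."""
    values = [v for v in lrr if v is not None]
    if not values:
        return list(lrr)
    m = median(values)
    return [None if v is None else v - m for v in lrr]
```

For example, `center(log_ratios([60, 120, 10], [30, 30, 10], 2_000_000, 1_000_000))` drops the third region because its normal coverage is below 30 and centers the remaining two values around zero.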

RNASeq:

  • QC
  • Tophat, STAR
  • Broad Standard Practices on STAR bam
  • fusion-catcher, tophat-fusion, deFuse
  • Cufflinks (ENS and UCSC)
  • Rsubread TPM (ENS, UCSC), Gene, Transcript and Exon Level
  • In-house Exon Expression (ENS and UCSC)
  • Haplotype Caller
  • snpEff, Annovar, SIFT, pph2, Custom Annotation
  • Actionable Fusion classification
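For readers unfamiliar with TPM, the standard definition can be sketched in a few lines. This illustrates the formula only; it is not the pipeline's Rsubread-based code, and the function name `tpm` and its inputs (raw counts plus feature lengths in base pairs) are assumptions.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million from raw counts and feature lengths in bp.

    Each count is divided by its feature length in kilobases, and the
    resulting rates are rescaled so that they sum to one million.
    """
    rates = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    return [r / total * 1e6 for r in rates]
```

Because the values always sum to one million within a sample, TPM at the gene, transcript, or exon level can be compared across libraries of different sequencing depth.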

Patient:

  • Genotyping on patient: 1000 Genomes (1000g) sites are evaluated for every library and then compared all vs. all. If two libraries come from the same patient, the match should be high (>80%).
  • Still to develop: if the match is below a certain threshold, stop the pipeline for that patient.
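The all-vs-all comparison described above can be sketched as below. The data layout is an assumption made for illustration: each library is a dictionary mapping a site identifier to a genotype string, and the match is the fraction of shared sites with identical genotypes. The function names and the 0.8 default (taken from the ">80%" guideline) are hypothetical, not the pipeline's actual interface.

```python
from itertools import combinations


def match_fraction(gt_a, gt_b):
    """Fraction of sites genotyped in both libraries that have the same call.

    Returns None when the two libraries share no sites.
    """
    shared = [site for site in gt_a if site in gt_b]
    if not shared:
        return None
    same = sum(1 for site in shared if gt_a[site] == gt_b[site])
    return same / len(shared)


def all_vs_all(libraries):
    """Compare every pair of libraries; keys are (name_a, name_b) tuples."""
    return {
        (a, b): match_fraction(libraries[a], libraries[b])
        for a, b in combinations(sorted(libraries), 2)
    }


def low_matches(libraries, threshold=0.8):
    """Pairs whose match fraction is at or below the threshold."""
    return [
        pair for pair, frac in all_vs_all(libraries).items()
        if frac is not None and frac <= threshold
    ]
```

A pair returned by `low_matches` for libraries that are supposed to come from one patient would be the kind of case the planned threshold check is meant to catch.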

Rulegraph

[Figure: pipeline rulegraph]

DAG for example sample

[Figure: DAG for an example sample]