NGSeasy: A Dockerized NGS pipeline and tool-box
With NGSeasy you can now have a full suite of NGS tools up and running on any high-end workstation in an afternoon.
Authors: Stephen J Newhouse and Amos Folarin
Release Version: 1.0-r001
Release: dirty_tango
Publication: Folarin AA, Dobson RJ and Newhouse SJ. NGSeasy: a next generation sequencing pipeline in Docker containers [version 1; referees: 3 approved with reservations] F1000Research 2015, 4(ISCB Comm J):997 (doi: 10.12688/f1000research.7104.1).
- NGSeasy-1.0 Full Production release will be available Late 2015
- NGSeasy-1.0-r001 (dirty_tango) contains most of the core functionality to go from raw fastq to raw vcf calls
- NGSeasy will update every 12 months
- GUI in development
- Let us know if you want other tools added to NGSeasy
- Fork it!
- Create your feature branch:
git checkout -b my-new-feature
- Commit your changes:
git commit -am 'Add some feature'
- Push to the branch:
git push origin my-new-feature
- Submit a pull request!
NGSeasy: Genome Comparison & Analytic Testing (GCAT) Reports
Here we provide a quick look at basic NGSeasy performance (more results coming soon).
GCAT Report | Test Data | Pipeline |
---|---|---|
NGSEASY-NTRIM-BWA-FREEBAYES-D | illumina-100bp-pe-exome-150x | fastq > bwa > freebayes |
NGSEASY-NTRIM-BWA-PLATYPUS-D | illumina-100bp-pe-exome-150x | fastq > bwa > platypus |
An example of the run commands:
ngseasy -c ngseasy_test.config.freebayes.tsv -d /media/Data/ngs_projects
ngseasy -c ngseasy_test.config.platypus.tsv -d /media/Data/ngs_projects
Please contact us for help/guidance on using the beta release.
Author | Email | Twitter | LinkedIn |
---|---|---|---|
Dr Stephen J Newhouse | stephen.j.newhouse@gmail.com | @s_j_newhouse | View Steve's profile on LinkedIn |
Dr Amos Folarin | amosfolarin@gmail.com | @amosfolarin | View Amos's profile on LinkedIn |
Please direct all queries to [https://github.com/KHP-Informatics/ngseasy/issues]
When sending bug reports, please provide:
- Date of Download
- OS and version
- Basic Machine Specs (CPU, RAM)
- Network Speed (Testing Internet Connection Speed)
- The Code you ran eg:-
ngseasy -c my.config.tsv -d /My/Dir
- your config file
- The exact error as printed to screen
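The machine details in the list above can be gathered with a small helper. This is a minimal sketch, not part of NGSeasy: the function name `ngseasy_report_info` is hypothetical, the RAM lookup assumes a Linux host with /proc/meminfo, and the date printed is the date of the report (the download date still has to be added by hand).

```shell
# Hypothetical helper: print basic facts to paste into a bug report.
ngseasy_report_info() {
    echo "Report date: $(date '+%Y-%m-%d')"
    echo "OS: $(uname -sr)"
    # nproc is GNU coreutils; fall back to getconf, then to "unknown".
    cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo unknown)
    echo "CPU cores: ${cores}"
    # /proc/meminfo is Linux-only; report "unknown" elsewhere.
    if [ -r /proc/meminfo ]; then
        ram_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
        echo "RAM (GB): $(( ${ram_kb:-0} / 1024 / 1024 ))"
    else
        echo "RAM (GB): unknown"
    fi
}
```

Running `ngseasy_report_info > report.txt` gives four lines that can be attached alongside the config file and the exact error message.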
WARNING! NGSeasy is not numpty or bad data proof!
Please read the docs, stay calm, take your time and think about what you are doing...and if [www.google.com] doesn't help, then please direct all queries to [https://github.com/KHP-Informatics/ngseasy/issues].
Full instructions at https://docs.docker.com/.
Some fixes to make life easy: the following allows you to run docker without sudo. This may differ for your OS, and mostly applies to flavours of Linux. Check with your sys admin or just Google https://www.google.com.
Mac/Windows users using http://boot2docker.io/ should be fine. Read the docs or just Google https://www.google.com.
Create a docker group
sudo addgroup docker
Add user to docker group
Here user is ec2-user
sudo usermod -aG docker ec2-user
Log out and log back in.
This ensures your user is running with the correct permissions.
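Whether the new login session has actually picked up the group can be checked before trying docker itself. A minimal sketch, assuming a POSIX `id` command; the function name `session_in_group` is made up for this example. Calling `id -Gn` without a user name reports the groups of the current session, which is why it only shows docker after logging back in.

```shell
# Succeed if the current login session belongs to the group named in $1.
session_in_group() {
    # One group per line, then an exact fixed-string match on the whole line.
    id -Gn | tr ' ' '\n' | grep -qxF -- "$1"
}

if session_in_group docker; then
    echo "docker group: active in this session"
else
    echo "docker group: not active (log out and back in after usermod)"
fi
```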
Verify your work by running docker without sudo.
docker run hello-world
..this is what you should get...
Unable to find image 'hello-world:latest' locally
Pulling repository hello-world
91c95931e552: Download complete
a8219747be10: Download complete
Status: Downloaded newer image for hello-world:latest
Hello from Docker.
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(Assuming it was not already locally available.)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.
To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash
For more examples and ideas, visit:
http://docs.docker.com/userguide/
This post reviews the various security implications of using Docker to run applications within containers, and how to address them: How Secure are Containers?
Docker containers are, by default, quite secure; especially if you take care of running your processes inside the containers as non-privileged users (i.e. non root).
#############################################
## Get NGSeasy ##
#############################################
cd /home/${USER}
git clone https://github.com/KHP-Informatics/ngseasy.git
- Default install directory is /home/${USER}
- In this example the user's home is /home/ec2-user
make INSTALLDIR="/home/ec2-user" all
- sets up top level directory structure
- gets all docker images
- gets indexed hg19 and b37 genomes
- gets GATK resources for hg19 and b37 genomes
- gets whole genome and exome test data
- Always set your INSTALLDIR: if you run sudo make all, the install path will be /home/root. Please don't do this!
- sudo make install installs scripts to /usr/local/bin/
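The warning about running the main install as root can be turned into a pre-flight check. This is a sketch only: `check_installdir` is a hypothetical function, not something shipped with NGSeasy, and requiring the directory to exist beforehand is an assumption made here for safety.

```shell
# Hypothetical pre-flight check to run before `make INSTALLDIR=... all`.
# $1 = install directory, $2 = numeric user id (defaults to the current user).
check_installdir() {
    dir=$1
    uid=${2:-$(id -u)}
    if [ "$uid" -eq 0 ]; then
        echo "refusing: running as root would install under root's home" >&2
        return 1
    fi
    if [ -z "$dir" ] || [ ! -d "$dir" ]; then
        echo "refusing: INSTALLDIR '$dir' is not an existing directory" >&2
        return 1
    fi
    echo "INSTALLDIR ok: $dir"
}
```

For example, `check_installdir /home/ec2-user && make INSTALLDIR="/home/ec2-user" all` only starts the install when the check passes.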
#############################################
## install NGSeasy ##
#############################################
cd ngseasy
## 1.
make INSTALLDIR="/home/ec2-user" all
## 2.
sudo make install
Installation can take a while (1-2 hours), so go get a coffee or just chill. If your network is bad, then who knows how long it will take...still, just chill, or go get fast internet!
All NGSeasy applications are run as the non-root user pipeman within each container.
Recommended network speed: > 500 Mbit/s. Anything less will add a lot of time to set up (days to weeks).
Testing Internet Connection Speed
source : http://askubuntu.com/questions/104755/how-to-check-internet-speed-via-terminal
wget -O speedtest-cli https://raw.github.com/sivel/speedtest-cli/master/speedtest_cli.py
chmod +x speedtest-cli
./speedtest-cli
Retrieving speedtest.net configuration...
Retrieving speedtest.net server list...
Testing from Comcast Cable (x.x.x.x)...
Selecting best server based on ping...
Hosted by FiberCloud, Inc (Seattle, WA) [12.03 km]: 44.028 ms
Testing download speed........................................
Download: 32.29 Mbit/s
Testing upload speed..................................................
Upload: 5.18 Mbit/s
Connection Speed: ~ 800 Mbit/s
real 94m54.237s
user 12m26.960s
sys 28m46.648s
Note: We have only tested NGSeasy installation on Amazon EC2, Openstack and UK University Networks. These are all fairly fast networks with speeds exceeding 800 Mbit/s on average.
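The speed test output can be compared against the 500 Mbit/s guideline automatically. A minimal sketch, assuming the output format shown above (a line starting with `Download:` followed by the rate); the function names `download_mbits` and `fast_enough` are invented for this example.

```shell
# Extract the download rate (Mbit/s) from speedtest-cli output read on stdin.
download_mbits() {
    awk '/^Download:/ {print $2; exit}'
}

# Succeed if rate $1 is at least threshold $2 (default 500 Mbit/s).
fast_enough() {
    awk -v rate="$1" -v min="${2:-500}" 'BEGIN {exit !(rate + 0 >= min + 0)}'
}
```

A possible use is `rate=$(./speedtest-cli | download_mbits); fast_enough "$rate" || echo "slow network: ${rate} Mbit/s"`. With the sample output above, the rate is 32.29 Mbit/s and the check fails.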
Important! NGSeasy is controlled from a single config file. See ngseasy_test.config.tsv for a basic template. It is important that the user sets this up properly before running NGSeasy.
#############################################
## 0. Move to config file dir
cd /home/ec2-user/ngs_projects/config_files/
#############################################
## 1. Run basic test
ngseasy -c ngseasy_test.config.tsv -d /home/ec2-user/ngs_projects
This runs the following basic pipeline on Whole Exome PE 30x Illumina data, aligning to b37 (in theory...give it a try).
- FastQC > Trimmomatic > BWA > Platypus
- Edit NCPU in [ngseasy_test.config.tsv] to suit your system
- Edit PROJECT_DIR in [ngseasy_test.config.tsv] to suit your install path
- We expect the user to place all raw fastq files in raw_fastq. NGSeasy uses this as a staging area for new project and sample data.
- Right now, always run ngseasy from the location/directory that contains the config file.
- Each component of ngseasy can be run as a standalone script.
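Since ngseasy currently has to be started from the directory that holds the config file, a small wrapper can take care of the directory change. This is a sketch under assumptions: `run_ngseasy_in_config_dir` is a hypothetical name, and the `NGSEASY_BIN` variable is introduced here only so the wrapper can be dry-run with a stand-in command.

```shell
# Run ngseasy from the directory that contains the config file.
# $1 = path to the config file, $2 = project directory.
# NGSEASY_BIN (hypothetical) overrides the command, e.g. for a dry run.
run_ngseasy_in_config_dir() {
    config=$1
    project_dir=$2
    [ -f "$config" ] || { echo "config not found: $config" >&2; return 1; }
    (
        # The subshell keeps the caller's working directory unchanged.
        cd "$(dirname "$config")" || exit 1
        "${NGSEASY_BIN:-ngseasy}" -c "$(basename "$config")" -d "$project_dir"
    )
}
```

For the test run above this would be `run_ngseasy_in_config_dir /home/ec2-user/ngs_projects/config_files/ngseasy_test.config.tsv /home/ec2-user/ngs_projects`.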
We present NGSeasy (Easy Analysis of Next Generation Sequencing), a flexible and easy-to-use NGS pipeline for automated alignment, quality control, variant calling and annotation. The pipeline allows users with minimal computational/bioinformatic skills to set up and run an NGS analysis on their own samples, in less than an afternoon, on any operating system (Windows, Mac OS X or Linux) or infrastructure (workstation, cluster or cloud).
NGS pipelines typically utilize a large and varied range of software components and incur a substantial configuration burden during deployment which limits their portability to different computational environments. NGSeasy simplifies this by providing the pipeline components encapsulated in Docker™ containers and bundles in a wide choice of tools for each module. Each module of the pipeline represents one functional grouping of tools (e.g. sequence alignment, variant calling etc.).
Deploying the pipeline is as simple as pulling the container images from the public repository into any host running Docker. NGSeasy can be deployed on any medium to high-end workstation, high performance computer cluster or compute cloud (public/private cloud computing). This enables instant access to elastic scalability without investment overheads for additional compute hardware, and makes open and reproducible research straightforward for the greater scientific community.
- Easy to use for non-informaticians.
- All run from a single config file that can be made in Excel.
- User can select from multiple aligners, variant callers and variant annotators
- No scary python, .yaml or .json files...just one simple Excel workbook saved as a textfile.
- Just follow our simple set of instructions and NGS away!
- Choice of aligners, variant callers and annotators
- Allows reproducible research
- Version controlled for auditing
- Customisable
- Easy to add new tools
- If it's broke...we will fix it..
- Enforced naming convention and directory structures
- Allows users to run "Bake Offs" between tools with ease
We have adapted the current best practices from the Genome Analysis Toolkit (GATK, http://www.broadinstitute.org/gatk/guide/best-practices) for processing raw alignments in SAM/BAM format and variant calling. The current workflow has been optimised for Illumina platforms, but can be adapted for other sequencing platforms with minimal effort.
As the containers themselves can be run as executables with pre-specified cpu and RAM resources, the orchestration of the pipeline can be placed under the control of conventional load balancers if this mode is required.
Genome Build |
---|
hs37d5 |
b37 |
hg19 |
hs38DH |
The basic pipeline contains all the basic tools needed for manipulation and quality control of raw fastq files (ILLUMINA focused), SAM/BAM manipulation, alignment, cleaning (based on GATK best practices [http://www.broadinstitute.org/gatk/guide/best-practices]) and first pass variant discovery. Separate containers are provided for in-depth variant annotation, structural variant calling, basic reporting and visualisations.
We include the following - what we think of as - NGS Powertools in the compbio/ngseasy-base image. These are all tools that allow the user to slice and dice BED/SAM/BAM/VCF files in multiple ways.
- samtools
- bcftools
- vcftools
- vcflib
- bamUtil
- bedtools2
- ogap
- samblaster
- sambamba
- bamleftalign
- seqtk
- parallel
This image is used as the base of all our compbio/ngseasy-* tools.
Why not separate containers per application? The more docker-esque approach would be to have separate containers for each NGS tool. However, this belies the fact that many of these tools interact in a deep way. Therefore, we built these into a single development environment for ngseasy, to allow pipes and streamlined system calls for manipulating the output of NGS pipelines (BED/SAM/BAM/VCF files).
The NGSeasy pipelines implement the following:
- Quality control of raw fastq files using FASTQC.
- Read trimming using TRIMMOMATIC.
- Alignment using one of the supported aligners (the ALIGNER column of the config file lists the options).
- SAM/BAM sorting and indexing with SAMBAMBA.
- Read Group information added using PICARDTOOLS:AddOrReplaceReadGroups.
- Duplicate marking with SAMBLASTER.
- For academic users and/or commercial/clinical groups who have paid for GATK licensing: indel realignment and base quality score recalibration using GATK built-in tools.
- For the non-GATK version: base quality score recalibration using BamUtil.
- Post-alignment quality control and reporting, performed using a number of tools and custom scripts.
- SNP and small INDEL calling using one of the supported variant callers (see the VARCALLER column of the config file), or a combination of them if the ensemble method is called using bcbio.variation variant-ensemble.
- Structural Variant (CNV) calling using one of the supported callers (see the CNV column of the config file), or a combination of them if the ensemble methods are called.
- Variant annotation using one of the supported annotators (see the ANNOTATOR column of the config file), or a combination of them if the ensemble methods are called.
- Variant reporting using custom scripts.
Note Some of the later functions i.e. variant annotation and qc reporting are still in dev.
We highly recommend read trimming prior to alignment. We have noticed considerable speed-ups in alignment time and increased quality of SNP/INDEL calls using trimmed vs raw fastq.
Base quality score recalibration is also recommended.
As an alternative to GATK, we have added functionality for use of BamUtil:recab for base quality score recalibration.
Non-GATK users
- are encouraged to use aligners such as stampy and novoalign that perform base quality score recal on the fly.
- are encouraged to use variant callers that perform local re-alignment around candidate sites to mitigate the need for the indel realignment stages.
All NGSeasy Docker images can be pulled down from compbio Docker Hub or using the Makefile.
We provide an Amazon EBS data volume with indexed genomes: XXXXXX
Docker Image | Version | NGS Tool (version) | Short Description | URL |
---|---|---|---|---|
compbio/ngseasy-base | 1.0-r001 | VCFtools (v0.1.12b) | manipulate vcf | link |
- | - | vt (latest) | manipulate vcf | link |
- | - | bcftools (1.2-5-g7fa0d25) | manipulate vcf | link |
- | - | vcflib (v1.0.0) | manipulate vcf | link |
- | - | samtools (1.2-17-ge91985a) | manipulate sam/bam | link |
- | - | samblaster (0.1.21) | manipulate sam/bam | link |
- | - | sambamba (v0.5.1) | manipulate sam/bam | link |
- | - | bamUtil (1.0.13) | manipulate sam/bam | link |
- | - | bedtools (v2.23.0-10-g447cb97) | manipulate bed files | link |
- | - | seqtk (1.0-r77-dirty) | manipulate fastq | link |
- | - | vawk (0.0.2) | manipulate vcf | link |
- | - | bioawk (latest) | manipulate sam/bam/vcf | link |
compbio/ngseasy-fastqc | 1.0-r001 | fastqc (v0.11.2) | FASTQ Quality Control Plots | link |
compbio/ngseasy-trimmomatic | 1.0-r001 | trimmomatic (0.32) | FASTQ Quality Trimming | link |
compbio/ngseasy-bwa | 1.0-r001 | bwa ( 0.7.12-r1039) | Aligner | link |
compbio/ngseasy-stampy | 1.0-r001 | stampy (stampy-1.0.27) | Aligner | link |
compbio/ngseasy-snap | 1.0-r001 | snap-aligner (1.0beta.18) | Aligner | link |
compbio/ngseasy-bowtie2 | 1.0-r001 | bowtie2 (2.2.4) | Aligner | link |
compbio/ngseasy-novoalign | 1.0-r001 | novoalign (3.02.13) | Aligner | link |
compbio/ngseasy-gatk | 1.0-r001 | gatk (3.4-0) | NGS PowerTools | link |
compbio/ngseasy-picardtools | 1.0-r001 | picardtools (1.128) | NGS PowerTools | link |
compbio/ngseasy-glia | 1.0-r001 | glia (latest) | NGS local realignment | link |
compbio/ngseasy-platypus | 1.0-r001 | platypus (0.8.1) | Variant Caller | link |
compbio/ngseasy-freebayes | 1.0-r001 | freebayes (v0.9.21-19-gc003c1e) | Variant Caller | link |
- ABRA
Run as the non-root user pipeman.
-v /media/Data:/home/pipeman mounts the local directory /media/Data to the container directory /home/pipeman.
TOOL="bwa"
docker run \
-P \
-w /home/pipeman \
-e HOME=/home/pipeman \
-e USER=pipeman \
--user pipeman \
-v /media/Data:/home/pipeman \
-it compbio/ngseasy-${TOOL}:1.0 /bin/bash
The following section describes getting the Dockerised NGSeasy Pipeline(s) and Resources, project set up and running NGSeasy.
Getting all resources and building required tools will take a few hours depending on network connections and any random "ghosts in the machine" - half a day in reality. But once you're set up, that's it - you are good to go.
See Table System Requirements for our recommended system requirements. NGSeasy will run on any modern computer/workstation or cloud infrastructure. The Hard Disk requirements are based on our experience and result from the fact that the pipeline/tools produce a range of intermediary and temporary files for each sample.
The full NGSeasy install includes indexed genomes for hg19 and b37 for all aligners, annotation files from GATK resource, and all of the NGSeasy docker images. Additional disk space is needed if the user wishes to install the databases associated with the variant annotators, Annovar, VEP and snpEff.
Based on our experience, a functional basic NGS compute system for a small lab would consist of at least 4TB disk space, 60GB RAM and at least 32 CPU cores. Internet speed and network connectivity are a major bottleneck when dealing with NGS sized data, and groups are encouraged to think about these issues before embarking on multi-sample or population level studies, where compute requirements can very quickly escalate.
System Requirements
Component | Minimum | Recommended |
---|---|---|
RAM | 16GB | 48-60GB |
CPU | 8 cores | 16-36 cores |
Hard Disk (per sample) | 50-100GB | 200-500GB |
NGSeasy Install | 200GB | 500GB |
Annotation Databases | 500GB | >1TB |
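The minimum RAM and CPU figures from the table can be checked on a Linux host before installing. A minimal sketch: `meets_minimum` is a hypothetical helper, and the detection lines assume `nproc` and /proc/meminfo are available, falling back to 0 when they are not.

```shell
# Compare a machine against the Minimum column of the table above.
# $1 = number of CPU cores, $2 = RAM in GB (whole numbers).
meets_minimum() {
    [ "$1" -ge 8 ] && [ "$2" -ge 16 ]
}

# Detect this machine (Linux only); RAM is rounded to the nearest GB because
# a nominal 16GB machine reports slightly less than 16 in MemTotal.
cores=$(nproc 2>/dev/null || echo 0)
ram_gb=$(awk '/^MemTotal:/ {printf "%d", $2 / 1024 / 1024 + 0.5}' /proc/meminfo 2>/dev/null)
ram_gb=${ram_gb:-0}

if meets_minimum "$cores" "$ram_gb"; then
    echo "ok: ${cores} cores, ${ram_gb}GB RAM"
else
    echo "below minimum: ${cores} cores, ${ram_gb}GB RAM (need 8 cores, 16GB)"
fi
```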
Follow the simple instructions in the links provided below
A full set of instructions for multiple operating systems are available on the Docker website.
We provide a simple Makefile to pull all of the public ngseasy components and scripts, and to set up the correct project directory structure on your local machine.
Setting up the initial project can take up to a day, depending on your local network connections and speeds.
The default install dir is the user's ${HOME} directory. The Makefile provides options to install to any user-defined directory and to select the NGSeasy version, e.g.:
## EG. Installing to /media/scratch
make INSTALLDIR="/media/scratch" VERSION="1.0" all
The Makefile also allows installation of selected components (check out its insides!).
Using Excel or similar, make a [config.file.tsv] file and save it as a tab-delimited file with the .tsv extension.
This sets up information related to: Project Name, Sample Name, Library Type, Pipeline to call, NCPU.
We provide a template that can be used with NGSeasy, see ngseasy_test.config.tsv.
The [config.file.tsv] should contain the following 23 columns for each sample to be run through a pipeline:-
Variable | type | Description | Options(Examples) |
---|---|---|---|
PROJECT_ID | STRING | Project ID | Cancer |
SAMPLE_ID | STRING | Sample ID | SAMPLE_I |
FASTQ1 | STRING | Read 1 Fastq | foo_R1.fq.gz |
FASTQ2 | STRING | Read 2 Fastq | foo_R2.fq.gz |
PROJECT_DIR | STRING | ngseasy project dir | /media/scratch/ngs_projects |
DNA_PREP_LIBRARY_ID | STRING | NGS Library | |
NGS_PLATFORM | STRING | NGS Platform | ILLUMINA |
NGS_TYPE | STRING | NGS Type | WEX (exome), WGS (genome), TGS (targeted) |
BAIT | STRING | bait bed file | FOO.bed |
CAPTURE | STRING | Capture bed file | BAR.bed |
GENOMEBUILD | STRING | genome version | hg19, b37 , b38 (coming soon) |
FASTQC | STRING | Select fastqc | no-fastqc, qc-fastqc |
TRIM | STRING | Select trimming | no-trimm, atrimm, btrimm |
BSQR | STRING | Select BSQR | no-bsqr, bam-bsqr, gatk-bsqr |
REALN | STRING | Select Realignment | no-realn, bam-realn, gatk-realn |
ALIGNER | STRING | Select Aligner | no-aln, bwa, stampy, snap, novoalign, bowtie2 |
VARCALLER | STRING | Select Variant Caller | no-varcall, freebayes, platypus, UnifiedGenotyper, HaplotypeCaller, ensemble |
CNV | STRING | Select CNV caller | no-sv,all-sv,lumpy,delly,slope,exomedepth,mhmm,cnvnator |
ANNOTATOR | STRING | Select variant annotator | no-anno,snpeff,annovar,vep |
CLEANUP | STRING | clean up temp files | TRUE, FALSE |
NCPU | NUMBER | number of cores | 1 .. N |
VERSION | NUMBER | NGSeasy version | 1.0 |
NGSUSER | STRING | user email | stephen.j.newhouse@gmail.com |
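Because spreadsheet exports can silently drop or add columns, it may help to confirm that every row of the saved file really has 23 tab-separated fields. A minimal sketch using awk; `check_config_columns` is a hypothetical name and this check is not part of NGSeasy itself.

```shell
# Report rows of a tab-delimited config file that do not have exactly
# 23 fields. Returns non-zero if any such row exists; empty lines are skipped.
check_config_columns() {
    awk -F '\t' '
        NF == 0  { next }
        NF != 23 { printf "line %d: %d columns (expected 23)\n", NR, NF; bad = 1 }
        END      { exit bad + 0 }
    ' "$1"
}
```

For example, `check_config_columns my.config.tsv || echo "fix the config before running ngseasy"`.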
The user needs to make the relevant directory structures on their local machine before starting an NGS run.
On our system we typically set up a top-level directory called ngs_projects within which we store output from all our individual NGS projects.
Here we are working from a local top-level directory called media/, but this can really be any folder on your local system, i.e. your home directory ~/${USER}.
Within this directory media we make the following folders: -
ngs_projects
|
|__raw_fastq
|__config_files
|__ngseasy_resources
|
|__reference_genomes_b37
|__reference_genomes_hg19
Running the script make XXXX ensures that all relevant directories are set up, and also enforces a clean structure to the NGS project.
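For readers who want to see what the layout amounts to, the same top-level folders can be created by hand. This is only a sketch: `make_ngs_tree` is a hypothetical function, and placing the two reference genome folders under ngseasy_resources is an assumption read from the tree above. The Makefile remains the supported route.

```shell
# Create the top-level layout shown above under the directory given in $1.
make_ngs_tree() {
    top=$1
    mkdir -p \
        "$top/ngs_projects/raw_fastq" \
        "$top/ngs_projects/config_files" \
        "$top/ngs_projects/ngseasy_resources/reference_genomes_b37" \
        "$top/ngs_projects/ngseasy_resources/reference_genomes_hg19"
}
```

Calling `make_ngs_tree /media` would reproduce the example used in this section; `mkdir -p` leaves existing directories untouched.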
Within this we make a raw_fastq folder, where we temporarily store all the raw fastq files for each project. This folder acts as an initial staging area for the raw fastq files. During the project set up, we copy/move project/sample related fastq files to their own specific directories.
Fastq files must be gzipped and have the suffix _1.fq.gz or _2.fq.gz.
Future versions will allow any format.
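The naming rule can be checked before files are copied into raw_fastq. A minimal sketch that follows the stated suffixes literally; `valid_fastq_name` is a hypothetical helper. Note that under this literal reading a name such as foo_R1.fq.gz would be rejected, so check which convention your NGSeasy version actually accepts.

```shell
# Succeed if the file name in $1 ends in _1.fq.gz or _2.fq.gz
# and has at least one character before the suffix.
valid_fastq_name() {
    case "$(basename "$1")" in
        ?*_1.fq.gz|?*_2.fq.gz) return 0 ;;
        *) return 1 ;;
    esac
}
```

A loop such as `for f in raw_fastq/*; do valid_fastq_name "$f" || echo "bad name: $f"; done` lists any files that do not follow the rule.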
Running ngseasy with the relevant configuration file will set up the following directory structure for every project and sample within a project:
.
ngs_projects
|
|__raw_fastq
|__config_files
|__run_logs
|__ngseasy_resources
|
|__ project_id
|
|__run_logs
|__config_files
|
|__sample_id_1
| |
| |__fastq
| |__tmp
| |__alignments
| |__vcf
| |__reports
| |__config_files
|
|
|__sample_id_n
|
|__fastq
|__tmp
|__alignments
|__vcf
|__reports
|__config_files
The raw_fastq directory is a very special directory indeed.
This is where the user should copy and/or move ALL NEW RAW FASTQ files.
This is to be used as an initial staging area for all fastq files.
NGSeasy expects all raw fastq data to be placed here for all new samples or runs.
NGSeasy inspects this folder and looks for the fastq file names specified in your config file.
If NGSeasy doesn't find them, then it exits.
We do this to force the user to get organised.
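To find out ahead of a run which files NGSeasy would fail to find, the config file can be compared against the staging directory. A sketch under assumptions: `missing_fastqs` is a hypothetical function, FASTQ1 and FASTQ2 are taken to be columns 3 and 4 (the order of the config table), and a header row starting with PROJECT_ID is skipped if present.

```shell
# List FASTQ files named in the config that are absent from the staging dir.
# $1 = tab-delimited config file, $2 = raw_fastq directory.
# Assumes FASTQ1/FASTQ2 are columns 3 and 4; skips a PROJECT_ID header row.
missing_fastqs() {
    awk -F '\t' 'NF >= 4 && $1 != "PROJECT_ID" { print $3; print $4 }' "$1" |
    while IFS= read -r fq; do
        [ -f "$2/$fq" ] || echo "$fq"
    done
}
```

An empty result from `missing_fastqs my.config.tsv /media/ngs_projects/raw_fastq` means every file listed in the config is staged.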
Work In Progress...
Currently we are not able to automatically build some of the tools in pre-built docker containers due to licensing restrictions.
Some of the software has restrictions on use, particularly for commercial purposes. Therefore if you wish to use this for commercial purposes, then you legally have to approach the owners of the various components yourself!
Software composing the pipeline requiring registration:-
- novoalign http://www.novocraft.com/
- GATK https://www.broadinstitute.org/gatk/
- ANNOVAR http://www.openbioinformatics.org/annovar/
These tools require manual download and registration with the provider. For non-academics/commercial groups, you will need to pay for some of these tools.
Once you have paid/registered and downloaded the tool, we provide scripts and guidance for building these tools on your system.
It's as easy as:
docker build -t compbio/ngseasy-${TOOL} .
Download Novoalign from http://www.novocraft.com/ into the local build directory ngseasy/containerized/ngs_docker_debian/ngs_aligners/ngseasy_novoalign. Edit the Dockerfile to reflect the correct version of novoalign.
To use all novoalign functionality, you will need to pay for a license.
Once you have obtained your novoalign.lic, download it to the build directory ngseasy/containerized/ngs_docker_debian/ngs_aligners/ngseasy_novoalign, which should now contain your updated Dockerfile.
# move to ngseasy_novoalign folder
cd ngseasy/containerized/ngs_docker_debian/ngs_aligners/ngseasy_novoalign
ls
the directory should contain the following:-
Dockerfile
novoalign.lic
README.md
novosortV1.03.01.Linux3.0.tar.gz
novocraftV3.02.08.Linux3.0.tar.gz
build novoalign
# build
docker build -t compbio/ngseasy-novoalign:v1.0 .
You need to register and accept the GATK license agreement at https://www.broadinstitute.org/gatk/.
Once done, download GATK and place it in the GATK build directory ngseasy/containerized/ngs_docker_debian/ngs_utils/ngseasy_gatk.
Edit the Dockerfile to reflect the correct version of GATK.
# move to ngseasy_gatk folder
cd ngseasy/containerized/ngs_docker_debian/ngs_utils/ngseasy_gatk
ls
the directory should contain the following:-
Dockerfile
README.md
GenomeAnalysisTK-3.3-0.tar.bz2
build gatk
# build
docker build -t compbio/ngseasy-gatk:v1.0 .
The tools used for variant annotation use large databases and the docker images exceed 10GB. Therefore, the user should manually build these container images prior to running the NGS pipelines. Docker build files (Dockerfile) are available for Annovar, VEP and snpEff.
Note Annovar requires user registration.
Once built on the user system, these container images can persist for as long as the user wants.
Large Variant Annotation Container Images
It's as easy as:
docker build -t compbio/ngseasy-${TOOL} .
cd /media/ngs_projects/nsgeasy/ngs/containerized/ngs_docker_debian/ngseasy_vep
sudo docker build -t compbio/ngseasy-vep:${VERSION} .
cd /media/ngs_projects/nsgeasy/ngs/containerized/ngs_docker_debian/ngseasy_annovar
sudo docker build -t compbio/ngseasy-annovar:${VERSION} .
cd /media/ngs_projects/nsgeasy/ngs/containerized/ngs_docker_debian/ngseasy_snpeff
sudo docker build -t compbio/ngseasy-snpeff:${VERSION} .
- New Aligners:- SNAP, GSNAP, mr- and mrs-Fast,gem
- https://github.com/amplab/snap
- SLOPE (CNV for targeted NGS) http://www.biomedcentral.com/1471-2164/12/184
- Cancer Pipelines
- Annotation Pipelines and Databases
- Visualisation Pipelines
- Var Callers:- VarScan2
- SGE scripts and basic BASH scripts for running outside of Docker
- biobambam https://github.com/gt1/biobambam
- bamaddrg https://github.com/ekg/bamaddrg
- bamtools https://github.com/ekg/bamtools
- https://bcbio.wordpress.com/
- https://basecallbio.wordpress.com/2013/04/23/base-quality-score-rebinning/
- https://github.com/statgen/bamUtil
- http://genome.sph.umich.edu/wiki/BamUtil:_recab
- https://github.com/chapmanb/bcbio.variation
- http://plagnol-lab.blogspot.co.uk/2013/11/faq-and-clarifications-for-exomedepth-r.html
(C) 2015 Stephen J Newhouse & Amos Folarin