This repo contains the source files for a Docker image stored on Docker Hub as vera/docker-4dn-repliseq.
This repository contains a dockerfile and scripts for generating replication timing profiles from raw sequencing reads of either early- and late-replicating DNA or DNA extracted from cells sorted for S or G1 DNA content.
Sample data files that can be used for testing the tools are included in the sample_data folder.
The scripts for executing the pipeline are under the scripts directory and follow the naming convention run_xx.sh. These wrappers are copied to the docker image at build time, and each may be used as a single step in a workflow.
A docker image for executing these scripts can be built from the dockerfile or pulled from Docker Hub (vera/docker-4dn-repliseq). Images built with the dockerfile contain both the scripts and the sample data for running and testing the pipeline.
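As a minimal sketch, the two ways of obtaining the image could be wrapped as shown here. The function names `pull_image` and `build_image` are hypothetical, and the build assumes that its argument (by default the current directory) is the root of a clone of this repository containing the dockerfile.

```shell
# Image name on Docker Hub, as given above.
IMAGE=vera/docker-4dn-repliseq

# Option 1: fetch the pre-built image from Docker Hub.
pull_image() {
    docker pull "$IMAGE"
}

# Option 2: build the image locally and tag it with the same name.
# Assumption: the first argument (default: current directory) is the
# root of a clone of this repository and holds the dockerfile.
build_image() {
    docker build -t "$IMAGE" "${1:-.}"
}
```

Call `pull_image` to use the published image, or `build_image` from inside the cloned repository to build it locally.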
# execute a step on data in the current directory
docker run -u $UID -w $PWD -v $PWD:$PWD:rw vera/docker-4dn-repliseq <name_of_script> <args>
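If you run many steps this way, the command above can be wrapped in a small shell function. This is only a convenience sketch: the name `repliseq_step` is hypothetical, `--rm` is added so that finished containers are removed, and `id -u` is used in place of `$UID` because `$UID` is not set in every shell.

```shell
# Hypothetical helper: run one pipeline script inside the container,
# with the current directory mounted at the same path and used as the
# working directory, so relative file names resolve as on the host.
repliseq_step() {
    docker run --rm \
        -u "$(id -u)" \
        -w "$PWD" \
        -v "$PWD:$PWD:rw" \
        vera/docker-4dn-repliseq "$@"
}
```

It would be called as `repliseq_step <name_of_script> <args>`, with the same placeholders as in the command above.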
# pull the pre-built image, create and enter a container inside the directory with your data
docker run --rm -it -h d4r -u $UID -w $PWD -v $PWD:$PWD:rw vera/docker-4dn-repliseq
# define number of CPU threads to use for the pipeline
export NTHREADS=8
# download example data
wget -cbre robots=off -np -nH --cut-dirs=3 -A 'g*' http://www.bio.fsu.edu/~dvera/share/repliseq/
# define early and late fastq files, here using sample data
E=$(ls *early*.fq.gz)
L=$(ls *late*.fq.gz)
# clip adapters from reads
cfq=$(clip $E $L)
# align reads to genome (set $index to your genome index beforehand; it is not defined above)
bam=$(align -i $index $cfq)
bstat=$(samstats $bam)
# filter bams by alignment quality and sort by position
sbam=$(filtersort $bam)
fbstat=$(samstats $sbam)
# remove duplicate reads
rbam=$(dedup $sbam)
# calculate RPKM bedGraphs for each set of alignments
bg=$(count $rbam)
# filter windows with a low average RPKM
fbg=$(filter $bg)
# calculate log2 ratios between early and late
l2r=$(log2ratio $fbg)
# quantile-normalize replication timing profiles to the example reference bedGraph
l2rn=$(normalize $l2r)
# loess-smooth profiles using a 300kb span size
l2rs=$(smooth 300000 $NTHREADS $l2rn)
# alternatively, chain the steps with pipes
clip $E $L | align -i $index | filtersort | dedup | count | filter | log2ratio | normalize
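To make the log2 ratio step concrete, here is a sketch of the arithmetic using paste and awk. It is not the implementation of the `log2ratio` tool: the function name `log2_ratio_sketch` is hypothetical, and it assumes two bedGraph files that list the same windows in the same order, with the signal value in the fourth column.

```shell
# Hypothetical sketch, not the log2ratio tool shipped in the image.
# Usage: log2_ratio_sketch early.bg late.bg
# Prints chrom, start, end, log2(early/late) for each window.
# Assumes both bedGraphs list the same windows in the same order.
log2_ratio_sketch() {
    paste "$1" "$2" | awk 'BEGIN { OFS = "\t" }
        # columns 1-4 come from the early file, 5-8 from the late file;
        # skip windows where either value is not positive
        $4 > 0 && $8 > 0 { print $1, $2, $3, log($4 / $8) / log(2) }'
}
```

Positive values mark windows with more early than late signal, negative values the reverse. Windows where either value is zero are dropped here, because the ratio is undefined for them.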