docker-4dn-repliseq

This repository contains the source files for the Docker image published on Docker Hub as vera/docker-4dn-repliseq.

what

This repository contains a Dockerfile and scripts that generate replication timing profiles from raw sequencing reads of either early- and late-replicating DNA, or DNA extracted from cells sorted for S or G1 DNA content.

Sample data files that can be used for testing the tools are included in the sample_data folder.

The scripts that execute the pipeline are in the scripts directory and follow the naming convention run_xx.sh. These wrappers are copied into the Docker image at build time, and each one can be used as a single step in a workflow.

You can build the Docker image for running these scripts yourself from the Dockerfile, or pull it from Docker Hub (vera/docker-4dn-repliseq). Images built with the Dockerfile contain both the scripts and the sample data for running and testing the pipeline.

how

example usage

# execute a step on data in the current directory
docker run -u $UID -w $PWD -v $PWD:$PWD:rw vera/docker-4dn-repliseq <name_of_script> <args>

step-by-step workflow

setup

# pull the pre-built image, create and enter a container inside the directory with your data
docker run --rm -it -h d4r -u $UID -w $PWD -v $PWD:$PWD:rw vera/docker-4dn-repliseq

# define number of CPU threads to use for the pipeline
export NUMTHREADS=8
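Instead of hard-coding 8, you can size NUMTHREADS to the CPUs available to the container. This is a minimal sketch that assumes GNU coreutils nproc is installed in the image; it is a convenience, not something the pipeline requires.

```shell
# size the thread count to the CPUs this process may use
# (assumes GNU coreutils nproc is available in the image)
export NUMTHREADS=$(nproc)
echo "using $NUMTHREADS threads"
```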

define your input files

# download example data
wget -cre robots=off -np -nH --cut-dirs=3 -A 'g*' http://www.bio.fsu.edu/~dvera/share/repliseq/

# define early and late fastq files, here using sample data
E=$(ls *early*.fq.gz)
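L=$(ls *late*.fq.gz)

# path to the genome index passed to align with -i (placeholder: set this to your own index)
index=/path/to/genome/index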
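When a pattern matches nothing, ls prints an error but the variable is still assigned (empty), so a missing early or late file could go unnoticed until a later step. A small guard such as the hypothetical require_files function below (not one of the image's scripts) can stop the workflow at this point instead:

```shell
# hypothetical helper: fail if a glob matched no files or a listed file does not exist
# $1: label used in error messages; remaining args: the matched file names
require_files() {
  local label=$1
  shift
  if [ "$#" -eq 0 ]; then
    echo "no $label fastq files found" >&2
    return 1
  fi
  for f in "$@"; do
    [ -f "$f" ] || { echo "missing $label file: $f" >&2; return 1; }
  done
}
```

Run it right after defining E and L, for example: require_files early $E && require_files late $L. The variables are left unquoted, as in the workflow commands, so that several matched files become separate arguments.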

execute workflow step by step

# clip adapters from reads
cfq=$(clip $E $L)

# align reads to genome
bam=$(align -i $index $cfq)
bstat=$(samstats $bam)

# filter bams by alignment quality and sort by position
sbam=$(filtersort $bam)
fbstat=$(samstats $sbam)

# remove duplicate reads
rbam=$(dedup $sbam)

# calculate RPKM bedGraphs for each set of alignments
bg=$(count $rbam)
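For reference, RPKM (reads per kilobase per million mapped reads) for a window is count × 10^9 / (window length in bp × total mapped reads). The hypothetical counts_to_rpkm function below applies that formula to a 4-column bedGraph of raw counts (chrom, start, end, count) read from stdin, with the library's total mapped-read count passed as an argument. It illustrates the arithmetic only and is not the code inside count.

```shell
# hypothetical helper: convert raw per-window read counts to RPKM
# stdin: chrom  start  end  count   (0-based, half-open windows)
# $1: total number of mapped reads in the library
# RPKM = count / ((end - start) / 1e3) / (total / 1e6) = count * 1e9 / ((end - start) * total)
counts_to_rpkm() {
  awk -v total="$1" '{
    len = $3 - $2
    printf "%s\t%s\t%s\t%.4f\n", $1, $2, $3, $4 * 1000000000 / (len * total)
  }'
}
```

For example, 100 reads in a 50 kb window of a library with 2,000,000 mapped reads gives 100 × 10^9 / (50,000 × 2,000,000) = 1 RPKM.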

# filter windows with a low average RPKM
fbg=$(filter $bg)
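The README does not say exactly how filter averages or which cutoff it applies. One plausible reading, sketched below as the hypothetical filter_low_rpkm function, averages the early and late RPKM of each window and drops windows whose mean falls below a cutoff you choose; the cutoff argument is illustrative, not the filter script's default.

```shell
# hypothetical helper: drop windows whose mean RPKM across two bedGraphs is below a cutoff
# $1: early RPKM bedGraph, $2: late RPKM bedGraph (assumed to list the same windows in the same order)
# $3: minimum mean RPKM needed to keep a window (illustrative value chosen by the caller)
# stdout: chrom  start  end  early_rpkm  late_rpkm
filter_low_rpkm() {
  paste "$1" "$2" | awk -v min="$3" 'BEGIN { OFS = "\t" }
    $1 != $5 || $2 != $6 || $3 != $7 { print "window mismatch at line " NR > "/dev/stderr"; exit 1 }
    ($4 + $8) / 2 >= min { print $1, $2, $3, $4, $8 }'
}
```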

# calculate log2 ratios between early and late
l2r=$(log2ratio $fbg)
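A replication timing value is the log2 ratio of early to late signal in a window, so positive values mark earlier-replicating regions and negative values later-replicating ones. The hypothetical log2_ratio function below computes it from a 5-column table (chrom, start, end, early RPKM, late RPKM) on stdin; skipping windows that contain a zero is an assumption of this sketch, not necessarily what log2ratio does.

```shell
# hypothetical helper: per-window log2(early / late)
# stdin: chrom  start  end  early_rpkm  late_rpkm
# windows with a zero in either column are skipped because the log ratio is undefined there
log2_ratio() {
  awk '$4 > 0 && $5 > 0 {
    printf "%s\t%s\t%s\t%.4f\n", $1, $2, $3, log($4 / $5) / log(2)
  }'
}
```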

# quantile-normalize replication timing profiles to the example reference bedGraph
l2rn=$(normalize $l2r)
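Quantile normalization to a reference replaces each window's value with the reference value of the same rank, so the normalized profile takes on exactly the reference's distribution of values. The hypothetical quantile_normalize function below shows that idea with standard sort, paste and cut. It assumes both bedGraphs are tab-separated, contain no track lines and have the same number of windows, which the normalize script may handle differently.

```shell
# hypothetical helper: quantile-normalize column 4 of a bedGraph to a reference bedGraph
# $1: tab-separated bedGraph to normalize, $2: tab-separated reference bedGraph
# assumes no header/track lines; tied values are ranked by their original line order
quantile_normalize() {
  if [ "$(wc -l < "$1")" -ne "$(wc -l < "$2")" ]; then
    echo "row counts differ between $1 and $2" >&2
    return 1
  fi
  local tab tmp
  tab=$(printf '\t')
  tmp=$(mktemp -d) || return 1
  # reference values in ascending order, one per line
  cut -f4 "$2" | sort -g > "$tmp/ref"
  # prefix each row with its line number, then order rows by value (ties by line number)
  awk 'BEGIN { OFS = "\t" } { print NR, $1, $2, $3, $4 }' "$1" \
    | sort -t "$tab" -k5,5g -k1,1n > "$tmp/query"
  # the i-th smallest value takes the i-th smallest reference value,
  # then the original row order is restored and the line-number prefix dropped
  paste "$tmp/query" "$tmp/ref" | sort -t "$tab" -k1,1n | cut -f2-4,6
  rm -r "$tmp"
}
```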

# loess-smooth profiles using a 300kb span size
l2rs=$(smooth 300000 $NUMTHREADS $l2rn)

or use pipes

clip $E $L | align -i $index | filtersort | dedup | count | filter | log2ratio | normalize

source repository: FSUgenomics/docker-4dn-repliseq
