Release v1.4.0.1 · ENCODE-DCC/chip-seq-pipeline2

IMPORTANT: Update Caper to >= 0.8.
$ pip install caper --upgrade

IMPORTANT: Conda users must update the pipeline's Conda environment.
$ bash scripts/update_conda_env.sh

New control subsampling

Automatic control subsampling is governed by chip.ctl_depth_limit and chip.exp_ctl_depth_ratio_limit. Each parameter yields one limit, and the pipeline takes the larger of the two: max(ctl_depth_limit, exp_ctl_depth_ratio_limit * exp_rep_read_depth). If the control is deeper than that value, it is subsampled down to it.
- chip.ctl_depth_limit: Hard limit on the control's read depth. 200M by default.
- chip.exp_ctl_depth_ratio_limit: Factor multiplied by the experiment replicate's read depth. 5.0 by default.
The existing control subsampling governed by the parameter chip.ctl_subsample_reads is still available.
- Both raw and filtered control BAMs keep all reads. The filtered (nodup) control BAM is converted into a control TAG-ALIGN, which is then subsampled down to chip.ctl_subsample_reads (if it is defined and >0). This parameter modifies the TAG-ALIGN itself, so it affects all downstream analyses, including peak calling and the new automatic control subsampling, which is done in task call_peak.
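
The limit rule above can be sketched in a few lines of Python. This is an illustration only, not the pipeline's code: the function name and the integer truncation of the ratio-based limit are assumptions, and only the two defaults (200M and 5.0) come from these notes.

```python
def control_subsample_target(ctl_depth, exp_rep_depth,
                             ctl_depth_limit=200_000_000,
                             exp_ctl_depth_ratio_limit=5.0):
    """Return the depth to subsample the control to, or None if it is left as-is.

    Hypothetical helper mirroring
    max(ctl_depth_limit, exp_ctl_depth_ratio_limit * exp_rep_read_depth).
    """
    # Limit derived from the experiment replicate's depth
    # (truncated to a whole number of reads; an assumption).
    ratio_limit = int(exp_ctl_depth_ratio_limit * exp_rep_depth)
    # The pipeline takes the larger of the two limits.
    limit = max(ctl_depth_limit, ratio_limit)
    # Only controls deeper than the limit are subsampled.
    return limit if ctl_depth > limit else None


# 30M-read replicate: ratio limit is 150M, so the 200M hard limit wins.
print(control_subsample_target(300_000_000, 30_000_000))
# 50M-read replicate: ratio limit is 250M, which exceeds the hard limit.
print(control_subsample_target(300_000_000, 50_000_000))
# Control shallower than the limit: no subsampling.
print(control_subsample_target(100_000_000, 10_000_000))
```

Because the larger limit is used, a deep experiment replicate raises the ceiling above the 200M default rather than lowering it.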

Cropping FASTQs: Added a parameter chip.crop_length_tol, which defines how much shorter than chip.crop_length a read may be and still be kept. It is 2 by default and only takes effect when chip.crop_length is defined (>0). Trimmomatic's parameters CROP and MINLEN are set to chip.crop_length and chip.crop_length - abs(chip.crop_length_tol), respectively. The output (cropped FASTQ) filename is PREFIX.crop_${CROP}-${TOLERANCE}bp.fastq.gz, where TOLERANCE = CROP - MINLEN.

All reads longer (>) than chip.crop_length will be cropped.
All reads shorter (<) than chip.crop_length - abs(chip.crop_length_tol) will be removed.
All reads not shorter (>=) than chip.crop_length - abs(chip.crop_length_tol) and not longer (<=) than chip.crop_length will be kept.
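
As a rough illustration of the three rules above, the following Python sketch applies them to a single read and builds the output filename. The function names are hypothetical and the pipeline itself delegates this work to Trimmomatic; the sketch only restates the CROP/MINLEN arithmetic from these notes.

```python
def crop_read(seq, crop_length, crop_length_tol=2):
    """Return the cropped read, or None if the read would be removed."""
    # MINLEN = CROP - abs(tolerance), as described above.
    min_len = crop_length - abs(crop_length_tol)
    if len(seq) < min_len:
        return None  # shorter than MINLEN: removed
    # Longer reads are cropped; reads within the tolerance are kept unchanged.
    return seq[:crop_length]


def cropped_fastq_name(prefix, crop_length, crop_length_tol=2):
    """Build PREFIX.crop_${CROP}-${TOLERANCE}bp.fastq.gz (hypothetical helper)."""
    min_len = crop_length - abs(crop_length_tol)
    tolerance = crop_length - min_len
    return f"{prefix}.crop_{crop_length}-{tolerance}bp.fastq.gz"


print(len(crop_read("A" * 60, 50)))   # cropped to 50 bases
print(len(crop_read("A" * 48, 50)))   # kept at 48 bases
print(crop_read("A" * 47, 50))        # removed
print(cropped_fastq_name("rep1", 50))
```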

Java heap

This applies to tasks that run a Java application. If the following parameters are not explicitly defined by the user, each Java application in a task uses 90% of the corresponding task's memory, so that it does not exceed the physical memory of the cloud instance. For example, if chip.filter_picard_java_heap is not defined, the pipeline uses 90% of chip.filter_mem_mb for the Java heap -Xmx (for Picard tools in the filter task).
- chip.align_trimmomatic_java_heap
- chip.filter_picard_java_heap
- chip.gc_bias_picard_java_heap
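
The fallback can be pictured with a small Python sketch. The function name, the heap string format (such as "4G"), and the memory figure in the usage lines are assumptions for illustration; only the 90% rule and the -Xmx flag come from these notes.

```python
def java_xmx_flag(task_mem_mb, java_heap=None):
    """Return the -Xmx flag for a Java app in a task (hypothetical helper).

    task_mem_mb: the task's memory in MB, e.g. the value of chip.filter_mem_mb.
    java_heap:   an explicit heap setting, e.g. chip.filter_picard_java_heap;
                 its string format here is an assumption.
    """
    if java_heap:
        # A user-defined heap takes precedence.
        return f"-Xmx{java_heap}"
    # Otherwise use 90% of the task memory (integer arithmetic avoids
    # floating-point rounding surprises).
    heap_mb = task_mem_mb * 9 // 10
    return f"-Xmx{heap_mb}M"


print(java_xmx_flag(20000))        # heap derived from task memory
print(java_xmx_flag(20000, "4G"))  # explicit heap wins
```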

Bug fixes

Subsampling TAG-ALIGN (for PE dataset only)
- The PE subsampling task actually subsampled 2 x chip.subsample_reads reads.
- The pipeline's default settings are not affected by this bug.
- Affected cases:
  - chip.subsample_reads > 0 (0 by default), chip.paired_end == True, and the actual number of reads in the replicate is > chip.subsample_reads.
  - chip.ctl_subsample_reads > 0 (0 by default), chip.ctl_paired_end == True, and the actual number of reads in the control is > chip.ctl_subsample_reads.
  - The user starts from an input type other than FASTQ (e.g. BAM, NODUP-BAM, TA), chip.paired_end == True, and the actual number of reads in the replicate is > chip.xcor_subsample_reads (15M by default).
Fix grep error on OSX.
Swapped lines in chip.croo.v4.json.
Cannot start from BAMs on DNAnexus (using Web UI).
JSD didn't work without a blacklist.
Pooled TAG-ALIGN had a fixed prefix "basename_prefix".
Croo task graph got complicated due to a diamond dependency problem in task choose_ctl.
