v1.4.0.1
IMPORTANT: Update Caper to >= 0.8.
$ pip install caper --upgrade
IMPORTANT: Conda users must update the pipeline's Conda environment.
$ bash scripts/update_conda_env.sh
New control subsampling
- Controlled by chip.ctl_depth_limit and chip.exp_ctl_depth_ratio_limit. A limit is calculated from each parameter, and the pipeline takes the larger of the two, max(ctl_depth_limit, exp_ctl_depth_ratio_limit * exp_rep_read_depth). If the control is deeper than that, the control is subsampled down to that limit.
  - chip.ctl_depth_limit: Hard limit on the control's read depth. 200M by default.
  - chip.exp_ctl_depth_ratio_limit: Factor by which the experiment replicate's read depth is multiplied. 5.0 by default.
- We still keep control subsampling controlled by the parameter chip.ctl_subsample_reads.
  - Both raw and filtered control BAMs contain all reads. The filtered (nodup) control BAM is converted into a control TAG-ALIGN, and then that TAG-ALIGN is subsampled down to chip.ctl_subsample_reads (if it is defined as >0). This parameter modifies the TA itself, so it affects all downstream analyses such as peak calling, and also the new automatic control subsampling, which is done in task call_peak.
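The two control subsampling steps above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function name effective_control_depth is hypothetical, read depths are assumed to be plain read counts, and chip.ctl_subsample_reads is assumed to be applied before the automatic limit in call_peak, as described above.

```python
def effective_control_depth(ctl_read_depth: int,
                            exp_rep_read_depth: int,
                            ctl_subsample_reads: int = 0,
                            ctl_depth_limit: int = 200_000_000,
                            exp_ctl_depth_ratio_limit: float = 5.0) -> int:
    """Hypothetical helper: read depth of the control actually used for peak calling."""
    depth = ctl_read_depth
    # Step 1: chip.ctl_subsample_reads (if > 0) subsamples the control TAG-ALIGN itself.
    if ctl_subsample_reads > 0 and depth > ctl_subsample_reads:
        depth = ctl_subsample_reads
    # Step 2: automatic control subsampling (done in task call_peak).
    limit = max(ctl_depth_limit, exp_ctl_depth_ratio_limit * exp_rep_read_depth)
    if depth > limit:
        depth = int(limit)
    return depth


# 250M-read control vs. a 30M-read replicate: limit = max(200M, 5.0 * 30M) = 200M.
print(effective_control_depth(250_000_000, 30_000_000))  # 200000000
# 240M-read control vs. a 50M-read replicate: limit = max(200M, 250M) = 250M, no subsampling.
print(effective_control_depth(240_000_000, 50_000_000))  # 240000000
```

Because the limit is a maximum, a deep experiment replicate raises the allowed control depth above the 200M default rather than lowering it.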
Cropping FASTQs: Added a parameter chip.crop_length_tol, which defines a tolerance that allows reads slightly shorter than chip.crop_length to be kept. It is 2 by default and only takes effect when chip.crop_length is defined (>0). Trimmomatic's parameters CROP and MINLEN will be set to chip.crop_length and chip.crop_length - abs(chip.crop_length_tol), respectively. The output (cropped FASTQ) filename will be PREFIX.crop_${CROP}-${TOLERANCE}bp.fastq.gz, where TOLERANCE = CROP - MINLEN.
- All reads longer (>) than chip.crop_length will be cropped.
- All reads shorter (<) than chip.crop_length - abs(chip.crop_length_tol) will be removed.
- All reads not shorter (>=) than chip.crop_length - abs(chip.crop_length_tol) and not longer (<=) than chip.crop_length will be kept.
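To make these three rules concrete, here is a small Python sketch that applies the same CROP-then-MINLEN logic to a single read sequence and builds the output filename. It is an illustration, not how the pipeline actually invokes Trimmomatic; crop_read and cropped_fastq_name are hypothetical names, reads are assumed to be plain strings, and CROP is assumed to run before MINLEN.

```python
from typing import Optional


def crop_read(seq: str, crop_length: int, crop_length_tol: int = 2) -> Optional[str]:
    """Mimic Trimmomatic CROP followed by MINLEN on one read (sketch)."""
    if crop_length <= 0:
        return seq  # cropping is disabled unless chip.crop_length > 0
    minlen = crop_length - abs(crop_length_tol)
    cropped = seq[:crop_length]  # CROP: keep at most crop_length bases
    if len(cropped) < minlen:    # MINLEN: drop reads that are too short
        return None
    return cropped


def cropped_fastq_name(prefix: str, crop_length: int, crop_length_tol: int = 2) -> str:
    """Build PREFIX.crop_${CROP}-${TOLERANCE}bp.fastq.gz."""
    crop = crop_length
    minlen = crop_length - abs(crop_length_tol)
    tolerance = crop - minlen
    return f"{prefix}.crop_{crop}-{tolerance}bp.fastq.gz"


for n in (40, 36, 34, 33):
    result = crop_read("A" * n, crop_length=36)
    print(n, "removed" if result is None else len(result))
# prints, one per line: 40 36, 36 36, 34 34, 33 removed
print(cropped_fastq_name("rep1.R1", 36))  # rep1.R1.crop_36-2bp.fastq.gz
```

With the default tolerance of 2 and chip.crop_length = 36, reads of 34 to 36 bases pass through unchanged, longer reads are cut to 36, and shorter reads are dropped.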
Java heap
- Applies to tasks that run a Java app inside. If the following parameters are not explicitly defined by the user, each Java app in a task uses 90% of the corresponding task's memory, so that it does not exceed the physical memory of the cloud instance. For example, if the user doesn't define chip.filter_picard_java_heap, the pipeline will use 90% of chip.filter_mem_mb for the Java heap (-Xmx) of the Picard tools in the filter task.
  - chip.align_trimmomatic_java_heap
  - chip.filter_picard_java_heap
  - chip.gc_bias_picard_java_heap
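The default heap rule amounts to the following. This is a hedged sketch rather than the pipeline's code: java_xmx_flag is a hypothetical helper, task_mem_mb stands for a task memory parameter such as chip.filter_mem_mb, and an explicitly set *_java_heap value is assumed to be a JVM size string such as "4G".

```python
from typing import Optional


def java_xmx_flag(task_mem_mb: int, java_heap: Optional[str] = None) -> str:
    """Build the JVM -Xmx flag for a Java app inside a task (sketch)."""
    if java_heap:
        # User explicitly set e.g. chip.filter_picard_java_heap (format assumed, e.g. "4G").
        return f"-Xmx{java_heap}"
    # Default: 90% of the task's memory in MB (integer arithmetic avoids float rounding).
    return f"-Xmx{task_mem_mb * 9 // 10}m"


print(java_xmx_flag(20000))        # -Xmx18000m
print(java_xmx_flag(20000, "4G"))  # -Xmx4G
```

Leaving 10% of the task memory outside the heap gives the JVM room for non-heap memory, which is why the default stays below the instance's physical memory.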
Bug fixes
- Subsampling TAG-ALIGN (for PE datasets only)
  - The PE subsampling task actually subsampled 2 x chip.subsample_reads reads.
  - Default settings of the pipeline are not affected by this bug.
  - Affected cases:
    - chip.subsample_reads > 0 (0 by default) and chip.paired_end == True and the actual number of reads in a replicate is > chip.subsample_reads.
    - chip.ctl_subsample_reads > 0 (0 by default) and chip.ctl_paired_end == True and the actual number of reads in a control is > chip.ctl_subsample_reads.
    - Users start from a type other than FASTQ (e.g. BAM, NODUP-BAM, TA) and chip.paired_end == True and the actual number of reads in a replicate is > chip.xcor_subsample_reads (15M by default).
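Users of earlier releases can check the affected cases listed above against a run's input JSON with a small sketch like this. It is not part of the pipeline: the helper name is hypothetical, the read counts must be supplied by the user, and treating an undefined chip.ctl_paired_end as equal to chip.paired_end is an assumption.

```python
from typing import List


def pe_subsampling_bug_cases(inputs: dict, rep_reads: int, ctl_reads: int,
                             starts_from_fastq: bool) -> List[str]:
    """Return which of the affected cases apply to a run (sketch)."""
    paired_end = inputs.get("chip.paired_end", False)
    # Assumption: an undefined chip.ctl_paired_end falls back to chip.paired_end.
    ctl_paired_end = inputs.get("chip.ctl_paired_end", paired_end)
    subsample_reads = inputs.get("chip.subsample_reads", 0)
    ctl_subsample_reads = inputs.get("chip.ctl_subsample_reads", 0)
    xcor_subsample_reads = inputs.get("chip.xcor_subsample_reads", 15_000_000)

    cases = []
    if subsample_reads > 0 and paired_end and rep_reads > subsample_reads:
        cases.append("chip.subsample_reads")
    if ctl_subsample_reads > 0 and ctl_paired_end and ctl_reads > ctl_subsample_reads:
        cases.append("chip.ctl_subsample_reads")
    if not starts_from_fastq and paired_end and rep_reads > xcor_subsample_reads:
        cases.append("chip.xcor_subsample_reads")
    return cases


example_inputs = {"chip.paired_end": True, "chip.subsample_reads": 10_000_000}
print(pe_subsampling_bug_cases(example_inputs, rep_reads=20_000_000,
                               ctl_reads=0, starts_from_fastq=False))
# ['chip.subsample_reads', 'chip.xcor_subsample_reads']
```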
- Fixed a grep error on OSX.
- Swapped lines in chip.croo.v4.json.
- Cannot start from BAMs on DNAnexus (using Web UI).
- JSD didn't work without a blacklist.
- Pooled TAG-ALIGN had a fixed prefix "basename_prefix".
- Croo task graph got complicated due to a diamond dependency problem in task choose_ctl.