v1.4.0.1

@leepc12 leepc12 released this 11 Apr 05:19
· 315 commits to master since this release
e2a698d

IMPORTANT: Update Caper to >= 0.8.
$ pip install caper --upgrade

IMPORTANT: Conda users must update the pipeline's Conda env.
$ bash scripts/update_conda_env.sh

New control subsampling

  • Controlled by chip.ctl_depth_limit and chip.exp_ctl_depth_ratio_limit. Each parameter yields a limit, and the pipeline takes the larger of the two: max(ctl_depth_limit, exp_ctl_depth_ratio_limit * exp_rep_read_depth). If the control is deeper than that limit, it is subsampled down to the limit.
  • chip.ctl_depth_limit: Hard limit on control's read depth. 200M by default.
  • chip.exp_ctl_depth_ratio_limit: Factor to be multiplied to experiment replicate's read depth. 5.0 by default.
  • The existing control subsampling controlled by the parameter chip.ctl_subsample_reads is still available.
    • Both raw and filtered control BAMs keep all reads. The filtered (nodup) control BAM is converted into a control TAG-ALIGN, and that TAG-ALIGN is then subsampled down to chip.ctl_subsample_reads (if it is defined, i.e. >0). This parameter modifies the TAG-ALIGN itself, so it affects all downstream analyses such as peak calling, including the new automatic control subsampling, which is done in task call_peak.
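
The limit rule above can be sketched in a few lines of Python. This is only an illustration of the max() rule as described in these notes, not the pipeline's actual code: the function name and the `ctl_depth` argument are hypothetical, and rounding the ratio-based limit down to an integer is an assumption.

```python
from typing import Optional


def control_subsample_target(
    ctl_depth: int,
    exp_rep_read_depth: int,
    ctl_depth_limit: int = 200_000_000,      # chip.ctl_depth_limit default (200M)
    exp_ctl_depth_ratio_limit: float = 5.0,  # chip.exp_ctl_depth_ratio_limit default
) -> Optional[int]:
    """Return the depth the control would be subsampled to, or None if it is kept as-is.

    Hypothetical helper: mirrors the rule in the release notes only.
    """
    # Two limits, one per parameter; the pipeline takes the larger one.
    # Rounding down with int() is an assumption of this sketch.
    ratio_limit = int(exp_ctl_depth_ratio_limit * exp_rep_read_depth)
    limit = max(ctl_depth_limit, ratio_limit)
    # Only a control deeper than the limit is subsampled.
    return limit if ctl_depth > limit else None


if __name__ == "__main__":
    # 30M-read replicate: ratio limit is 150M, so the 200M hard limit wins.
    print(control_subsample_target(300_000_000, 30_000_000))
    # 50M-read replicate: ratio limit is 250M, which is larger than 200M.
    print(control_subsample_target(300_000_000, 50_000_000))
    # Control shallower than the limit: no subsampling.
    print(control_subsample_target(100_000_000, 30_000_000))
```

With the defaults, the hard limit only stops being the deciding factor once a replicate has more than 40M reads (5.0 x 40M = 200M).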

Cropping FASTQs: Added a parameter chip.crop_length_tol, which defines how much shorter than chip.crop_length a read may be and still be kept. It is 2 by default and only takes effect when chip.crop_length is defined (>0). Trimmomatic's parameters CROP and MINLEN will be chip.crop_length and chip.crop_length - abs(chip.crop_length_tol), respectively. The output (cropped FASTQ) filename will be PREFIX.crop_${CROP}-${TOLERANCE}bp.fastq.gz where TOLERANCE = CROP - MINLEN.

  • All reads longer (>) than chip.crop_length will be cropped.
  • All reads shorter (<) than chip.crop_length - abs(chip.crop_length_tol) will be removed.
  • All reads not shorter (>=) than chip.crop_length - abs(chip.crop_length_tol) and not longer (<=) than chip.crop_length will be kept.
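
As a rough illustration of the cropping rules above, the following Python sketch derives CROP, MINLEN and the output filename, and classifies a read by its length. The function names are hypothetical and the code only restates the arithmetic in these notes; it does not call Trimmomatic.

```python
from typing import Tuple


def trimmomatic_crop_params(crop_length: int, crop_length_tol: int = 2) -> Tuple[int, int]:
    """Return (CROP, MINLEN) as described in the release notes."""
    crop = crop_length
    # abs() means a negative tolerance behaves like a positive one.
    minlen = crop_length - abs(crop_length_tol)
    return crop, minlen


def cropped_fastq_name(prefix: str, crop_length: int, crop_length_tol: int = 2) -> str:
    """Build PREFIX.crop_${CROP}-${TOLERANCE}bp.fastq.gz, with TOLERANCE = CROP - MINLEN."""
    crop, minlen = trimmomatic_crop_params(crop_length, crop_length_tol)
    tolerance = crop - minlen
    return f"{prefix}.crop_{crop}-{tolerance}bp.fastq.gz"


def classify_read(read_length: int, crop_length: int, crop_length_tol: int = 2) -> str:
    """Say what happens to a read of the given length: cropped, removed or kept."""
    crop, minlen = trimmomatic_crop_params(crop_length, crop_length_tol)
    if read_length > crop:
        return "cropped"
    if read_length < minlen:
        return "removed"
    return "kept"


if __name__ == "__main__":
    print(trimmomatic_crop_params(50))
    print(cropped_fastq_name("rep1", 50))
    for length in (76, 50, 48, 47):
        print(length, classify_read(length, 50))
```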

Java heap

  • Applies to tasks that run a Java app. If the following parameters are not explicitly defined by the user, each Java app in a task uses 90% of the corresponding task memory, so that it does not exceed the physical memory of the cloud instance. For example, if the user did not define chip.filter_picard_java_heap, the pipeline will use 90% of chip.filter_mem_mb for the Java heap -Xmx (for Picard tools in the filter task).
    • chip.align_trimmomatic_java_heap
    • chip.filter_picard_java_heap
    • chip.gc_bias_picard_java_heap
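
The fallback described above could look roughly like the sketch below. The helper name is hypothetical, and two details are assumptions rather than documented behavior: that a user-defined heap value is passed to -Xmx unchanged, and that the computed default is expressed in whole megabytes with an "M" suffix.

```python
from typing import Optional


def java_heap_flag(task_mem_mb: int, java_heap: Optional[str] = None) -> str:
    """Return a -Xmx flag for a Java app running inside a task.

    Hypothetical helper illustrating the 90% rule from the release notes.
    """
    if java_heap:
        # Assumption: an explicitly defined heap (e.g. "4G") is used as-is.
        return f"-Xmx{java_heap}"
    # Default: 90% of the task memory, using integer arithmetic to avoid
    # floating-point rounding. The "M" suffix is an assumption of this sketch.
    heap_mb = task_mem_mb * 9 // 10
    return f"-Xmx{heap_mb}M"


if __name__ == "__main__":
    # e.g. chip.filter_mem_mb = 20000 and chip.filter_picard_java_heap undefined
    print(java_heap_flag(20000))
    # e.g. chip.filter_picard_java_heap = "4G"
    print(java_heap_flag(20000, "4G"))
```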

Bug fixes

  • Subsampling TAG-ALIGN (for PE dataset only)
    • PE subsampling task actually subsampled 2 x chip.subsample_reads reads.
    • The pipeline's default settings are not affected by this bug.
    • Affected cases:
      • chip.subsample_reads > 0 (0 by default) and chip.paired_end == True and actual number of reads in replicate is > chip.subsample_reads.
      • chip.ctl_subsample_reads > 0 (0 by default) and chip.ctl_paired_end == True and actual number of reads in control is > chip.ctl_subsample_reads.
      • User starts from a type other than FASTQ (e.g. BAM, NODUP-BAM, TA) and chip.paired_end == True and actual number of reads in replicate is > chip.xcor_subsample_reads (15M by default).
  • Fix grep error on OSX.
  • Swapped lines in chip.croo.v4.json.
  • Cannot start from BAMs on DNAnexus (using Web UI).
  • JSD didn't work without a blacklist.
  • Pooled TAG-ALIGN had a fixed prefix "basename_prefix".
  • Croo task graph got complicated due to diamond dependency problem of task choose_ctl.
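
To check whether an earlier run was touched by the PE TAG-ALIGN subsampling bug listed above, the affected cases reduce to one condition applied to different parameter pairs. The sketch below is a hypothetical helper, not part of the pipeline: pass the replicate values with chip.subsample_reads, the control values with chip.ctl_subsample_reads, or, for runs that did not start from FASTQ, the replicate values with chip.xcor_subsample_reads.

```python
def pe_subsample_bug_affects(paired_end: bool, num_reads: int, subsample_reads: int = 0) -> bool:
    """Return True if a run matches an affected case from the release notes.

    Hypothetical helper: the run is affected only if the data is paired-end,
    subsampling is enabled (> 0), and there are more reads than the target.
    """
    return bool(paired_end and subsample_reads > 0 and num_reads > subsample_reads)


if __name__ == "__main__":
    # Default chip.subsample_reads (0): not affected.
    print(pe_subsample_bug_affects(True, 40_000_000, 0))
    # PE replicate with 40M reads subsampled to 20M: affected.
    print(pe_subsample_bug_affects(True, 40_000_000, 20_000_000))
    # PE run started from BAM, checked against chip.xcor_subsample_reads (15M by default).
    print(pe_subsample_bug_affects(True, 40_000_000, 15_000_000))
```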