Releases: ENCODE-DCC/chip-seq-pipeline2
v2.0.0
Upgrade Caper to the latest >=2.0.0. Old versions of Caper won't work correctly on HPCs.
$ pip install caper --upgrade
$ caper -v # check if >=2.0.0
Conda users must re-install pipeline's Conda environments. YOU DO NOT NEED TO ACTIVATE CONDA ENVIRONMENT BEFORE RUNNING A PIPELINE. New Caper internally runs each task inside an installed Conda environment.
$ bash scripts/uninstall_conda_env.sh
$ bash scripts/install_conda_env.sh
HPC USERS MUST SPECIFY AN ENVIRONMENT TO RUN A PIPELINE ON. Choices are `--conda`, `--singularity` and `--docker`. This pipeline defaults to run with `--docker`, so it will not work on HPCs without `caper run ... --conda` or `caper run ... --singularity`. It is recommended to use Singularity if your cluster supports it.
Please read the new Caper (>=2.0.0) README carefully. There are very important updates on Caper's side for better HPC (Conda/Singularity/SLURM/...) support.
v1.9.0
Conda users must update pipeline's environment.
$ bash scripts/update_conda_env.sh
Added a new parameter to fix the random seed for pseudoreplication.
- This parameter controls the random seed for shuffling reads in a TAG-ALIGN during pseudoreplication.
- Internally, the seed feeds GNU `shuf --random-source=some_hash_function(seed)`.
- `chip.pseudoreplication_random_seed`: Any positive integer is allowed.
  - If `0` (default), then the input TAG-ALIGN's file size (in bytes) is used as the random seed.
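The seeding behavior above can be sketched in Python as follows. This is only an illustrative model of the parameter's semantics (the pipeline itself shuffles with GNU `shuf`), and the function name is hypothetical:

```python
import os
import random

def shuffle_tagalign_reads(reads, tagalign_path, pseudoreplication_random_seed=0):
    """Hypothetical sketch: shuffle reads for pseudoreplication with a fixed seed.

    If the seed is 0 (the default), fall back to the TAG-ALIGN file size in
    bytes, mirroring chip.pseudoreplication_random_seed's documented behavior.
    """
    seed = pseudoreplication_random_seed
    if seed == 0:
        seed = os.path.getsize(tagalign_path)  # file size in bytes as the seed
    rng = random.Random(seed)
    shuffled = list(reads)
    rng.shuffle(shuffled)  # deterministic for a given seed
    return shuffled
```

The key property is reproducibility: the same seed (explicit, or derived from an unchanged input file) always yields the same pseudoreplicates.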
v1.8.1
v1.8.0
Conda users must update pipeline's environment.
$ bash scripts/update_conda_env.sh
Added input parameters:
`chip.bowtie2_use_local_mode`
- If this flag is on, then the pipeline will add `--local` to the `bowtie2` command line, which will override the default `--end-to-end` mode of `bowtie2`.
- See details in the bowtie2 manual.
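For reference, turning this on in an input JSON might look like the fragment below (an illustrative sketch; `chip.aligner` is shown only for context, and the fragment should be merged with your other pipeline inputs):

```json
{
    "chip.aligner": "bowtie2",
    "chip.bowtie2_use_local_mode": true
}
```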
`chip.bwa_mem_read_len_limit`
- This parameter is only valid if `chip.use_bwa_mem_for_pe` is on and FASTQs are paired-ended.
- This parameter defaults to `70` (as mentioned in bwa's manual).
- This parameter controls the read-length threshold of `bwa mem` for paired-ended datasets. The pipeline automatically determines the sample's read length from a (merged) FASTQ R1 file. If that read length is shorter than this threshold, then the pipeline automatically switches back to `bwa aln` instead of `bwa mem`. If your FASTQ's read length is < 70 and you still want to use `bwa mem`, then try reducing this parameter.
- See details in the bwa manual.
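The switching logic described above can be sketched as follows (a hypothetical helper, not the pipeline's actual code; the real pipeline reads the length from the merged R1 FASTQ):

```python
def choose_bwa_command(read_len, use_bwa_mem_for_pe, paired_end,
                       bwa_mem_read_len_limit=70):
    """Hypothetical sketch of the bwa-mode selection described above.

    bwa mem is used only for paired-ended data with chip.use_bwa_mem_for_pe
    on AND a read length at or above chip.bwa_mem_read_len_limit; otherwise
    the pipeline falls back to bwa aln.
    """
    if use_bwa_mem_for_pe and paired_end and read_len >= bwa_mem_read_len_limit:
        return "bwa mem"
    return "bwa aln"
```

So a 50 bp paired-end sample with `chip.use_bwa_mem_for_pe` on still runs `bwa aln` unless the limit is lowered below 50.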
Conda environment
- Added a pinned version of `tbb` to the environment, which will possibly fix the conflicting-library issue between `bowtie2` and `mamba`.
v1.7.1
Conda users must re-install Conda environment.
$ scripts/uninstall_conda_env.sh
$ scripts/install_conda_env.sh mamba
`mamba` support for Conda environment installation
- Add `mamba` to the installer command line to speed up resolving conflicts.
- If it doesn't work, then try without `mamba`.
- `mamba` will be helpful for resolving Conda package conflicts much faster.
Increased resource factors
- Increased factors for some heavy tasks (`spr`, `filter`, `subsample_ctl` and `macs2_signal_track`).
- Increased fixed disk size for several tasks (`gc_bias`).
Others
- Added `version` to `meta`.
v1.7.0
Conda users must update their environment.
$ bash scripts/update_conda_env.sh
Added `chip.redact_nodup_bam`
- This will redact filtered/nodup BAMs by replacing indels with reference sequences to protect donor's private information.
Added `chip.trimmomatic_phred_score_format`
- Choices: [`auto` (default), `phred33`, `phred64`] (no hyphen).
- Users can activate Trimmomatic's flag `-phred33` or `-phred64` by defining this parameter as `phred33` or `phred64`.
- Defaults to `auto` (using Trimmomatic's auto-detection).
- More details in [this doc](http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/TrimmomaticManual_V0.32.pdf).
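For example, forcing the `-phred33` flag might look like this input-JSON fragment (an illustrative sketch; merge with your other pipeline inputs):

```json
{
    "chip.trimmomatic_phred_score_format": "phred33"
}
```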
Removed `caper` and `croo` from pipeline's Conda environment.
- There have been some conflicts between `conda-forge` and `bioconda` packages. These two apps will be added back to the environment later after all conflicts are fixed.
v1.6.1
Conda users should re-install pipeline's environment.
$ bash scripts/uninstall_conda_env.sh
$ bash scripts/install_conda_env.sh
Bug fixes
- Dependencies
  - py2 Conda environment
    - Pinned `biopython` at 1.76, which is the last version that supports py2.
  - py3 Conda environment
    - Added Caper's python dependency `scikit-learn` to it.
- Malformed required memory for `samtools sort` command line.
`starch` support
- Generate `starch` output for blacklist-filtered peaks (`.starch`).
- New Croo output definition JSON (v5) for `starch`es.
v1.6.0
Conda users should update pipeline's environment. However, reinstalling is always recommended since we added GNU utils to the installer.
# To update env
$ bash scripts/update_conda_env.sh
# To re-install env
$ bash scripts/uninstall_conda_env.sh
$ bash scripts/install_conda_env.sh
New factor-based resource parameters
- New parameters are factor-based: each factor is multiplied by a task's input file sizes to determine the resources (mem/disk) required to run that task (on a cloud instance or as an HPC job).
- e.g. for each replicate, the total size of all R1/R2 FASTQs will be used to determine resources for task `align`, and BAM size will be used for task `filter`.
- e.g. if you have a total of `20 GB` (R1 + R2) of PE FASTQs, the default `chip.align_mem_factor` is `0.15`, and base memory is fixed at `4-6 GB` for most tasks (`5 GB` for task `align`), then the instance memory for task `align` will be `20 * 0.15 + 5 = 8 GB`.
- Also optimized memory/disk requirements for each task; all tasks should use less memory/disk than in previous versions.
- Use SSD for all tasks on Google Cloud. This will cost 4x more than HDD but it's still negligible (cost for SSD 100 GB is $0.5 per hour).
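The factor-based arithmetic in the example above can be written out as a small sketch (the function name and fixed base memory are illustrative, not pipeline code):

```python
def task_mem_gb(input_size_gb, mem_factor, base_mem_gb):
    """Factor-based resource estimate: the task's memory factor is
    multiplied by its input file size, then added to a fixed base."""
    return input_size_gb * mem_factor + base_mem_gb

# Worked example from the notes: 20 GB of PE FASTQs, align memory
# factor 0.15, 5 GB base memory -> 20 * 0.15 + 5 = 8 GB.
```

The same shape applies to the `*_disk_factor` parameters, with disk size in place of memory.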
Change of defaults for resource parameters
- `chip.align_cpu`: 2 -> 6
- `chip.filter_cpu`: 2 -> 4
- `chip.call_peak_cpu`: 1 -> 2 (peak-caller MACS2 is single-threaded; no more than 2 is required)
Added resource parameters
- `chip.spr_disk_factor`
- `chip.preseq_disk_factor`
- `chip.call_peak_cpu`
Change of resource parameters
- `chip.align_mem_mb` -> `chip.align_bowtie2_mem_factor` and `chip.align_bwa_mem_factor`
  - Chosen according to the aligner `chip.aligner` (`bowtie2` or `bwa`). A custom aligner will use `chip.align_bwa_mem_factor`.
- `chip.align_disks` -> `chip.align_bowtie2_disk_factor` and `chip.align_bwa_disk_factor`
  - Chosen according to the aligner `chip.aligner` (`bowtie2` or `bwa`). A custom aligner will use `chip.align_bwa_disk_factor`.
- `chip.filter_mem_mb` -> `chip.filter_mem_factor`
- `chip.filter_disks` -> `chip.filter_disk_factor`
- `chip.bam2ta_mem_mb` -> `chip.bam2ta_mem_factor`
- `chip.bam2ta_disks` -> `chip.bam2ta_disk_factor`
- `chip.xcor_mem_mb` -> `chip.xcor_mem_factor`
- `chip.xcor_disks` -> `chip.xcor_disk_factor`
- `chip.spr_mem_mb` -> `chip.spr_mem_factor`
- `chip.spr_disks` -> `chip.spr_disk_factor`
- `chip.jsd_mem_mb` -> `chip.jsd_mem_factor`
- `chip.jsd_disks` -> `chip.jsd_disk_factor`
- `chip.call_peak_mem_mb` -> `chip.call_peak_spp_mem_factor` and `chip.call_peak_macs2_mem_factor`
  - Chosen according to the peak caller `chip.peak_caller` (defaulting to `spp` for TF ChIP and `macs2` for histone ChIP).
- `chip.call_peak_disks` -> `chip.call_peak_spp_disk_factor` and `chip.call_peak_macs2_disk_factor`
  - Chosen according to the peak caller `chip.peak_caller` (defaulting to `spp` for TF ChIP and `macs2` for histone ChIP).
- `chip.macs2_signal_track_mem_mb` -> `chip.macs2_signal_track_mem_factor`
- `chip.macs2_signal_track_disks` -> `chip.macs2_signal_track_disk_factor`
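As an illustration of migrating an old input JSON: an absolute `chip.filter_mem_mb` entry would be replaced by the corresponding factor-based key, as in the fragment below (the value is a placeholder for illustration, not a recommended default):

```json
{
    "chip.filter_mem_factor": 0.4
}
```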
Resources for task `align`
- A custom aligner Python script must be updated with `--mem-gb`.
  - Task `align` will use BWA's resources (`chip.align_bwa_mem_factor` and `chip.align_bwa_disk_factor`).
  - `--mem-gb` should be added to your Python script `chip.custom_align_py`.
  - See the input documentation for details.
Resources for task `call_peak`
- Different factor-based parameters will be used for different peak callers `chip.peak_caller` (defaulting to `spp` for TF ChIP and `macs2` for histone ChIP).
- If `chip.peak_caller` is not defined, then TF ChIP-seq (`"chip.pipeline_type": "tf"`) will default to the `spp` peak caller, hence `chip.call_peak_spp_mem_factor` and `chip.call_peak_spp_disk_factor`.
- If `chip.peak_caller` is not defined, then histone ChIP-seq (`"chip.pipeline_type": "histone"`) will default to the `macs2` peak caller, hence `chip.call_peak_macs2_mem_factor` and `chip.call_peak_macs2_disk_factor`.
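The defaulting rule above can be sketched as follows (a hypothetical helper for illustration, not pipeline code):

```python
def resolve_peak_caller(pipeline_type, peak_caller=None):
    """If chip.peak_caller is unset, TF ChIP-seq defaults to spp and
    histone ChIP-seq defaults to macs2; the matching
    chip.call_peak_{spp,macs2}_{mem,disk}_factor parameters then apply."""
    if peak_caller is None:
        peak_caller = "spp" if pipeline_type == "tf" else "macs2"
    factors = (
        "chip.call_peak_{}_mem_factor".format(peak_caller),
        "chip.call_peak_{}_disk_factor".format(peak_caller),
    )
    return peak_caller, factors
```

Explicitly setting `chip.peak_caller` overrides the pipeline-type default, and the resource factors follow whichever caller is ultimately chosen.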
Misc.
- Better multi-threading for `samtools view/index/sort`.
- Added GNU utils to the Conda environment.
Zenodo integration for citation purposes
Integration with Zenodo to generate a DOI and citation that will update automatically with each subsequent release.
v1.5.1
New resource parameter for control subsampling.
- Control subsampling is separated from the two peak-calling-related tasks (`call_peak` and `macs2_signal_track`) to prevent allocating high resources for subsampling, which would not be fully utilized during peak-calling.
- There is a new task for control subsampling, whose max memory is controlled by `chip.subsample_ctl_mem_mb`.
  - It's `16000` by default.
  - Use a higher number for huge controls, e.g. `32000` or `64000`.
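For example, doubling the default memory for a large control might look like the fragment below (an illustrative sketch; merge with your other pipeline inputs):

```json
{
    "chip.subsample_ctl_mem_mb": 32000
}
```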
Bug fixes
- Typo in documentation about parameter `chip.mapq_thresh`.
- Syntax error in WDL's `meta` section, which was not caught by Womtool but was caught by `miniwdl`.