In this minor release we make minor adjustments to memory reservations for several modules, provide an explicit sorting statement for RSEM to ensure memory limits are respected, and bump the version of MultiQC to 1.25.2.
In this release we correct a nextflow issue in the GRIDSS_ASSEMBLY step used in the human PTA workflow.
In this release we make updates to the ATAC workflow, and correct issues related to the PTA workflow.
ATAC:
- Merging of replicate samples is now supported. Use the `--merge_replicates` option, along with a CSV input file. See the wiki page for details on CSV setup.
- GRCm39 pseudo-references generated with G2Gtools are now supported. Previously, GRCm38 was supported via the `--chain` option. For GRCm39, VCI files are required input, also specified with `--chain`.
PTA:
- The mouse PTA workflow would crash when all somatic CNVs were filtered; we have corrected this.
- Numerous adjustments to memory and wall clock limits were made to support high coverage WGS data.
None
- modules/g2gtools/g2gtools_vci_convert.nf
- workflows/atac.nf: Replicate merging added. GRCm39 pseudo-reference support added.
- subworkflows/aria_download_parse.nf: Support for replicate merging added.
- subworkflows/concatenate_local_files.nf: Support for replicate merging added.
- modules/cosmic/cosmic_add_cancer_resistance_mutations_germline.nf: wallclock and memory request increase.
- modules/gridss/gridss_assemble.nf: memory request increase, and java heap adjustment.
- modules/gridss/gripss_somatic_filter.nf: memory request increase, and java heap adjustment.
- modules/illumina/manta.nf: memory and wallclock requests were made flat rather than scaled to input file size.
- modules/picard/picard_mergesamfiles.nf: correct `file` vs. `path` nextflow issue.
- modules/python/python_somatic_vcf_finalization.nf: wallclock requests increase.
- modules/python/python_somatic_vcf_finalization_mouse.nf: wallclock requests increase.
- modules/r/plot_delly_cnv.nf: add dynamic plot naming based on `sampleID`
- modules/samtools/samtools_chain_sort_fixmate_bam.nf: alter module to re-sort final filtered BAM prior to possible replicate merge.
- modules/samtools/samtools_non_chain_reindex.nf: alter module to re-sort final filtered BAM prior to possible replicate merge.
- modules/samtools/samtools_stats_insertsize.nf: wallclock request increase.
- modules/svaba/svaba.nf: memory and wallclock requests increase.
None
- bin/gbrs/generate_emission_prob_avecs.py: Modify for use with non-DO strain IDs and dynamic number of strains.
- bin/pta/annotate-bedpe-with-cnv.r: Capture edge case where all somatic CNV are filtered.
- bin/pta/annotate-cnv-delly.r: Capture edge case where all somatic CNV are filtered.
- bin/pta/delly_cnv_plot.r: Capture edge case where all somatic CNV are filtered.
None
In this minor release we correct a bug in `--workflow atac`. In this workflow, the `macs2` module was configured to use a user defined parameter `tmpdir` for scratch space. However, if the specified `tmpdir` did not exist, `macs2` would fail silently, and allow the workflow to continue. This behavior has been fixed.
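The failure mode described here (a missing scratch directory that does not stop the run) can be guarded against by checking the directory up front and treating any non-zero exit as fatal. The following is a minimal Python sketch of that idea, not the code used in the `macs2` module; the function names `require_scratch_dir` and `run_checked` are hypothetical.

```python
import os
import subprocess
from pathlib import Path
from typing import Sequence


def require_scratch_dir(tmpdir: str) -> Path:
    """Return tmpdir as a Path, or raise if it is missing or not writable."""
    path = Path(tmpdir)
    if not path.is_dir():
        raise FileNotFoundError(f"scratch directory does not exist: {path}")
    if not os.access(path, os.W_OK):
        raise PermissionError(f"scratch directory is not writable: {path}")
    return path


def run_checked(command: Sequence[str], tmpdir: str) -> None:
    """Validate the scratch directory, then run the command.

    check=True raises CalledProcessError on a non-zero exit status,
    so a failing tool cannot be skipped over silently.
    """
    require_scratch_dir(tmpdir)
    subprocess.run(list(command), check=True)
```

Failing before the tool starts gives a clear error message at the step that is misconfigured, rather than an empty or missing output further downstream.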
In this minor release we change the Xengsort container to include GNU `sort` rather than BusyBox `sort`. This change was required to process very large FASTQ files.
In our testing, BusyBox `sort` requires files to be held in memory during sorting, and does not support the use of temporary files. The use of GNU `sort` allows for temporary files to be generated and alleviates the need to hold entire files in memory. This change has no impact on output from Xengsort, or any associated workflow.
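The difference between the two approaches can be illustrated with an external merge sort: sorted chunks are spilled to temporary files and then merged, so only one chunk is held in memory at a time. This Python sketch shows the general technique only; it is not how GNU `sort` is implemented, and the function name and `chunk_size` default are hypothetical.

```python
import heapq
import itertools
import tempfile
from typing import Iterable, Iterator


def external_sort(lines: Iterable[str], chunk_size: int = 100_000) -> Iterator[str]:
    """Yield newline-terminated lines in sorted order using temporary files."""
    chunk_files = []
    iterator = iter(lines)
    try:
        while True:
            # Hold at most chunk_size lines in memory at once.
            chunk = list(itertools.islice(iterator, chunk_size))
            if not chunk:
                break
            # Normalise line endings so records stay separate on disk.
            chunk = [line if line.endswith("\n") else line + "\n" for line in chunk]
            chunk.sort()
            handle = tempfile.TemporaryFile(mode="w+")
            handle.writelines(chunk)
            handle.seek(0)
            chunk_files.append(handle)
        # heapq.merge reads the sorted chunk files lazily, line by line.
        yield from heapq.merge(*chunk_files)
    finally:
        for handle in chunk_files:
            handle.close()
```

With a small `chunk_size`, peak memory is bounded by the chunk rather than by the size of the input, at the cost of extra disk I/O.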
In this release we add a new workflow for calling copy number variation (CNV) from raw Illumina IDAT genotype array files. Currently the Illumina IlluminaCytoSNP v2.1 array is supported, but support for additional arrays is possible.
We make additional minor changes as described below.
- CNV calling from Illumina genotype array data (--cnv_array)
- modules/bcftools/bcftools_gtct2vcf.nf
- modules/bcftools/bcftools_query_ascat.nf
- modules/illumina/iaap_cli.nf
- modules/ascat/ascat_run.nf
- modules/ascat/ascat_annotation.nf
None
- Replaced the incorrect `${task.mem}` with `${task.memory}` in the Nextflow error catch statement in modules related to the SV calling workflows.
- utility_modules/gzip.nf: Memory request increase
- cnv_array/ASCAT_run.R
- cnv_array/annotate_ensembl_genes.pl
- cnv_array/seg_plot.R
- cnv_array/segment_raw_extend.pl
None
- tests/workflows/cnv_array.nf.test
In this release we make the following minor adjustments:
- Correct syntax errors in the Xengsort module when running single-end data.
- Minor adjustments to EMASE and GBRS help and log information to include the `gen_org` param.
- Bump the version of MultiQC to v1.23.
- Increase the memory request for `PTA` modules: `python_merge_prep.nf` and `python_reorder_vcf_columns.nf`.
- Add `CHECK_STRANDEDNESS` to MultiQC output for PDX RNAseq.
- Increased job memory request in example run scripts.
In this release, we add a FASTQ sorting function to the Xengsort module. Due to asynchronous multi-threading in the classification step, Xengsort produces FASTQ output with non-deterministic sort order. BWA produces subtly different mapping results when reads in otherwise identical FASTQ inputs are shuffled (see note from BWA developer here). The slight mapping differences are not enough to impact overall results, but do prevent fully reproducible results when Xengsort is used and reads are not sorted. The addition of the sorting function allows for fully reproducible results, with no additional user action required.
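To illustrate why sorting restores reproducibility, the sketch below orders FASTQ records by read name so that two files with the same reads in different orders produce identical output. It is a simplified, in-memory Python illustration, not the sorting function used in the Xengsort module; `read_fastq` and `sort_fastq` are hypothetical names.

```python
from typing import Iterable, Iterator, List, Tuple

FastqRecord = Tuple[str, ...]


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Group FASTQ lines into 4-line records: header, sequence, plus, quality."""
    buffer = [line.rstrip("\n") for line in lines]
    if len(buffer) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of four lines")
    for start in range(0, len(buffer), 4):
        yield tuple(buffer[start:start + 4])


def sort_fastq(lines: Iterable[str]) -> List[FastqRecord]:
    """Return records ordered by read name (header text before any whitespace).

    The full record is used as a tie-breaker so the result does not
    depend on the input order.
    """
    return sorted(read_fastq(lines), key=lambda rec: (rec[0].split()[0], rec))
```

Because the output order depends only on the record contents, any downstream tool that is sensitive to read order receives the same input on every run.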
In this minor release, we fix a `subscript out of bounds` bug in `bin/wes/sequenza_seg_na_window.R`.
In this release, we adjust memory and wallclock requirements for a number of modules, update `read_group_from_fastq.py` from python2 to python3, and incorporate PRs #4 and #5.
- PR #4 (contributed by @BrianSanderson) adds an optional gene and transcript count merge across samples in the RNA and PDX RNA workflows (merge accessed via including the `--merge_rna_counts` flag).
- PR #5 (contributed by @alanhoyle) adds a catch for corrupt gzip files in the Bowtie module as used by EMASE/GBRS analyses.
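A corrupt or truncated gzip file can be detected by decompressing the whole stream and treating any error as a failure. The sketch below shows that check in Python for illustration; it is not the code from PR #5 (the module change is described below as a pipefail catch), and `gzip_is_intact` is a hypothetical name.

```python
import gzip
import zlib


def gzip_is_intact(path: str, block_size: int = 1 << 20) -> bool:
    """Return True if the entire gzip stream decompresses without error."""
    try:
        with gzip.open(path, "rb") as handle:
            # Read to the end so truncation and CRC errors are surfaced.
            while handle.read(block_size):
                pass
    except (OSError, EOFError, zlib.error):
        # BadGzipFile is a subclass of OSError; a truncated file raises EOFError.
        return False
    return True
```

Reading in fixed-size blocks keeps memory use flat regardless of the size of the FASTQ file being checked.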
None
- utility_modules/merge_rsem_counts.nf
- workflows/rnaseq.nf module added to merge gene and transcript expression when `--merge_rna_counts` is used.
- workflows/pdx_rnaseq.nf module added to merge gene and transcript expression when `--merge_rna_counts` is used.
- bowtie/bowtie.nf pipefail catch added for corrupt gzip files, per #5.
- fastp/fastp.nf save json report as well as html report.
- nygenome/lancet.nf wallclock request increase.
- picard/picard_markduplicates.nf memory adjustment, and accounting for MarkDuplicates not fully respecting -Xmx memory limits imposed by Java.
- picard/picard_reordersam.nf memory request increase.
- picard/picard_sortsam.nf memory request increase.
- utility_modules/read_groups.nf container changed to py3.
- bin/shared/read_group_from_fastq.py update from py2 to py3.
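The `picard_markduplicates.nf` entry above notes that MarkDuplicates does not fully respect the `-Xmx` limit. One common way to account for this is to give the JVM a heap smaller than the memory the job requests, leaving headroom for off-heap usage. The Python sketch below illustrates the arithmetic only; the 0.75 fraction and the function names are hypothetical and are not taken from the module.

```python
def java_heap_mb(requested_memory_mb: int, heap_fraction: float = 0.75) -> int:
    """Heap size in MB to pass as -Xmx, leaving the remainder as headroom."""
    if requested_memory_mb <= 0:
        raise ValueError("requested_memory_mb must be positive")
    if not 0 < heap_fraction <= 1:
        raise ValueError("heap_fraction must be in the interval (0, 1]")
    return int(requested_memory_mb * heap_fraction)


def xmx_flag(requested_memory_mb: int, heap_fraction: float = 0.75) -> str:
    """Format the heap size as a JVM -Xmx option in megabytes."""
    return f"-Xmx{java_heap_mb(requested_memory_mb, heap_fraction)}m"
```

For example, a job that requests 16000 MB would be started with `-Xmx12000m` under these assumed settings.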
In this release we change the read disambiguation tool Xenome for Xengsort. Extensive benchmarking shows high concordance among results obtained from both tools.
Additionally, we correct an issue with the human PTA workflow when running the combination of the `--pdx` and `--split_fastq` options. Data run with this combination of options from version 0.6.0-0.6.2 should be re-run.
None
- xengsort/xengsort_classify.nf
- xengsort/xengsort_index.nf
- Xengsort replaces Xenome for all PDX based workflows (RNAseq, RNA fusion, Hs PTA, Somatic WES, Somatic WES PTA)
- Correction made for the Human PTA when running the combination of the `--pdx` and `--split_fastq` options.
None
In this minor release we adjust memory and wall clock statements, and modify `bin/pta/merge-caller-vcfs.r` to correct an edge-case bug.
In this minor release we added support for automatic Zenodo releases via github actions. There are no changes or additions to workflows.
In this major release we add seven new workflows, and make numerous changes to existing workflows. Specific changes are discussed below.
For Jackson Laboratory users this release now supports the new Sumner2 cluster. To use workflows on Sumner2, simply specify: `-profile sumner2`. Note that Sumner has reached end of life, and will no longer be supported going forward. We have updated all example run scripts to use `-profile sumner2`.
Note: Sumner2 enforces strict Linux cgroups, which hold jobs to the memory and CPU limits requested by each Nextflow module. In our release testing, we increased memory reservations for many steps; however, additional memory issues are to be expected. If you encounter `OOM` (out of memory) issues and experience workflow steps failing with `killed` reported in the error log, please either email us (ngsOps@jax.org) or submit an issue with details on which module failed and the size of the dataset you were running.
Related to memory and time restrictions, we made significant changes to the PTA, WGS, and WES workflows:
- For human PTA, WGS and WES analyses GATK BaseRecalibrator is now scattered by chromosome.
- For PTA and WGS, options were added to allow users to:
  - Deduplicate reads with `Clumpify` prior to mapping steps.
  - Split FASTQ files into batched chunks for subsequent mapping. Mapped batches are merged prior to the GATK MarkDuplicates step.
  - Cap coverage at a user defined threshold using JVARKIT Biostar154220 prior to variant calling. This can help reduce computational load when calling variants in higher coverage areas of the genome.
- For PTA, WGS and WES FASTP is now used for read and adapter trimming.
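The FASTQ splitting option above follows a scatter/gather pattern: reads are divided into batches of complete records, each batch is mapped independently, and the mapped batches are merged afterwards. The Python sketch below illustrates only the splitting step; it is not the workflow's implementation, and `split_fastq` and `reads_per_chunk` are hypothetical names.

```python
from typing import Iterable, Iterator, List


def split_fastq(lines: Iterable[str], reads_per_chunk: int) -> Iterator[List[str]]:
    """Yield lists of FASTQ lines, each holding at most reads_per_chunk records."""
    if reads_per_chunk < 1:
        raise ValueError("reads_per_chunk must be at least 1")
    lines_per_chunk = 4 * reads_per_chunk  # a FASTQ record spans four lines
    chunk: List[str] = []
    for line in lines:
        chunk.append(line)
        if len(chunk) == lines_per_chunk:
            yield chunk
            chunk = []
    if chunk:
        # The final batch may be smaller, but must still hold whole records.
        if len(chunk) % 4 != 0:
            raise ValueError("FASTQ input ended in the middle of a record")
        yield chunk
```

Splitting on record boundaries matters: a batch that ends mid-record would be an invalid FASTQ file. For paired-end data, both mate files would need to be split with the same `reads_per_chunk` so that batches stay in step.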
We have included an option to specify the root location of the omics_share reference file set. For Jax users on Sumner2, this option should not be changed and defaults to `/projects/omics_share`. For external users, or those on Elion, specify the root directory of reference files with `--reference_cache </path/to/omics_share>`.
Finally, we added testing modules for all workflows to be used with [nf-test](https://www.nf-test.com/).
- Amplicon Sequencing (supporting human only at this time): General PCR / Targeted Sequencing
- Genetic Ancestry (See https://www.biorxiv.org/content/10.1101/2022.10.24.513591v1 for details and methods)
- Germline Structural Variant Calling
  - Illumina short-read data
  - Pacific Biosciences (PacBio) long-read data: CCS and CLR modes
  - Oxford Nanopore Technologies long-read data (ONT)
- Somatic Whole Exome Sequencing for tumor-only samples (with option for PDX)
- Somatic Whole Exome Sequencing for Paired Tumor Analysis (PTA; with option for PDX)
- modules/abra2/abra2.nf
- modules/bbmap/bbmap_clumpify.nf
- modules/bcftools/bcftools_annotate.nf
- modules/bcftools/bcftools_call.nf
- modules/bcftools/bcftools_duphold_filter.nf
- modules/bcftools/bcftools_filter.nf
- modules/bcftools/bcftools_merge_amplicon.nf
- modules/bcftools/bcftools_mpileup.nf
- modules/bcftools/bcftools_norm.nf
- modules/bcftools/bcftools_rehead_sort.nf
- modules/bcftools/bcftools_vcf_to_bcf.nf
- modules/bedops/bedops_sort.nf
- modules/bedops/bedops_window.nf
- modules/bedtools/bedtools_sequenza_subtract.nf
- modules/bwa/bwa_index.nf
- modules/bwa/bwa_mem2.nf
- modules/delly/delly_call.nf
- modules/delly/delly_call_germline.nf
- modules/delly/delly_cnv_germline.nf
- modules/duphold/duphold.nf
- modules/freebayes/freebayes.nf
- modules/gatk/gatk_baserecalibrator_interval.nf
- modules/gatk/gatk_calculatecontamination.nf
- modules/gatk/gatk_calculatecontamination_tumorOnly.nf
- modules/gatk/gatk_filtermutectcalls_wes.nf
- modules/gatk/gatk_gatherbqsrreports.nf
- modules/gatk/gatk_getpileupsummaries.nf
- modules/gatk/gatk_getpileupsummaries_tumorOnly.nf
- modules/gatk/gatk_haplotypecaller_amplicon.nf
- modules/gatk/gatk_learnreadorientationmodel.nf
- modules/gatk/gatk_mutect2_wes_pta.nf
- modules/gatk/gatk_printreads.nf
- modules/gatk/gatk_variantfiltration_freebayes.nf
- modules/illumina/manta_germline.nf
- modules/jvarkit/jvarkit_biostar154220.nf
- modules/lumpy/lumpy_call_sv.nf
- modules/lumpy/lumpy_extract_splits.nf
- modules/lumpy/lumpy_prep.nf
- modules/minimap/minimap2_index.nf
- modules/minimap/minimap2_map_ont.nf
- modules/nanofilt/nanofilt.nf
- modules/nanoqc/nanoqc.nf
- modules/nanostat/nanostat.nf
- modules/nanosv/nanosv.nf
- modules/pbmm2/pbmm2_call.nf
- modules/pbmm2/pbmm2_index.nf
- modules/pbsv/pbsv_call.nf
- modules/pbsv/pbsv_discover.nf
- modules/picard/picard_markduplicates_removedup.nf
- modules/picard/picard_sortsam_mmrsvd.nf
- modules/porechop/porechop.nf
- modules/python/python_add_AF_freebayes.nf
- modules/python/python_add_AF_haplotypecaller.nf
- modules/python/python_annot_depths.nf
- modules/python/python_annot_on_target.nf
- modules/python/python_bedpe_to_vcf.nf
- modules/python/python_parse_depths.nf
- modules/python/python_parse_survivor_ids.nf
- modules/r/illumina_sv_merge.nf
- modules/r/r_merge_depths.nf
- modules/samtools/samtools_cat.nf
- modules/samtools/samtools_filter_mmrsvd.nf
- modules/samtools/samtools_merge.nf
- modules/samtools/samtools_mpileup.nf
- modules/samtools/samtools_stats_mmrsvd.nf
- modules/scarhrd/scarhrd.nf
- modules/sequenza/sequenza_annotate.nf
- modules/sequenza/sequenza_na_window.nf
- modules/sequenza/sequenza_pileup2seqz.nf
- modules/sequenza/sequenza_run.nf
- modules/smoove/smoove_call_germline.nf
- modules/sniffles/sniffles.nf
- modules/snpweights/snpweights_inferanc.nf
- modules/snpweights/snpweights_vcf2eigenstrat.nf
- modules/survivor/survivor_annotation.nf
- modules/survivor/survivor_bed_intersect.nf
- modules/survivor/survivor_inexon.nf
- modules/survivor/survivor_merge.nf
- modules/survivor/survivor_to_bed.nf
- modules/survivor/survivor_vcf_to_table.nf
- modules/tumor_mutation_burden/tmb_score.nf
- modules/utility_modules/filter_trim.nf
- modules/vcftools/vcftools_filter.nf
- tests/workflows/amplicon_fingerprint.nf.test
- tests/workflows/amplicon_generic.nf.test
- tests/workflows/ancestry.nf.test
- tests/workflows/atac.nf.test
- tests/workflows/chipseq.nf.test
- tests/workflows/emase.nf.test
- tests/workflows/gbrs.nf.test
- tests/workflows/generate_pseudoreference.nf.test
- tests/workflows/prep_do_gbrs_inputs.nf.test
- tests/workflows/prepare_emase.nf.test
- tests/workflows/pta.nf.test
- tests/workflows/rna_fusion.nf.test
- tests/workflows/rnaseq.nf.test
- tests/workflows/rrbs.nf.test
- tests/workflows/somatic_wes.nf.test
- tests/workflows/somatic_wes_pta.nf.test
- tests/workflows/wes.nf.test
- tests/workflows/wgs.nf.test
- chipseq.nf: Error reporting added for malformed CSV input files
- pta.nf: Error reporting added for malformed CSV input files
- subworkflows/hs_pta.nf: `JAX_TRIMMER` replaced with `FASTP`. GATK Baserecalibration is now scattered by chromosome. Options added to: (1) deduplicate reads with `Clumpify` prior to mapping steps, (2) split FASTQ files into batched chunks for subsequent mapping, and (3) cap coverage at a user defined threshold using JVARKIT Biostar154220 prior to variant calling. Additionally, short_alignment_marking following mapping was previously disconnected from the workflow; this step is now included.
- subworkflows/mm_pta.nf: `JAX_TRIMMER` replaced with `FASTP`. Options added to: (1) deduplicate reads with `Clumpify` prior to mapping steps, (2) split FASTQ files into batched chunks for subsequent mapping, and (3) cap coverage at a user defined threshold using JVARKIT Biostar154220 prior to variant calling.
- rnaseq.nf: `Check Strandedness` log data added to MultiQC report.
- wes.nf: `JAX_TRIMMER` replaced with `FASTP`. For human analysis, GATK Baserecalibration is now scattered by chromosome.
- wgs.nf: `JAX_TRIMMER` replaced with `FASTP`. For human analysis, GATK Baserecalibration is now scattered by chromosome. Options added to: (1) deduplicate reads with `Clumpify` prior to mapping steps, (2) split FASTQ files into batched chunks for subsequent mapping, and (3) cap coverage at a user defined threshold using JVARKIT Biostar154220 prior to variant calling.
- alntools/alntools_bam2emase.nf: Bump docker container version.
- bedtools/bedtools_genomecov.nf: Memory request increase.
- biqseq2/bicseq2_normalize.nf: Adjustment to read length parsing logic.
- bowtie/bowtie.nf: Memory request increase.
- bwa/bwa_mem.nf: Input tuple adjustment.
- bwa/bwa_mem_hla.nf: Input tuple adjustment.
- deeptools/deeptools_filter_remove_multi_sieve.nf
- emase/emase_create_hybrid.nf: Bump docker container version.
- emase/emase_get_common_alignment.nf: Bump docker container version. Memory request increase.
- emase/emase_prepare_emase.nf: Bump docker container version.
- emase/emase_run.nf: Bump docker container version.
- ensembl/varianteffectpredictor_germline_mouse.nf: Input tuple adjustment. Add BGZIP and indexing to final VCF output.
- fastp/fastp.nf: Memory request increase.
- gatk/gatk_applybqsr.nf: Added tmp dir to command.
- gatk/gatk_baserecalibrator.nf: Added tmp dir to command.
- gatk/gatk_chain_extract_badreads.nf: Added tmp dir to command.
- gatk/gatk_chain_filter_reads.nf: Added tmp dir to command.
- gatk/gatk_cnnscorevariants.nf: Added tmp dir to command.
- gatk/gatk_combinegvcfs.nf: Added tmp dir to command.
- gatk/gatk_depthofcoverage.nf: Added tmp dir to command.
- gatk/gatk_filtermutectcalls.nf: Added tmp dir to command.
- gatk/gatk_filtervarianttranches.nf: Added tmp dir to command.
- gatk/gatk_genotype_gvcf.nf: Added tmp dir to command.
- gatk/gatk_getsamplename.nf: Added tmp dir to command.
- gatk/gatk_getsamplename_noMeta.nf: Added tmp dir to command.
- gatk/gatk_haplotypecaller.nf: Added tmp dir to command.
- gatk/gatk_haplotypecaller_interval.nf: Added tmp dir to command.
- gatk/gatk_haplotypecaller_sv_germline.nf: Added tmp dir to command.
- gatk/gatk_indexfeaturefile.nf: Added tmp dir to command.
- gatk/gatk_mergemutectstats.nf: Added tmp dir to command.
- gatk/gatk_mergevcf.nf: Added tmp dir to command.
- gatk/gatk_mergevcf_list.nf: Added tmp dir to command.
- gatk/gatk_mutect2.nf: Added tmp dir to command.
- gatk/gatk_mutect2_tumorOnly.nf: Added tmp dir to command.
- gatk/gatk_selectvariants.nf: Added tmp dir to command.
- gatk/gatk_sortvcf_germline.nf: Added tmp dir to command.
- gatk/gatk_sortvcf_somatic_merge.nf: Added tmp dir to command.
- gatk/gatk_sortvcf_somatic_tools.nf: Added tmp dir to command.
- gatk/gatk_updatevcfsequencedictionary.nf: Added tmp dir to command.
- gatk/gatk_variantfiltration.nf: Added tmp dir to command.
- gatk/gatk_variantfiltration_af.nf: Added tmp dir to command.
- gatk/gatk_variantfiltration_mutect2.nf: Added tmp dir to command.
- gbrs/gbrs_bam2emase.nf: Bump docker container version. Memory request increase.
- gbrs/gbrs_compress.nf: Bump docker container version. Memory request increase.
- gbrs/gbrs_export.nf: Bump docker container version.
- gbrs/gbrs_interpolate.nf: Bump docker container version.
- gbrs/gbrs_plot.nf: Bump docker container version.
- gbrs/gbrs_quantify.nf: Bump docker container version.
- gbrs/gbrs_quantify_genotype.nf: Bump docker container version.
- gbrs/gbrs_reconstruct.nf: Bump docker container version.
- gridss/gridss_assemble.nf: Memory request increase.
- illumina/strelka2.nf: Wallclock request increase.
- multiqc/multiqc.nf: Tool version updated to v1.21
- nygc-short-alignment-marking/short_alignment_marking.nf: Bug correction in original module script.
- picard/picard_cleansam.nf: Output naming adjustment.
- picard/picard_collectalignmentsummarymetrics.nf: Added tmp dir to command.
- picard/picard_collecttargetpcrmetrics.nf: Added tmp dir to command.
- picard/picard_collectwgsmetrics.nf: Added tmp dir to command.
- picard/picard_fix_mate_information.nf: Corrected BAM sort order of output to coordinate.
- picard/picard_sortsam.nf: Added index creation option.
- samtools/samtools_calc_mtdna_filter_chrm.nf: Memory request increase.
- samtools/samtools_faidx.nf: Input tuple adjustment, and output reorganization.
- snpeff_snpsift/snpeff_snpeff.nf: Memory request increase. Tmp dir adjustment.
- snpeff_snpsift/snpsift_extractfields.nf: Added support for amplicon_generic, somatic_wes, and somatic_wes_pta workflows.
- squid/squid_call.nf: Memory request increase.
- utility_modules/chipseq_make_genome_filter.nf: Input tuple adjustment.
- utility_modules/jax_trimmer.nf: File output naming adjusted.
ancestry/vcf2eigenstrat.py: Convert VCF to EigenStrat format.
germline_sv/annot_vcf_with_depths.py: Add info fields for depths from individual caller to VCF files.
germline_sv/annot_vcf_with_exon.py: Apply 'InExon' INFO fields to original SV VCF files.
germline_sv/annot_vcf_with_on_target.py: Apply 'OnTarget' INFO fields to original SV VCF files.
germline_sv/bedpetovcf.py: Convert BEDPE format back to `SURVIVOR` like VCF.
germline_sv/clean_sniffles.sh: Adjust `Sniffles` calls.
germline_sv/cnvnator2VCF.pl: Convert `CNVnator` formatted files to VCF.
germline_sv/hydra_to_vcf.py: Convert Hydra BEDPE output into VCF 4.1 format.
germline_sv/merge_depths.R: Merge nanoSV and Sniffles read/support depths.
germline_sv/merge_sv.r: Merge an arbitrary number of VCFs, and annotate with simple event type.
germline_sv/parse_caller_depths.py: Parse SV caller VCFs to extract IDs and depth information.
germline_sv/parse_survivor_ids.py: Parse SURVIVOR merged VCFs to extract IDs.
germline_sv/sed_unquote.sh: Script to remove double-quotes from files, which is used to avoid issues with unescaped quotes in Nextflow script blocks.
germline_sv/summarize_intersections.R: Intersect SV calls by type with known structural variant databases.
germline_sv/surv_annot.sh: Adjust `SURVIVOR` output to txt.
germline_sv/surv_annot_process.R: Adjust surv_annot output by SV type.
germline_sv/sv_to_table.py: Parse SURVIVOR merged VCF to output a summary table for each variant that lists the position, type, and size.
wes/AF_freebayes.py: Add Estimated Allele Frequency (ALT_AF) to the INFO field of FreeBayes VCF output.
wes/AF_haplotypecaller.py: Add Estimated Allele Frequency (ALT_AF) to the INFO field of HaplotypeCaller VCF output.
wes/TMB_calc.R: Compute tumor mutation burden. See Somatic WES wiki for details.
wes/allele_depth_min_and_AF_from_ADs.py: Recompute the locus depth from the allele-depths, and filter based on a minimum total allele depth.
wes/ensembl_annotation.pl: Annotates Ensembl transcripts and genes with copy number and breakpoints.
wes/scarHRD.R: Compute homologous recombination deficiency (HRD) with scarHRD.
wes/sequenza_run.R: Compute copy number variation with Sequenza.
wes/sequenza_seg_na_window.R: Filter Sequenza CNV segments with `NA` calls within 1Mb windows.
gbrs/gene_bp_to_cM_to_transprob.R: Added local BIOMART_CACHE location.
pta/make_main_vcf.py: Adjusted genomic build check logic blocks.
pta/make_txt.py: Adjusted genomic build check logic blocks.
pta/merge-caller-vcfs.r: Added logic to catch edge case where no variants were within a VCF for merging.
shared/extract_csv.nf: Added error reporting for malformed CSV input files.
shared/extract_gbrs_csv.nf: Added error reporting for malformed CSV input files.
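Several of the WES scripts listed above (`wes/AF_freebayes.py`, `wes/AF_haplotypecaller.py`, `wes/allele_depth_min_and_AF_from_ADs.py`) work with per-allele read depths. The arithmetic they describe can be sketched as follows. This is an illustrative Python example, not the scripts' code: it assumes a biallelic site with a VCF-style allele depth pair `(ref_depth, alt_depth)`, and the function names are hypothetical.

```python
from typing import Optional, Sequence


def locus_depth(allele_depths: Sequence[int]) -> int:
    """Total depth at a locus, recomputed as the sum of the allele depths."""
    return sum(allele_depths)


def alt_allele_frequency(allele_depths: Sequence[int]) -> Optional[float]:
    """Estimated ALT allele frequency: alt depth divided by total depth.

    Assumes allele_depths is (ref_depth, alt_depth). Returns None when
    there is no coverage, since the frequency is undefined.
    """
    total = locus_depth(allele_depths)
    if total == 0:
        return None
    return allele_depths[1] / total


def passes_min_depth(allele_depths: Sequence[int], min_depth: int) -> bool:
    """Keep a call only if the total allele depth reaches min_depth."""
    return locus_depth(allele_depths) >= min_depth
```

For a site with 30 reference reads and 10 alternate reads, the estimated frequency is 10 / 40 = 0.25, and the call passes a minimum depth of 15.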
In this release we have added the mouse version of PTA, and changed the read trimmer for the RNAseq pipeline to Fastp. Additionally, the latest version of Nextflow is now supported.
Note for Jackson Laboratory users on the Sumner cluster: Fastscratch has reached end of life, and is no longer supported. We have updated all example run scripts to point at `/flashscratch` rather than `/fastscratch`. For production analyses all working directories (i.e., `-w <PATH>`) should use `/flashscratch/$USER/...`.
- Mouse PTA
- bcftools/bcftools_bcf_to_vcf.nf
- bcftools/bcftools_compress_index.nf
- bcftools/bcftools_merge_delly_cnv.nf
- bcftools/bcftools_query_delly_cnv.nf
- delly/delly_call_somatic.nf
- delly/delly_classify.nf
- delly/delly_cnv_somatic.nf
- delly/delly_filter_somatic.nf
- ensembl/varianteffectpredictor_germline_mouse.nf
- ensembl/varianteffectpredictor_somatic_mouse.nf
- fastp/fastp.nf
- gatk/gatk_updatevcfsequencedictionary.nf
- python/python_somatic_vcf_finalization_mouse.nf
- r/annotate_delly_cnv.nf
- r/annotate_genes_sv_mouse.nf
- r/annotate_sv_mouse.nf
- r/annotate_sv_with_cnv_mouse.nf
- r/filter_bedpe_mouse.nf
- r/merge_sv_mouse.nf
- r/plot_delly_cnv.nf
- smoove/smoove_call.nf
- svtyper/svtyper.nf
- utility_modules/gzip.nf
- utility_modules/lumpy_compress_index.nf
- RNAseq: The read trimmer script was replaced with `fastp`. STAR logs from RSEM are now saved and passed to MultiQC for summary.
- Human PTA: The read trimmer script was replaced with `fastp`.
- bwa/bwa_mem.nf: Wallclock and memory request adjustment.
- emase/emase_get_common_alignment.nf: Wallclock request adjustment.
- gatk/gatk_applybqsr.nf: Wallclock request adjustment.
- gatk/gatk_sortvcf_somatic_tools.nf: Added mouse PTA support.
- gridss/gridss_assemble.nf: Update container to correct bug in prior container build. Wallclock and memory adjustment.
- gridss/gridss_calling.nf: Update container to correct bug in prior container build.
- gridss/gridss_preprocess.nf: Update container to correct bug in prior container build.
- lumpy_sv/lumpy_sv.nf: Modified previously unused module for use in mouse PTA.
- msisensor2/msisensor2.nf: Correct `cp` error that can occur on nextflow resume.
- msisensor2/msisensor2_tumorOnly.nf: Correct `cp` error that can occur on nextflow resume.
- multiqc/multiqc.nf: Added cpu, memory, and wallclock requests.
- nygenome/lancet.nf: Memory request adjustment.
- nygenome/lancet_confirm.nf: Memory request adjustment.
- picard/picard_addorreplacereadgroups.nf: Memory request adjustment. Adjusted PICARD temp directory to Nextflow work directory.
- picard/picard_collectalignmentsummarymetrics.nf: Wallclock request adjustment.
- picard/picard_collecthsmetrics.nf: Wallclock request adjustment.
- picard/picard_reordersam.nf: Memory request adjustment. Adjust PICARD temp directory to Nextflow work directory.
- picard/picard_sortsam.nf: Wallclock request adjustment.
- python/python_lymphoma_classifier.nf: Typo correction in output name.
- python/python_somatic_vcf_finalization.nf: Added explicit genome support to facilitate adding mouse to PTA.
- python/python_split_mnv.nf: Memory request adjustment.
- r/annotate_sv.nf: Added explicit genome support to facilitate adding mouse to PTA.
- r/annotate_sv_with_cnv.nf: Minor output file name adjustment.
- rsem/rsem_alignment_expression.nf: Memory request adjustment. Remove dynamic memory request for STAR genome sort to correct memory failure errors. Added support to save STAR alignment logs.
- samtools/samtools_filter_unique_reads.nf: Adjust expected file name input.
- snpeff_snpsift/snpsift_annotate.nf: Adjusted output file name with respect to PTA.
- svaba/svaba.nf: Adjust Nextflow output streams to capture index files.
- utility_modules/jax_trimmer.nf: Wallclock request adjustment.
- xenome/xenome.nf: Wallclock and memory request adjustment. Adjusted temp directory for `fastq-sort` to Nextflow work directory.
- All modules: `${task.memory}` replaced the incorrect `${task.mem}` in the Nextflow error catch statement.
- pta/annotate-bedpe-with-genes-mouse.r: Removed human specific database expectations.
- pta/annotate-cnv-delly.r: Adjusted CNV annotation for Delly output.
- pta/delly_cnv_plot.r: Added Delly CNV plot.
- pta/annotate-bedpe-with-databases.r: Added genome support. For BED annotations, the existing script checks for ANY overlap between BED intervals. For mouse data, this led to errant overlaps in small InDEL and inversion regions; therefore, mouse PTA requires 80% overlap between target region and query BED.
- pta/filter-bedpe.r: For mouse PTA we know the type of SV event annotated from databases; therefore, we filter only calls that match annotation type (i.e., DEL, INS, INV). Adjustment to CNV breakpoint checks for cases when breakpoints are not present for targets being annotated. This can occur in mouse PTA due to the change to Delly CNV calling.
- pta/make_main_vcf.py: Added explicit genome support to facilitate adding mouse to PTA.
- pta/make_txt.py: Added explicit genome support to facilitate adding mouse to PTA.
- pta/merge-caller-vcfs.r: Added support for Delly. For Manta the 'infer missing breakpoint' was added as the caller does not insert the reciprocal call in the VCF as the other callers do.
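The 80% overlap requirement described for `pta/annotate-bedpe-with-databases.r` can be contrasted with an "any overlap" check using a small interval calculation. The Python sketch below is illustrative only and is not the R script's code. It assumes half-open BED-style coordinates and measures overlap as a fraction of the target region; whether the script uses this or a reciprocal definition is not stated here. The function names are hypothetical.

```python
def overlap_fraction(target_start: int, target_end: int,
                     query_start: int, query_end: int) -> float:
    """Fraction of the target interval covered by the query interval."""
    target_length = target_end - target_start
    if target_length <= 0:
        raise ValueError("target interval must have positive length")
    overlap = min(target_end, query_end) - max(target_start, query_start)
    # Disjoint intervals give a negative value, which counts as no overlap.
    return max(0, overlap) / target_length


def passes_overlap(target_start: int, target_end: int,
                   query_start: int, query_end: int,
                   min_fraction: float = 0.8) -> bool:
    """True when the query covers at least min_fraction of the target."""
    fraction = overlap_fraction(target_start, target_end, query_start, query_end)
    return fraction >= min_fraction
```

Under an "any overlap" rule, a query touching only 10 of 100 target bases would count as a match; with the 0.8 threshold it is rejected, which is the kind of errant overlap the change is meant to remove.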
In this minor release we have updated GBRS and EMASE containers to include a correction made on an index position bug in GBRS genotype printing. GBRS was failing to print the final gene genotype on each chromosome to the `*.genotype.tsv` file.
None
None
None
- All EMASE and GBRS modules updated to the latest version of the EMASE/GBRS container.
In this minor release we have corrected a syntax error in the parsing of single end CSV input to EMASE and GBRS. The syntax error prevented the workflow from running single end data when CSV input files were used.
None
None
- EMASE: Correct csv single end parsing syntax.
- GBRS: Correct csv single end parsing syntax.
None
In this minor release we have patched PTA to correct for a potential script error related to annotating CNVs and SVs on chromosome Y.
None
None
- PTA: Adjusted when chromosome Y is included vs. excluded in caller merge and annotation steps.
None
In this minor release we have made minor adjustments to the amplicon workflow, and added strandedness log output.
None
None
- Amplicon: Alignment statistics are now taken post BQSR re-alignment.
- Primerclip: memory request increase.
- python/python_check_strandedness.nf: added log file output.
In this release we have added one additional pipeline: amplicon sequencing. This pipeline supports the analysis of IDT xGen Amplicon panels, with current file support for the xGen Human Sample ID Amplicon Panel. Additionally, we have added a classifier for EBV-associated PDX lymphomas to the PDX RNA pipeline.
- Amplicon
- python/python_generate_fingerprint_report.nf
- python/python_lymphoma_classifier.nf
- PDX RNAseq: added a classifier for EBV-associated PDX lymphomas.
- Cutadapt module function renamed from 'FILTER_FASTQ' to 'CUTADAPT'. Module file name adjusted to cutadapt/cutadapt.nf
- python/python_check_strandedness.nf: Added strandedness override parameter for cases when `check_strandedness` fails to determine strand directionality. Corrected logic bug associated with parsing output from the tool.
- rsem/rsem_alignment_expression.nf: Resource request adjustment.
In this release we have added five additional pipelines as part of the genetic diversity analysis suite. These pipelines support the analysis of genetically diverse samples (e.g., DO and CC mice) with EMASE and GBRS, and the generation of reference files required for running these tools.
- EMASE
- GBRS
- Generate Pseudoreference
- Prepare EMASE Reference/Inputs
- Prepare DO GBRS Inputs
- alntools/alntools_bam2emase.nf
- bowtie/bowtie.nf
- bowtie/bowtie_build.nf
- emase/emase_create_hybrid.nf
- emase/emase_get_common_alignment.nf
- emase/emase_prepare_emase.nf
- emase/emase_run.nf
- g2gtools/g2gtools_convert.nf
- g2gtools/g2gtools_extract.nf
- g2gtools/g2gtools_gtf2db.nf
- g2gtools/g2gtools_patch.nf
- g2gtools/g2gtools_transform.nf
- g2gtools/g2gtools_vcf2vci.nf
- gbrs/gbrs_bam2emase.nf
- gbrs/gbrs_compress.nf
- gbrs/gbrs_export.nf
- gbrs/gbrs_interpolate.nf
- gbrs/gbrs_plot.nf
- gbrs/gbrs_quantify.nf
- gbrs/gbrs_quantify_genotype.nf
- gbrs/gbrs_reconstruct.nf
- python/append_dropped_chroms.nf
- python/clean_prepEmase_transcriptList.nf
- python/parse_gene_positions.nf
- python/parse_transprobs.nf
- r/do_transition_probablities.nf
- r/generate_grid_file.nf
- samtools/samtools_faidx_g2gtool.nf
- utility_modules/filter_gtf_biotypes.nf
- utility_modules/snorlax.nf
None
None
In this minor release we have modified the behavior of Xenome to output compressed FASTQ files, and to delete the intermediate FASTQ files that are generated. We are implementing this change because the previous behavior of Xenome resulted in a large amount of redundant data in work directories.
We also added PDX test data for RNA-fusion.
None
None
- Changes to PDX RNA-seq, PDX WES, PDX RNA Fusion, and PDX PTA to reflect modifications to Xenome
- xenome/xenome.nf modified to combine xenome classify and fastq-sort into the XENOME_CLASSIFY module. For non-fusion applications, human and mouse reads are now emitted as compressed .fastq.gz files.
- Removed fastq-tools/fastq-sort.nf as its functionality is now in xenome/xenome.nf
- Modified input type specification for kallisto/kallisto_insert_size.nf to address issue with flash storage mounting in Singularity.
- Added text file to pubDir statement in Picard collectRNAseqMetrics
In this major release we have added two additional pipelines, flexibility for specifying inputs via sample sheets, support for downloading remote input data, support for GRCm39, support for PDX data, and many more changes detailed below. Additionally, we have added the concept of "subworkflows" for tasks that are more complex than a module and/or involve multiple containers, yet can potentially be re-used in multiple pipelines.
- ChIP-seq - human, mouse
- Paired Tumor Analysis (somatic/germline WGS) - human, PDX
- Aria download for remote input data
- Concatenate paired tumor/normal FASTQ files
- RNA-seq for PDX input data
- arriba/arriba.nf
- bamtools/bamtools_filter.nf
- bcftools/bcftools_germline_filter.nf
- bcftools/bcftools_intersect_lancet_candidates.nf
- bcftools/bcftools_merge_callers.nf
- bcftools/bcftools_remove_spanning.nf
- bcftools/bcftools_split_multiallelic_regions.nf
- bcftools/bcftools_split_multiallelic.nf
- bedtools/bedtools_amplicon_metrics.nf
- bedtools/bedtools_genomecov.nf
- bedtools/bedtools_start_candidates.nf
- biqseq2/bicseq2_normalize.nf
- biqseq2/bicseq2_seg_unpaired.nf
- biqseq2/bicseq2_seg.nf
- conpair/conpair_pileup.nf
- conpair/conpair.nf
- cosmic/cosmic_add_cancer_resistance_mutations_germline.nf
- cosmic/cosmic_add_cancer_resistance_mutations_somatic.nf
- cosmic/cosmic_annotation_somatic.nf
- cosmic/cosmic_annotation.nf
- deeptools/deeptools_computematrix.nf
- deeptools/deeptools_plotfingerprint.nf
- deeptools/deeptools_plotheatmap.nf
- deeptools/deeptools_plotprofile.nf
- ensembl/varianteffectpredictor_germline.nf
- ensembl/varianteffectpredictor_somatic.nf
- fastq-tools/fastq-pair.nf
- fastq-tools/fastq-sort.nf
- fusion_report/fusion_report.nf
- fusioncatcher/fusioncatcher.nf
- gatk/gatk_cnnscorevariants.nf
- gatk/gatk_combinegvcfs.nf
- gatk/gatk_filtermutectcalls_tumorOnly.nf
- gatk/gatk_filtermutectcalls.nf
- gatk/gatk_filtervarianttranches.nf
- gatk/gatk_genotype_gvcf.nf
- gatk/gatk_getsamplename_noMeta.nf
- gatk/gatk_getsamplename.nf
- gatk/gatk_haplotypecaller_sv_germline.nf
- gatk/gatk_mergemutectstats.nf
- gatk/gatk_mutect2_tumorOnly.nf
- gatk/gatk_mutect2.nf
- gatk/gatk_sortvcf_germline.nf
- gatk/gatk_sortvcf_somatic_merge.nf
- gatk/gatk_sortvcf_somatic_tools.nf
- gatk/gatk_variantfiltration_af.nf
- gatk/gatk_variantfiltration_mutect2.nf
- gatk/gatk3_applyrecalibration.nf
- gatk/gatk3_genotypegvcf.nf
- gatk/gatk3_haplotypecaller.nf
- gatk/gatk3_indelrealigner.nf
- gatk/gatk3_realignertargetcreator.nf
- gatk/gatk3_variantannotator.nf
- gatk/gatk3_variantrecalibrator.nf
- gridss/gridss_assemble.nf
- gridss/gridss_calling.nf
- gridss/gridss_chrom_filter.nf
- gridss/gridss_preprocess.nf
- gridss/gripss_somatic_filter.nf
- homer/annotate_boolean_peaks.nf
- homer/homer_annotatepeaks.nf
- homer/plot_homer_annotatepeaks.nf
- illumina/manta.nf
- illumina/strelka2.nf
- jaffa/jaffa.nf
- kallisto/kallisto_insert_size.nf
- kallisto/kallisto_quant.nf
- lumpy_sv/lumpy_sv.nf
- macs2/macs2_consensus.nf
- macs2/macs2_peak_calling_chipseq.nf
- macs2/plot_macs2_qc.nf
- msisensor2/msisensor2_tumorOnly.nf
- msisensor2/msisensor2.nf
- multiqc/multiqc_custom_phantompeakqualtools.nf
- novocraft/novosort.nf
- nygc-short-alignment-marking/short_alignment_marking.nf
- nygenome/lancet_confirm.nf
- nygenome/lancet.nf
- phantompeakqualtools/phantompeakqualtools.nf
- picard/picard_cleansam.nf
- picard/picard_collectmultiplemetrics.nf
- picard/picard_collecttargetpcrmetrics.nf
- picard/picard_fix_mate_information.nf
- picard/picard_mergesamfiles.nf
- pizzly/pizzly.nf
- preseq/preseq.nf
- primerclip/primerclip.nf
- python/python_add_final_allele_counts.nf
- python/python_add_nygc_allele_counts.nf
- python/python_check_strandedness.nf
- python/python_filter_pon.nf
- python/python_filter_vcf.nf
- python/python_germline_vcf_finalization.nf
- python/python_get_candidates.nf
- python/python_merge_columns.nf
- python/python_merge_prep.nf
- python/python_remove_contig.nf
- python/python_rename_metadata.nf
- python/python_rename_vcf.nf
- python/python_reorder_vcf_columns.nf
- python/python_snv_to_mnv_final_filter.nf
- python/python_somatic_vcf_finalization.nf
- python/python_split_mnv.nf
- python/python_vcf_to_bed.nf
- r/annotate_bicseq2_cnv.nf
- r/annotate_genes_sv.nf
- r/annotate_sv_with_cnv.nf
- r/annotate_sv.nf
- r/filter_bedpe.nf
- r/frag_len_plot.nf
- r/merge_sv.nf
- samtools/samtools_faidx.nf
- samtools/samtools_filter_unique_reads.nf
- samtools/samtools_filter.nf
- samtools/samtools_mergebam_filter.nf
- samtools/samtools_stats_insertsize.nf
- samtools/samtools_stats.nf
- samtools/samtools_view.nf
- squid/squid_annotate.nf
- squid/squid_call.nf
- star/star_align.nf
- star-fusion/star-fusion.nf
- subread/subread_feature_counts_chipseq.nf
- svaba/svaba.nf
- tabix/compress_merged_vcf.nf
- tabix/compress_vcf_region.nf
- tabix/compress_vcf.nf
- ucsc/ucsc_bedgraphtobigwig.nf
- utility_modules/aria_download.nf
- utility_modules/chipseq_bampe_rm_orphan.nf
- utility_modules/chipseq_check_design.nf
- utility_modules/chipseq_make_genome_filter.nf
- utility_modules/concatenate_reads_sampleSheet.nf
- utility_modules/deseq2_qc.nf
- utility_modules/frip_score.nf
- utility_modules/get_read_length.nf
- utility_modules/gunzip.nf
- utility_modules/jax_trimmer.nf
- utility_modules/parse_extracted_sv_table.nf
- xenome/xenome.nf
- WES, RNA-seq, and RNA-fusion added support for PDX data
- WES, RNA-seq, WGS, ATAC, RRBS, ChIP added support for GRCm39
- Support for input specification using sample sheets for ATAC, RNA-seq, RRBS, WES, WGS
- Support for downloading input data for ATAC, RNA-seq, RRBS, WES, WGS
- Added MULTIQC to ATAC, RNA-seq, RRBS, WES, WGS
- Added assessment of strandedness using python/python_check_strandedness.nf rather than requiring specification via parameters
- Added assessment of read length for RNA-seq for STAR index selection rather than requiring specification via parameters
- Modified variant annotations in WES and WGS
- Added GVCF support for WES and WGS
- errorStrategy modified for all modules to catch and report instances where tasks fail due to walltime or memory constraints. Diagnosing these failures previously required a deep reading of the subtask SLURM logs; they are now reported in the top-level SLURM log, which is more user-friendly
- Removed log.info statements from modules to avoid noisy disruption of log files
- ChIP-seq support for bwa/bwa_mem.nf, fastqc/fastqc.nf, picard/picard_markduplicates.nf, trim_galore/trim_galore.nf
- Corrected emit statements for g2gtools/g2gtools_chain_convert_peak.nf
- Corrected emit statements for gatk/gatk_chain_filter_reads.nf
- Modified gatk/gatk_haplotypecaller_interval.nf and gatk/gatk_haplotypecaller.nf for optional GVCF support
- Generalized multiqc/multiqc.nf via parameter for multiqc config
- Removed --METRIC_ACCUMULATION_LEVEL ALL_READS and --VALIDATION_STRINGENCY LENIENT parameters from picard/picard_collectalignmentsummarymetrics.nf
- Modified strand specification logic for picard/picard_collectrnaseqmetrics.nf
- Updated rsem/rsem_alignment_expression.nf to reflect changes in strandedness detection, reorganized outputs, and captured log files for MultiQC
- Changes to output text for mt DNA content in samtools/samtools_calc_mtdna_filter_chrm.nf
- Changes to output text from samtools/samtools_final_calc_frip.nf
- Changes to output formatting for samtools/samtools_quality_checks.nf
- Updated snpEff container to v5.1d to support GRCm39
- Changes to output fields for mouse and human from snpeff_snpsift/snpsift_extractfields.nf
- Added missing container to utility_modules/concatenate_reads_PE.nf and utility_modules/concatenate_reads_SE.nf
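The read-length-based STAR index selection noted in this release can be illustrated with a short Python sketch. This is not the pipeline's implementation: the function names, the set of available index lengths (75, 100, 125, 150), and the nearest-length rule are all assumptions chosen for illustration.

```python
from collections import Counter
from itertools import islice


def typical_read_length(fastq_lines, max_reads=1000):
    """Return the most common read length among the first max_reads records.

    fastq_lines is any iterable of FASTQ text lines (4 lines per record);
    the sequence is the second line of each record.
    """
    sequences = islice(fastq_lines, 1, 4 * max_reads, 4)
    counts = Counter(len(seq.strip()) for seq in sequences)
    if not counts:
        raise ValueError("no reads found")
    return counts.most_common(1)[0][0]


def nearest_index_length(read_length, available=(75, 100, 125, 150)):
    """Pick the available index length closest to the observed read length.

    The `available` values are hypothetical; ties go to the shorter index.
    """
    if read_length <= 0:
        raise ValueError("read length must be positive")
    return min(available, key=lambda length: (abs(length - read_length), length))
```

Under these assumptions, a sample with 101 bp reads would be matched to the 100 bp index, and a sample with 140 bp reads to the 150 bp index.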
- Changed WES and WGS COSMIC annotation to use SNPSift.
- Added explicit dbSNP annotation.
NONE
- SNPSIFT_ANNOTATE
- WES and WGS now use SNPSift to annotate COSMIC and dbSNP IDs onto variants.
- COSMIC_ANNOTATION and associated perl scripts removed.
Added STAR support to RNA-seq pipeline.
NONE
NONE
- RNA-seq pipeline now supports STAR and bowtie2 (default) through the RSEM module.
- RSEM: --rsem_aligner accepts "bowtie2" or "star". The default STAR indices for mouse and human are 100 bp, with alternates suggested in the RNA-seq config file.
NOTE: This release contains a patch for multi-sample processing. We strongly recommend that any multi-sample processing done prior to this release be re-run with v0.2.0+.
- RRBS - Mouse & Human
- ATAC - Mouse & Human
- FastQC
- Trim-Galore
- Bismark Alignment
- Bismark Deduplicator
- Bismark Methylation Extractor
- MultiQC
- Bedtools functions for ATAC QC summary
- Bowtie2
- Cutadapt
- Deeptools bamcoverage and alignmentSieve
- g2gTools chain convert
- Macs2 ATAC peak calling and ATAC peak coverage
- Subread feature counts
- Multiple pipeline changes related to multi-sample patch.
- Modified module load statements to invoke "${projectDir}" instead of relative "../" path.
- Removed CTP and Probe coverage calculations from human RNA-seq
- Multiple module changes related to multi-sample patch.
- Trimmomatic Trim stub module removed.
- RSEM - forward stranded option added.
- Picard Collect RNAseqMetrics - forward strand option added.
Updated run scripts to load CS supported Nextflow module.
NONE
- concatenate_reads_PE.nf
- concatenate_reads_SE.nf
- Modules refactored to individual files (e.g., gatk_haplotypecaller.nf).
- Added ability to concatenate FASTQ files that are split across sequencing lanes into a single R1/R2 pair or a single R1 file per sample (depending on PE or SE).
- Adjusted pipelines for refactored module files.
- Fixed CTP/PROBE typo in human RNA coverage calculation.
- Added HPC --profile options and settings for Sumner and Elion.
- Adjusted WGS wall clock settings.
- Refactored modules to individual files (e.g., gatk_haplotypecaller.nf).
- Set pipeline script parameter to hard-coded paths.
- Cleaned all Nextflow files from the bin directory.
- Removed Sumner specific HPC settings from each module.
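The per-sample concatenation of lane-split FASTQ files added in this release can be sketched in Python as follows. This is an illustration only, not the code in concatenate_reads_PE.nf or concatenate_reads_SE.nf: the file naming pattern (`<sample>_L<lane>_R<read>.fastq.gz`) and both function names are assumptions. The sketch relies on the fact that gzip files concatenated byte-for-byte form a valid multi-member gzip file.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Hypothetical naming convention: <sample>_L<lane>_R<read>.fastq.gz
LANE_PATTERN = re.compile(r"^(?P<sample>.+)_L\d+_(?P<read>R[12])\.fastq\.gz$")


def group_by_sample(paths):
    """Group lane-level files by (sample, read), ordered by file path."""
    groups = defaultdict(list)
    for path in sorted(Path(p) for p in paths):
        match = LANE_PATTERN.match(path.name)
        if match is None:
            continue  # skip files that do not follow the assumed pattern
        groups[(match["sample"], match["read"])].append(path)
    return dict(groups)


def concatenate(groups, out_dir):
    """Write one <sample>_<read>.fastq.gz per group by appending lane files."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = {}
    for (sample, read), lanes in groups.items():
        target = out_dir / f"{sample}_{read}.fastq.gz"
        with open(target, "wb") as sink:
            for lane in lanes:
                with open(lane, "rb") as source:
                    shutil.copyfileobj(source, sink)  # raw byte copy, no recompression
        outputs[(sample, read)] = target
    return outputs
```

For single-end data only R1 groups would be produced; for paired-end data, R1 and R2 are grouped separately, and sorting both by path keeps the lanes in matching order as long as the lane numbers are zero-padded.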
- Whole Genome Sequencing - Mouse & Human
- Whole Exome Sequencing - Mouse & Human
- RNA Sequencing - Mouse & Human
- bamtools.nf
- bcftools.nf
- bwa.nf
- cosmic.nf
- gatk.nf
- picard.nf
- quality_stats.nf
- read_groups.nf
- rsem.nf
- samtools.nf
- snpeff.nf
- snpsift.nf
- summary_stats.nf
- trimmomatic.nf
NONE
NONE