Add tximport, tx2gene, and SummarizedExperiment steps to the RSEM quantification pathway, achieving output parity with Salmon/Kallisto. RSEM users now get length matrices, length-scaled counts, scaled counts, and SummarizedExperiment RDS objects, enabling direct use with nf-core/differentialabundance and Bioconductor workflows. DESeq2 QC for RSEM now uses length-scaled counts (matching Salmon). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
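For readers unfamiliar with length-scaled counts, here is a minimal Python sketch of the calculation, based on my understanding of tximport's `lengthScaledTPM` option: abundances (TPM) are weighted by each gene's mean length across samples, then each sample is rescaled so its total matches the original count total. The function name and the toy matrices are hypothetical; the pipeline itself performs this step in R via tximport.

```python
def length_scaled_counts(counts, abundance, lengths):
    """Sketch of length-scaled counts.

    All three arguments are gene x sample matrices given as lists of rows.
    """
    n_genes = len(counts)
    n_samples = len(counts[0])

    # Mean length of each gene across all samples.
    mean_len = [sum(row) / n_samples for row in lengths]

    # Weight each abundance value by the gene's mean length.
    scaled = [
        [abundance[g][s] * mean_len[g] for s in range(n_samples)]
        for g in range(n_genes)
    ]

    # Rescale every sample so its column sum equals the original count sum.
    result = [[0.0] * n_samples for _ in range(n_genes)]
    for s in range(n_samples):
        counts_sum = sum(counts[g][s] for g in range(n_genes))
        scaled_sum = sum(scaled[g][s] for g in range(n_genes))
        factor = counts_sum / scaled_sum if scaled_sum else 0.0
        for g in range(n_genes):
            result[g][s] = scaled[g][s] * factor
    return result


# Toy input (made-up numbers): two genes, two samples.
toy_counts = [[10, 20], [30, 40]]
toy_abundance = [[100, 200], [300, 400]]
toy_lengths = [[1000, 1000], [2000, 2000]]
toy_result = length_scaled_counts(toy_counts, toy_abundance, toy_lengths)
```

The per-sample totals are preserved, while the split between genes shifts toward longer genes.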
Replace modules/local/rsem_merge_counts with modules/nf-core/custom/rsemmergecounts and update custom/tx2gene and tximeta/tximport modules from nf-core/modules#9995. Adds RSEM support to tx2gene.py and tximport.r templates. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…dexperiment subworkflow Factor out CUSTOM_TX2GENE, TXIMETA_TXIMPORT, and SummarizedExperiment steps into a shared nf-core subworkflow, avoiding duplication across QUANTIFY_PSEUDO_ALIGNMENT and QUANTIFY_RSEM pathways. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove subworkflows/local/quantify_rsem and add subworkflows/nf-core/quantify_rsem from nf-core/modules#9995, including comprehensive tests with pre-computed result archives via UNTAR. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reinstall quantify_pseudo_alignment from nf-core/modules to use the shared quant_tximport_summarizedexperiment subworkflow. Move SALMON_QUANT and KALLISTO_QUANT publishDir rules to the pipeline config since the nf-core subworkflow config only provides ext.args defaults. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update snapshots for star_rsem, bam_input, sentieon_default, and parabricks_default tests to reflect new tximport outputs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use exact filenames (rsem.merged.gene.SummarizedExperiment.rds) instead of glob patterns that didn't match the actual output. Add detailed assay name descriptions matching the Salmon/Kallisto documentation format. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When `--save_merged_fastq` is set, single-library samples now also pass through CAT_FASTQ so they are published alongside genuinely merged samples. Default behaviour (flag unset) is unchanged. Fixes nf-core#748 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
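The routing rule described in this commit can be sketched in Python as below. This is only an illustration of the decision, not the pipeline's Nextflow channel logic; the function name `needs_cat_fastq` and its parameters are hypothetical.

```python
def needs_cat_fastq(num_libraries, save_merged_fastq=False):
    """Return True when a sample should be routed through CAT_FASTQ.

    Hypothetical helper: num_libraries is the number of FASTQ libraries
    listed for the sample in the samplesheet.
    """
    if num_libraries > 1:
        # Genuinely multi-library samples are always concatenated.
        return True
    # Single-library samples only pass through when the flag is set,
    # so that they are published alongside the merged samples.
    return bool(save_merged_fastq)
```

With the flag unset, a single-library sample bypasses concatenation exactly as before.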
R binary serialization (.rds) produces architecture-dependent output, causing snapshot mismatches on ARM runners. Add **/*.rds to .nftignore and strip md5 hashes from .rds entries in all snap files. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
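A rough Python sketch of the hash-stripping step follows. It assumes snapshot entries have the form `name.rds:md5,<32 hex characters>`, which is my assumption about the snap file layout; the function name is hypothetical and the commit does not say how the edit was actually performed.

```python
import re

# Assumed entry shape: "<file>.rds:md5,<32 lowercase hex characters>".
_RDS_MD5 = re.compile(r"(\.rds):md5,[0-9a-f]{32}")


def strip_rds_md5(snapshot_text):
    """Remove md5 suffixes from .rds entries, leaving other entries untouched."""
    return _RDS_MD5.sub(r"\1", snapshot_text)
```

Entries for other file types keep their hashes, so only the architecture-dependent files stop being compared byte for byte.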
…ngle-samples fix: Include single-library samples in merged fastq output
The .nftignore **/*.rds pattern filters .rds files from the collected output, so these entries must also be removed from the snapshot assertions to avoid mismatches. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat: Add tximport processing for RSEM outputs (nf-core#1320)
…#829) The pipeline architecture changed since nf-core#829 was filed: STAR now runs independently via ALIGN_STAR, producing genome and transcriptome BAMs that pass through UMI dedup before RSEM receives them in --alignments mode. The validation error blocking this combination was outdated.
- Remove rsemUmiError() validation check and function definition
- Remove corresponding function test and snapshot
- Add pipeline test for --aligner star_rsem --with_umi
- Rename prepare_for_salmon_log to prepare_for_quantification_log since the step runs for both star_salmon and star_rsem
[skip ci] Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat: Enable UMI deduplication with STAR/RSEM aligner
When using --skip_alignment with a pre-built --salmon_index or --kallisto_index, the pipeline no longer requires --fasta. Also fix GTF_FILTER to run only when a genome FASTA is available, preventing the downstream GTF channel from becoming empty. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GTF_FILTER serves a dual purpose: filtering by genome sequences AND removing entries with invalid transcript_id. The latter is needed even without fasta (e.g. genes_with_empty_tid.gtf), and ch_fasta is initialized as channel.of([]) so GTF_FILTER runs correctly without a genome. Reverting the fasta_provided guard fixes the CI failure. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…se-profile test: load prokaryotic config in nf-test instead of duplicating params
The seqera conda channel only has ARM builds of STAR 2.6.1d. The default (non-ARM) configs were incorrectly using seqera::star=2.6.1d, which fails conda environment creation on x86_64 CI runners. Switch to bioconda::star=2.6.1d for the default configs, keeping the seqera channel override in conf/arm.config for ARM builds. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix iGenomes STAR conda channel for x86_64
Previously, --outSAMattrRGline (STAR) and --rg-id/--rg SM (HISAT2) were only emitted when seq_platform or seq_center was set. When neither was provided, samples fell back to aligner defaults (e.g. STAR's GRP1) instead of using the sample-specific meta.id. Bowtie2 was already correct. Remove the conditional guard so ID and SM are always set to meta.id, with PL and CN still added only when the corresponding values are provided. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
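The read group rule after this change can be sketched in Python as follows. The helper names are hypothetical and the real logic lives in the pipeline's Nextflow configuration; only the STAR flag `--outSAMattrRGline` and the tag names come from the commit message.

```python
def read_group_fields(sample_id, seq_platform=None, seq_center=None):
    """Build read group tags: ID and SM always, PL and CN only when provided."""
    fields = [f"ID:{sample_id}", f"SM:{sample_id}"]
    if seq_platform:
        fields.append(f"PL:{seq_platform}")
    if seq_center:
        fields.append(f"CN:{seq_center}")
    return fields


def star_rg_arg(sample_id, seq_platform=None, seq_center=None):
    """Format the tags as a STAR --outSAMattrRGline argument string."""
    fields = read_group_fields(sample_id, seq_platform, seq_center)
    return "--outSAMattrRGline " + " ".join(fields)
```

The key difference from the previous behaviour is that the argument is emitted even when neither the platform nor the center is set.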
Update version strings, MultiQC config URLs to dev branch, and ro-crate-metadata.json timestamps. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fix: always set ID/SM read group tags; bump version to 3.24.0dev
The previous fix (PR nf-core#1696) only skipped igenomes_base validation when NXF_OFFLINE=true, but users without AWS credentials also get validation failures (403 Access Denied). Since igenomes_base is never used directly and derived paths are validated individually, always skip it per the nf-schema maintainer's recommendation. Also removes format: directory-path from the schema for igenomes_base, as recommended in nextflow-io/nf-schema#204. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tion fix: always skip igenomes_base validation
Integrate all 7 RSeQC tools (bam_stat, infer_experiment, read_duplication, read_distribution, junction_annotation, junction_saturation, inner_distance) into the single rustqc rna command. When RustQC is enabled, BAM_RSEQC only runs for tin (which RustQC does not implement). Strandedness comparison uses infer_experiment output from whichever tool produces it.
Upstream RustQC now derives transcript structure directly from GTF, making --gtf and --bed mutually exclusive. With --gtf, all analyses (dupRadar, featureCounts, and RSeQC tools) run without a BED file. BED file flow is retained for BAM_RSEQC (tin module).
- Add --flat-output flag since RustQC now writes to subfolders by default
- Add SVG output declarations for all plot outputs (PNG + SVG pairs)
- Rename plot emit names with _png/_svg suffixes for clarity
- Update publishDir patterns to include SVG files
- Update stub section with SVG files
- Add TIN, preseq, samtools flagstat/idxstats/stats, and Qualimap gene body coverage outputs to the RustQC module and publishDir config
- Rename --skip_rustqc (default true) to --use_rustqc (default false) as a single toggle that automatically disables all replaced tools
- When --use_rustqc is enabled, dupRadar, featureCounts biotype QC, RSeQC, Preseq, Qualimap, and samtools stats are all skipped
- Wire all new RustQC outputs to MultiQC
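The single-toggle behaviour described in the list above might be sketched in Python like this. The tool labels and the function name are hypothetical stand-ins for the pipeline's process names; only the `--use_rustqc` parameter and the list of replaced tools come from the commit message.

```python
# Hypothetical labels for the QC tools that the commit says RustQC replaces.
REPLACED_BY_RUSTQC = (
    "dupradar",
    "featurecounts_biotype",
    "rseqc",
    "preseq",
    "qualimap",
    "samtools_stats",
)


def qc_tools_to_run(use_rustqc=False):
    """Return the QC tools that run for a given --use_rustqc setting."""
    if use_rustqc:
        # One toggle: RustQC runs and every replaced tool is skipped.
        return ["rustqc"]
    # Default (false): the standard tools run and RustQC does not.
    return list(REPLACED_BY_RUSTQC)
```

Keeping the replaced tools in one list means a newly covered tool only has to be added in one place.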
… enabled Use ext.when config to skip BAM_STATS_SAMTOOLS processes inside BAM_MARKDUPLICATES_PICARD and BAM_SORT_STATS_SAMTOOLS subworkflows when RustQC provides equivalent outputs.
… MultiQC path_filters
- Rewrite RUSTQC process to emit entire output directory instead of individual file globs, matching the RustQC-benchmarks pipeline design
- Add BAI index input, update container to ghcr.io/seqeralabs/rustqc:dev
- Add --biotype-attribute gene_type support for GENCODE annotations
- Simplify publishDir config to publish directory contents directly
- Simplify workflow wiring: pass whole results dir to MultiQC
- Add MultiQC path_filters to label RustQC vs upstream tool sections
- Add dupradar and subread_featurecounts to MultiQC run_modules
- Fix TIN summary filename (*.summary.txt) and version command
New param runs RustQC in addition to the standard QC tools rather than replacing them, enabling side-by-side comparison in the MultiQC report. Also updates use_rustqc schema description with full list of tools.
No description provided.