
RustQC testing#1

Open
ewels wants to merge 183 commits into dev from rustqc


@ewels
Member

@ewels ewels commented Feb 13, 2026

No description provided.

@github-actions

github-actions bot commented Feb 13, 2026

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit fdcc577

+| ✅ 205 tests passed       |+
#| ❔   9 tests were ignored |#
#| ❔   1 tests had warnings |#
!| ❗   8 tests had warnings |!
Details

❗ Test warnings:

  • files_exist - File not found: assets/multiqc_config.yml
  • pipeline_todos - TODO string in nextflow.config: Specify any additional parameters here
  • pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline
  • pipeline_todos - TODO string in awsfulltest.yml: You can customise AWS full pipeline tests as required
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes
  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!

❔ Tests ignored:

❔ Tests fixed:

✅ Tests passed:

Run details

  • nf-core/tools version 3.5.1
  • Run at 2026-03-09 22:37:50

pinin4fjords and others added 25 commits February 13, 2026 14:16
Add tximport, tx2gene, and SummarizedExperiment steps to the RSEM
quantification pathway, achieving output parity with Salmon/Kallisto.
RSEM users now get length matrices, length-scaled counts, scaled counts,
and SummarizedExperiment RDS objects, enabling direct use with
nf-core/differentialabundance and Bioconductor workflows.

DESeq2 QC for RSEM now uses length-scaled counts (matching Salmon).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
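The difference between the "scaled" and "length-scaled" count matrices can be illustrated with a minimal Python sketch. It assumes the rescaling that tximport documents for `countsFromAbundance = "scaledTPM"` and `"lengthScaledTPM"`; the function name and toy matrices are hypothetical, and the pipeline itself performs this step in R through the tximport template.

```python
def counts_from_abundance(counts, tpm, lengths, mode):
    """Rebuild a count matrix from TPM values, tximport-style (illustrative sketch).

    counts, tpm and lengths are lists of rows (features) x columns (samples).
    mode is "scaledTPM" or "lengthScaledTPM".
    """
    n_rows, n_cols = len(tpm), len(tpm[0])
    if mode == "scaledTPM":
        new = [row[:] for row in tpm]
    elif mode == "lengthScaledTPM":
        # Weight each feature's TPM by its mean length across samples
        new = [
            [t * (sum(lengths[i]) / n_cols) for t in tpm[i]]
            for i in range(n_rows)
        ]
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Rescale each sample (column) so its total matches the original library size
    for j in range(n_cols):
        original_total = sum(counts[i][j] for i in range(n_rows))
        new_total = sum(new[i][j] for i in range(n_rows))
        for i in range(n_rows):
            new[i][j] *= original_total / new_total
    return new
```

Both modes preserve each sample's total count; only the length-scaled variant lets longer features keep proportionally more of it, which is why it suits gene-level tools such as DESeq2.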
Replace modules/local/rsem_merge_counts with modules/nf-core/custom/rsemmergecounts
and update custom/tx2gene and tximeta/tximport modules from nf-core/modules#9995.
Adds RSEM support to tx2gene.py and tximport.r templates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
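What a tx2gene table is used for can be sketched in Python: map each transcript to its gene from the GTF attributes, then sum transcript-level values per gene. This is an illustration under simplifying assumptions (well-formed `key "value";` attributes, every quantified transcript present in the map); the function names are hypothetical and are not the API of `tx2gene.py` or the tximport template.

```python
import re
from collections import defaultdict

# Matches GTF attribute pairs such as: gene_id "G1";
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def tx2gene_from_gtf(lines):
    """Map transcript_id -> gene_id from GTF records (illustrative sketch)."""
    mapping = {}
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        tx, gene = attrs.get("transcript_id"), attrs.get("gene_id")
        if tx and gene:
            mapping[tx] = gene
    return mapping


def summarise_to_genes(tx_counts, mapping):
    """Sum transcript-level counts into gene-level counts."""
    gene_counts = defaultdict(float)
    for tx, count in tx_counts.items():
        gene_counts[mapping[tx]] += count
    return dict(gene_counts)
```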
…dexperiment subworkflow

Factor out CUSTOM_TX2GENE, TXIMETA_TXIMPORT, and SummarizedExperiment
steps into a shared nf-core subworkflow, avoiding duplication across
QUANTIFY_PSEUDO_ALIGNMENT and QUANTIFY_RSEM pathways.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove subworkflows/local/quantify_rsem and add subworkflows/nf-core/quantify_rsem
from nf-core/modules#9995, including comprehensive tests with pre-computed
result archives via UNTAR.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Reinstall quantify_pseudo_alignment from nf-core/modules to use the
shared quant_tximport_summarizedexperiment subworkflow. Move SALMON_QUANT
and KALLISTO_QUANT publishDir rules to the pipeline config since the
nf-core subworkflow config only provides ext.args defaults.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Update snapshots for star_rsem, bam_input, sentieon_default, and
parabricks_default tests to reflect new tximport outputs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use exact filenames (rsem.merged.gene.SummarizedExperiment.rds) instead
of glob patterns that didn't match the actual output. Add detailed assay
name descriptions matching the Salmon/Kallisto documentation format.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
When `--save_merged_fastq` is set, single-library samples now also
pass through CAT_FASTQ so they are published alongside genuinely
merged samples. Default behaviour (flag unset) is unchanged.

Fixes nf-core#748

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
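The routing change can be sketched as follows. `plan_fastq_merging` is a hypothetical Python stand-in for the Nextflow channel logic: multi-library samples are always concatenated, and single-library samples join them only when `--save_merged_fastq` is set.

```python
from collections import defaultdict


def plan_fastq_merging(samples, save_merged_fastq):
    """Decide which samples go through a CAT_FASTQ-like step (illustrative sketch).

    samples: iterable of (sample_id, fastq_path) pairs, one per sequencing library.
    Returns (to_cat, passthrough): dicts of sample_id -> list of FASTQ paths.
    """
    grouped = defaultdict(list)
    for sample_id, fastq in samples:
        grouped[sample_id].append(fastq)
    to_cat, passthrough = {}, {}
    for sample_id, fastqs in grouped.items():
        # Multi-library samples are always merged; single-library samples are
        # only routed through the merge step when merged FASTQs are published.
        if len(fastqs) > 1 or save_merged_fastq:
            to_cat[sample_id] = fastqs
        else:
            passthrough[sample_id] = fastqs
    return to_cat, passthrough
```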
R binary serialization (.rds) produces architecture-dependent output,
causing snapshot mismatches on ARM runners. Add **/*.rds to .nftignore
and strip md5 hashes from .rds entries in all snap files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
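A minimal sketch of the snapshot clean-up, assuming nf-test records file entries as strings of the form `path:md5,<hash>` inside a parsed snapshot; `strip_rds_md5` is a hypothetical helper, not part of nf-test.

```python
def strip_rds_md5(snapshot):
    """Drop md5 suffixes from .rds entries in a parsed snapshot (illustrative sketch).

    Walks nested dicts and lists and returns a new structure; strings of the form
    "name.rds:md5,<hash>" become "name.rds", all other values are left unchanged.
    """
    if isinstance(snapshot, dict):
        return {key: strip_rds_md5(value) for key, value in snapshot.items()}
    if isinstance(snapshot, list):
        return [strip_rds_md5(value) for value in snapshot]
    if isinstance(snapshot, str) and ":md5," in snapshot:
        name, _, _ = snapshot.partition(":md5,")
        if name.endswith(".rds"):
            return name
    return snapshot
```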
…ngle-samples

fix: Include single-library samples in merged fastq output
The .nftignore **/*.rds pattern filters .rds files from the collected
output, so these entries must also be removed from the snapshot
assertions to avoid mismatches.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
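The assertions are compared against the output list after ignore patterns have been applied, so any entry the ignore file filters out must also disappear from the snapshot. The sketch below approximates that filtering with Python's `fnmatch`, whose `*` also matches `/`, so `**/*.rds` behaves here like "any `.rds` file below a directory"; this only approximates nf-test's own glob handling, and the helper and paths are hypothetical.

```python
from fnmatch import fnmatch

IGNORE_PATTERNS = ["**/*.rds"]  # mirrors the .nftignore entry described above


def filter_snapshot_paths(paths, patterns=IGNORE_PATTERNS):
    """Remove paths matched by any ignore pattern (illustrative sketch).

    fnmatch treats '/' like any other character, so '*' can span directories.
    """
    return [p for p in paths if not any(fnmatch(p, pat) for pat in patterns)]
```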
feat: Add tximport processing for RSEM outputs (nf-core#1320)
…#829)

The pipeline architecture has changed since nf-core#829 was filed: STAR now runs
independently via ALIGN_STAR, producing genome and transcriptome BAMs
that pass through UMI dedup before RSEM receives them in --alignments
mode. The validation error blocking this combination was outdated.

- Remove rsemUmiError() validation check and function definition
- Remove corresponding function test and snapshot
- Add pipeline test for --aligner star_rsem --with_umi
- Rename prepare_for_salmon_log to prepare_for_quantification_log
  since the step runs for both star_salmon and star_rsem

[skip ci]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
[skip ci]

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat: Enable UMI deduplication with STAR/RSEM aligner
When using --skip_alignment with a pre-built --salmon_index or
--kallisto_index, the pipeline no longer requires --fasta. Also
fix GTF_FILTER to only run when a genome FASTA is available,
preventing downstream GTF channel from becoming empty.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GTF_FILTER serves a dual purpose: filtering by genome sequences AND
removing entries with invalid transcript_id. The latter is needed
even without fasta (e.g. genes_with_empty_tid.gtf), and ch_fasta is
initialized as channel.of([]) so GTF_FILTER runs correctly without
a genome. Reverting the fasta_provided guard fixes the CI failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
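The two GTF_FILTER checks described above can be sketched in Python. This assumes "invalid" means a `transcript_id` attribute that is present but empty (as in `genes_with_empty_tid.gtf`) and models "no FASTA" as `fasta_seq_names=None`; `filter_gtf` is a hypothetical helper, and the pipeline's own script may apply further checks.

```python
import re

TX_ID_RE = re.compile(r'transcript_id "([^"]*)"')


def filter_gtf(lines, fasta_seq_names=None):
    """Yield GTF lines that pass both GTF_FILTER-style checks (illustrative sketch).

    1. If genome sequence names are given, keep only records on those sequences.
    2. Always drop records whose transcript_id attribute is present but empty.
    Records with no transcript_id at all (e.g. gene features) are kept.
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.split("\t")
        if fasta_seq_names is not None and fields[0] not in fasta_seq_names:
            continue
        match = TX_ID_RE.search(line)
        if match is not None and match.group(1) == "":
            continue
        yield line
```

Because the transcript_id check runs regardless of `fasta_seq_names`, skipping the whole step when no FASTA is supplied would also lose that clean-up, which matches the reason given for reverting the guard.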
pinin4fjords and others added 26 commits February 27, 2026 13:23
…se-profile

test: load prokaryotic config in nf-test instead of duplicating params
The seqera conda channel only has ARM builds of STAR 2.6.1d. The
default (non-ARM) configs were incorrectly using seqera::star=2.6.1d,
which fails conda environment creation on x86_64 CI runners. Switch
to bioconda::star=2.6.1d for the default configs, keeping the seqera
channel override in conf/arm.config for ARM builds.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fix iGenomes STAR conda channel for x86_64
Previously, --outSAMattrRGline (STAR) and --rg-id/--rg SM (HISAT2)
were only emitted when seq_platform or seq_center was set. When neither
was provided, samples fell back to aligner defaults (e.g. STAR's
GRP1) instead of using the sample-specific meta.id. Bowtie2 was
already correct.

Remove the conditional guard so ID and SM are always set to meta.id,
with PL and CN still added only when the corresponding values are
provided.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
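The resulting tag logic can be sketched as a Python function. `read_group_args` is hypothetical, and the exact quoting used in the pipeline's `ext.args` may differ; it targets STAR's `--outSAMattrRGline` (space-separated `TAG:value` fields) and HISAT2's `--rg-id` plus repeated `--rg`.

```python
def read_group_args(sample_id, seq_platform=None, seq_center=None):
    """Build STAR and HISAT2 read-group arguments (illustrative sketch).

    ID and SM are always set to the sample ID; PL and CN are added only
    when the corresponding values are provided.
    """
    tags = [f"ID:{sample_id}", f"SM:{sample_id}"]
    if seq_platform:
        tags.append(f"PL:{seq_platform}")
    if seq_center:
        tags.append(f"CN:{seq_center}")
    # STAR takes the whole read group as space-separated TAG:value fields
    star = ["--outSAMattrRGline", " ".join(tags)]
    # HISAT2 takes the ID on its own and each remaining tag via a repeated --rg
    hisat2 = ["--rg-id", sample_id]
    for tag in tags[1:]:
        hisat2 += ["--rg", tag]
    return star, hisat2
```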
Update version strings, MultiQC config URLs to dev branch,
and ro-crate-metadata.json timestamps.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
fix: always set ID/SM read group tags; bump version to 3.24.0dev
The previous fix (PR nf-core#1696) only skipped igenomes_base validation when
NXF_OFFLINE=true, but users without AWS credentials also get validation
failures (403 Access Denied). Since igenomes_base is never used directly
and derived paths are validated individually, always skip it per the
nf-schema maintainer's recommendation.

Also removes format: directory-path from the schema for igenomes_base,
as recommended in nextflow-io/nf-schema#204.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tion

fix: always skip igenomes_base validation
Integrate all 7 RSeQC tools (bam_stat, infer_experiment, read_duplication,
read_distribution, junction_annotation, junction_saturation, inner_distance)
into the single rustqc rna command. When RustQC is enabled, BAM_RSEQC only
runs for tin (which RustQC does not implement). Strandedness comparison uses
infer_experiment output from whichever tool produces it.
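The resulting split between RustQC and BAM_RSEQC can be sketched as a simple filter over the requested RSeQC modules. The function and its boolean flag are hypothetical; the module names are the seven listed above.

```python
# The seven RSeQC tools folded into the single `rustqc rna` command
RUSTQC_RSEQC_TOOLS = {
    "bam_stat", "infer_experiment", "read_duplication", "read_distribution",
    "junction_annotation", "junction_saturation", "inner_distance",
}


def bam_rseqc_modules(requested, rustqc_enabled):
    """Return the RSeQC modules BAM_RSEQC still needs to run (illustrative sketch).

    With RustQC enabled, only modules it does not implement (e.g. tin) remain.
    The order of the requested list is preserved.
    """
    if not rustqc_enabled:
        return list(requested)
    return [m for m in requested if m not in RUSTQC_RSEQC_TOOLS]
```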
Upstream RustQC now derives transcript structure directly from GTF,
making --gtf and --bed mutually exclusive. With --gtf, all analyses
(dupRadar, featureCounts, and RSeQC tools) run without a BED file.

BED file flow is retained for BAM_RSEQC (tin module).
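A minimal sketch of the mutual-exclusion rule for the annotation flags. `rustqc_annotation_args` is a hypothetical helper that only assembles the `--gtf`/`--bed` part of a RustQC call; it does not reproduce the full command line.

```python
def rustqc_annotation_args(gtf=None, bed=None):
    """Return the annotation flags for a RustQC call (illustrative sketch).

    --gtf and --bed are treated as mutually exclusive: passing both is rejected,
    and a GTF alone is enough for every analysis that derives transcript structure.
    """
    if gtf and bed:
        raise ValueError("--gtf and --bed are mutually exclusive; pass only one")
    if gtf:
        return ["--gtf", gtf]
    if bed:
        return ["--bed", bed]
    return []
```

In the pipeline the BED file is still produced, but it goes only to BAM_RSEQC for tin, never alongside `--gtf`.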
- Add --flat-output flag since RustQC now writes to subfolders by default
- Add SVG output declarations for all plot outputs (PNG + SVG pairs)
- Rename plot emit names with _png/_svg suffixes for clarity
- Update publishDir patterns to include SVG files
- Update stub section with SVG files
- Add TIN, preseq, samtools flagstat/idxstats/stats, and Qualimap gene
  body coverage outputs to the RustQC module and publishDir config
- Rename --skip_rustqc (default true) to --use_rustqc (default false)
  as a single toggle that automatically disables all replaced tools
- When --use_rustqc is enabled, dupRadar, featureCounts biotype QC,
  RSeQC, Preseq, Qualimap, and samtools stats are all skipped
- Wire all new RustQC outputs to MultiQC
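The single-toggle behaviour can be sketched as below. The tool labels are hypothetical keys, not the pipeline's actual `skip_*` parameter names; the point is that enabling `use_rustqc` overrides the individual settings for every tool RustQC replaces.

```python
# Hypothetical labels for the tools listed above as replaced by RustQC
REPLACED_BY_RUSTQC = (
    "dupradar", "featurecounts_biotype_qc", "rseqc",
    "preseq", "qualimap", "samtools_stats",
)


def enabled_qc_tools(requested, use_rustqc=False):
    """Resolve which QC tools run under the use_rustqc toggle (illustrative sketch).

    requested: dict of tool label -> bool as set by the user.
    With use_rustqc on, RustQC runs and every tool it replaces is switched off,
    regardless of the individual settings.
    """
    enabled = dict(requested)
    if use_rustqc:
        for tool in REPLACED_BY_RUSTQC:
            enabled[tool] = False
    enabled["rustqc"] = use_rustqc
    return enabled
```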
… enabled

Use ext.when config to skip BAM_STATS_SAMTOOLS processes inside
BAM_MARKDUPLICATES_PICARD and BAM_SORT_STATS_SAMTOOLS subworkflows
when RustQC provides equivalent outputs.
ewels added 3 commits March 9, 2026 23:29
… MultiQC path_filters

- Rewrite the RUSTQC process to emit the entire output directory instead of
  individual file globs, matching the RustQC-benchmarks pipeline design
- Add BAI index input, update container to ghcr.io/seqeralabs/rustqc:dev
- Add --biotype-attribute gene_type support for GENCODE annotations
- Simplify publishDir config to publish directory contents directly
- Simplify workflow wiring: pass whole results dir to MultiQC
- Add MultiQC path_filters to label RustQC vs upstream tool sections
- Add dupradar and subread_featurecounts to MultiQC run_modules
- Fix TIN summary filename (*.summary.txt) and version command
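MultiQC's `path_filters`, set per module entry in `module_order`, restrict which input files that entry parses, which lets the same module appear twice under different section names. The sketch below mimics that with `fnmatch`; the section labels and directory patterns are hypothetical, not the pipeline's MultiQC config.

```python
from fnmatch import fnmatch

# Hypothetical sections: a display label plus the globs that entry may parse
SECTIONS = [
    ("RSeQC (RustQC)", ["*/rustqc/*"]),
    ("RSeQC", ["*/rseqc/*"]),
]


def assign_sections(paths, sections=SECTIONS):
    """Return, per section label, the files its path filters allow it to parse.

    Each section is filtered independently, so non-overlapping patterns are
    what keep RustQC and upstream-tool results apart in the report.
    """
    return {
        label: [p for p in paths if any(fnmatch(p, pat) for pat in patterns)]
        for label, patterns in sections
    }
```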
A new parameter runs RustQC alongside the standard QC tools instead of
replacing them, enabling side-by-side comparison in the MultiQC report.
Also updates the use_rustqc schema description with the full list of tools.

3 participants