v2.0.0: Refactor for sample-wise parameterisation #171

nschan · 2025-07-09T13:27:51Z

As suggested here, this is a full refactor of genomeassembler to support sample-level parameterisation of everything.
Currently, this PR contains the full pipeline, tested with stub runs of heterogeneous samples in a single sample sheet.

Why?
Often when doing genome assembly, we do not know in advance what works best. With this change, the pipeline can be used to run different settings on the same set of reads and compare the resulting assemblies. Samples that share the same value in `group` will be combined during reporting to facilitate comparisons of strategies on the same input(s). The report process/script will be updated to fit this new design (ongoing).

Details
This was trickier than I had initially hoped. Essentially, all params are stored in a main channel whose items are maps. I think a map is the only way to handle this channel safely, since entries are sometimes replaced, and I am afraid that positional indexing would be too confusing (for me).

This works fine, but channels containing maps cannot be joined. For this reason, a pattern that looks like:

map_channel_1
    // Convert each map to a list of [ value, entry ] pairs for join
    .map { params_map -> params_map.collect { entry -> [ entry.value, entry ] } }
    .join(
        map_channel_2
            // Convert each map to a list of [ value, entry ] pairs for join
            .map { params_map -> params_map.collect { entry -> [ entry.value, entry ] } }
    )
    // After joining, re-create the maps from the stored map entries
    .map { joined -> joined.collect { _value, entry -> [ (entry.key): entry.value ] }.collectEntries() }

is used throughout to join map channels and recover the map after joining.
Generally, mix() is used to let items move through the pipeline asynchronously, and join() is (hopefully) used with care, to avoid blocking while waiting for upstream processes.
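The join-and-restore pattern above is easier to follow in a small model. The sketch below is plain Python, not Nextflow code. Each map becomes a list of `(value, (key, value))` pairs. A simplified `join` then matches items on element 0, as Nextflow's `join` does by default. Finally, the pairs are folded back into a dict. The function names and example samples are illustrative, and the model ignores channel asynchrony.

```python
from typing import Any

Entry = tuple[str, Any]  # a (key, value) map entry


def to_join_list(params_map: dict) -> list[tuple[Any, Entry]]:
    """Mirror of `params_map.collect { entry -> [ entry.value, entry ] }`."""
    return [(value, (key, value)) for key, value in params_map.items()]


def join(left: list[list], right: list[list]) -> list[list]:
    """Simplified model of Nextflow's default join on finite inputs:
    items match on element 0 and a match emits [key, *left_rest, *right_rest].
    Assumes keys are unique per channel; unmatched items are dropped."""
    joined = []
    for left_item in left:
        for right_item in right:
            if left_item[0] == right_item[0]:
                joined.append([left_item[0], *left_item[1:], *right_item[1:]])
                break
    return joined


def from_join_list(joined_item: list[tuple[Any, Entry]]) -> dict:
    """Mirror of `collect { _value, entry -> [ (entry.key): entry.value ] }.collectEntries()`.
    As with collectEntries, a later entry overwrites an earlier one with the same key."""
    result = {}
    for _value, (key, value) in joined_item:
        result[key] = value
    return result


def join_maps(left_maps: list[dict], right_maps: list[dict]) -> list[dict]:
    """Convert both sides to lists, join them, and rebuild one map per match."""
    left = [to_join_list(m) for m in left_maps]
    right = [to_join_list(m) for m in right_maps]
    return [from_join_list(item) for item in join(left, right)]


assemblies = [
    {"sample": "Sample_flye", "assembler": "flye"},
    {"sample": "Sample_hifiasm", "assembler": "hifiasm"},
]
settings = [
    {"sample": "Sample_hifiasm", "strategy": "single"},
    {"sample": "Sample_flye", "strategy": "single"},
]
print(join_maps(assemblies, settings))
```

In this example, each sample's two maps are merged into one. The join key is the first `(value, entry)` pair, so two maps are only paired when they start with the same entry (here, the same `sample`). If the entries come in a different order, nothing is emitted. As far as I can tell, the same holds for the Nextflow version. The sketch also returns results in left-hand order, whereas Nextflow emits joined items as matches become available.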

The sample-wise parameterisation itself is offloaded to subworkflows/local/utils_nfcore_genomeassembler_pipeline/main.nf; currently it does not raise errors in cases where it should. This should be a minor fix.
It also does some validation and resolves conflicts that arise when params are incompatible with certain samples, e.g. medaka cannot be used if a sample has no ONT reads.
No tests are included yet, since I would first like some feedback on whether this approach is reasonable at all, or whether there are better ways to do things.
I have tested this with a samplesheet that looks like:

sample,ontreads,hifireads,ref_fasta,ref_gff,shortread_F,shortread_R,paired,strategy,assembler
Sample_flye,ONT.fastq.gz,,ref.fasta,ref.gff3,shortread_F.fastq.gz,shortread_R.fastq.gz,true,single,flye
Sample_hifiasm,,hifi_reads.fastq.gz,ref.fasta,ref.gff3,shortread_F.fastq.gz,shortread_R.fastq.gz,true,single,hifiasm
Sample_hifiasm_ont,ONT.fastq.gz,,ref.fasta,ref.gff3,shortread_F.fastq.gz,shortread_R.fastq.gz,true,single,hifiasm
Sample_hifiasm_ul,ONT.fastq.gz,hifi_reads.fastq.gz,ref.fasta,ref.gff3,shortread_F.fastq.gz,shortread_R.fastq.gz,true,hybrid,hifiasm
Sample_flye_hifiasm_scaff,ONT.fastq.gz,hifi_reads.fastq.gz,ref.fasta,ref.gff3,shortread_F.fastq.gz,shortread_R.fastq.gz,true,scaffold,flye_hifiasm
Sample_hifiasm_scaff,ONT.fastq.gz,hifi_reads.fastq.gz,ref.fasta,ref.gff3,shortread_F.fastq.gz,shortread_R.fastq.gz,true,scaffold,hifiasm_hifiasm

I ran it in combination with different params (e.g. `--polish_pilon`, `--scaffold_longstitch`, etc.) in stub runs, so I think the overall logic is fine.
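The per-sample resolution and conflict handling described above might look roughly like the plain-Python sketch below. It rests on three assumptions:

- a non-empty samplesheet cell overrides the pipeline-wide param of the same name;
- `polish_medaka` is a hypothetical flag that represents the medaka/ONT conflict;
- `polish_pilon` is a hypothetical per-sample column.

The actual logic lives in `utils_nfcore_genomeassembler_pipeline/main.nf` and may differ.

```python
import csv
import io

# Pipeline-wide defaults. "polish_pilon" and "scaffold_longstitch" are params named
# in this PR; "polish_medaka" is a hypothetical flag used only for illustration.
GLOBAL_PARAMS = {"polish_medaka": True, "polish_pilon": False, "scaffold_longstitch": False}

# Trimmed version of the example samplesheet, plus a hypothetical per-sample
# "polish_pilon" column (an empty cell means "use the global value").
SAMPLESHEET = """sample,ontreads,hifireads,strategy,assembler,polish_pilon
Sample_flye,ONT.fastq.gz,,single,flye,true
Sample_hifiasm,,hifi_reads.fastq.gz,single,hifiasm,
"""


def parse_value(value: str):
    """Turn "true"/"false" cells into booleans; keep everything else as text."""
    lowered = value.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    return value


def resolve_sample_params(row: dict, global_params: dict) -> dict:
    """Start from the global params, let non-empty samplesheet cells override them,
    then switch off settings that cannot work for this sample."""
    resolved = dict(global_params)
    for column, cell in row.items():
        if cell != "":
            resolved[column] = parse_value(cell)
    # Conflict consolidation: medaka polishing needs ONT reads.
    if not resolved.get("ontreads"):
        resolved["polish_medaka"] = False
    return resolved


rows = list(csv.DictReader(io.StringIO(SAMPLESHEET)))
resolved = {row["sample"]: resolve_sample_params(row, GLOBAL_PARAMS) for row in rows}
for name, sample_params in resolved.items():
    print(name, sample_params["polish_medaka"], sample_params["polish_pilon"])
# Sample_flye True True
# Sample_hifiasm False False
```

In this sketch, conflicts are resolved after overriding. A user-supplied setting that a sample cannot honour is therefore switched off rather than causing a failure later in the run.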


* Template update for nf-core/tools version 3.2.1 * Template update for nf-core/tools version 3.3.1 * merge template 3.3.1 - fix linting * update pre-commit * merge template 3.3.1 - fix linting * pre-commit config? * pre-commit config? * reinstall links * try larger runner * smaller run, disable bloom filter for hifiasm test * updated test snapshot * updated test snapshot * update nftignore * update nftignore * update nftignore * update nftignore * update nftignore * update nftignore * update nftignore * update nftignore * update nftignore * Update .github/actions/nf-test/action.yml Co-authored-by: Matthias Hörtenhuber <mashehu@users.noreply.github.com> * Update docs/output.md Co-authored-by: Matthias Hörtenhuber <mashehu@users.noreply.github.com> * remove .nf-test.log --------- Co-authored-by: Niklas Schandry <niklas@bio.lmu.de> Co-authored-by: Matthias Hörtenhuber <mashehu@users.noreply.github.com>


nf-core-bot · 2025-07-09T13:28:25Z

Warning

Newer version of the nf-core template is available.

Your pipeline is using an old version of the nf-core template: 3.3.2.
Please update your pipeline to the latest version.

For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.


* add prefix to singularity container for report * include gawk in gfa2fa env * include gawk in gfa2fa env * mawk version * mawk version in stub * update CHANGELOG * Update CHANGELOG.md Co-authored-by: Simon Pearce <24893913+SPPearce@users.noreply.github.com> * [automated] Fix code linting --------- Co-authored-by: Simon Pearce <24893913+SPPearce@users.noreply.github.com> Co-authored-by: nf-core-bot <core@nf-co.re>


nschan · 2025-09-04T14:23:02Z

Since starting this refactor, I noticed that when doing multiple assemblies from the same set of reads, it is wasteful to send those reads through preprocessing multiple times. Assemblies from the same set of reads are also likely to be compared to each other. To reduce redundant work and make comparisons easier, there is now a `group` parameter that can be used to put samples using the same reads into one group. Note: putting samples with different inputs into the same group will give wrong results.
Over the course of refactoring I also noticed that some information simply needs to be passed to processes (mostly for config purposes), since fetching those values from params would defeat the idea of per-sample parameterisation. These values are added to meta as needed. Generally, I store all information in the channel map while it is in "transit" between processes. Overall, this results in a lot of channel manipulation that is not very concise.
I have now run some more extensive tests, and overall the logic seems to work as intended.
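The grouping idea above can be sketched in plain Python:

1. Samples are collected by their `group` value.
2. Each group yields one set of reads to preprocess.
3. A group whose members have different read inputs is rejected rather than silently producing wrong results.

Two parts of this are assumptions for illustration, not behaviour confirmed by this PR: the validation step, and placing each ungrouped sample in a group of its own. The group name `ont_hifi` is hypothetical. Short-read columns are left empty in the example data for brevity.

```python
from collections import defaultdict

# Read-input columns taken from the example samplesheet in this PR.
READ_COLUMNS = ("ontreads", "hifireads", "shortread_F", "shortread_R")


def read_inputs(sample: dict) -> tuple:
    """The reads a sample starts from; samples in one group must share these."""
    return tuple(sample.get(column, "") for column in READ_COLUMNS)


def group_samples(samples: list[dict]) -> dict[str, list[dict]]:
    """Collect samples by their `group` value.
    Assumption: a sample without a group forms a group of its own."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for sample in samples:
        groups[sample.get("group") or sample["sample"]].append(sample)
    return dict(groups)


def reads_to_preprocess(groups: dict[str, list[dict]]) -> dict[str, tuple]:
    """One preprocessing input per group instead of one per sample.
    Raises if a group mixes samples with different reads (hypothetical check)."""
    jobs = {}
    for group_name, members in groups.items():
        distinct_inputs = {read_inputs(member) for member in members}
        if len(distinct_inputs) != 1:
            raise ValueError(f"group {group_name!r} contains samples with different read inputs")
        jobs[group_name] = distinct_inputs.pop()
    return jobs


samples = [
    {"sample": "Sample_flye", "ontreads": "ONT.fastq.gz", "group": ""},
    {"sample": "Sample_hifiasm_ul", "ontreads": "ONT.fastq.gz", "hifireads": "hifi_reads.fastq.gz", "group": "ont_hifi"},
    {"sample": "Sample_flye_hifiasm_scaff", "ontreads": "ONT.fastq.gz", "hifireads": "hifi_reads.fastq.gz", "group": "ont_hifi"},
    {"sample": "Sample_hifiasm_scaff", "ontreads": "ONT.fastq.gz", "hifireads": "hifi_reads.fastq.gz", "group": "ont_hifi"},
]
jobs = reads_to_preprocess(group_samples(samples))
print(len(samples), "samples ->", len(jobs), "preprocessing jobs")  # 4 samples -> 2 preprocessing jobs
```

Here the three ONT+HiFi samples share one preprocessing input, so the reads are only processed twice instead of four times.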

nvnieuwk

Hi, I've done my best, but I found the code in this pipeline pretty hard to read, so I can't approve this at this point... I've left a few comments. Here are some more tips to help with readability:

- Don't use `it` in closures; instead, set a variable name for each item in the channel entry (e.g. instead of `.map { it -> ... }`, write `.map { meta, file1, file2 -> ... }`). This makes it easier for me to understand what is in the channel at that point, and will make it easier for future you (and others) to work on the pipeline later.
- Use clearer variable names.
- Put more comments above big code blocks, with a short explanation of what each piece of code is for (especially for harder-to-understand code).

But anyway, I'm still really impressed with what you've done here, and this really will be a massive improvement to the pipeline!

.nf-test.log

NOTE.md

nvnieuwk · 2025-09-16T13:46:10Z

docs/params.md

This file is not needed in nf-core pipelines; you can find all parameters on the website: https://nf-co.re/genomeassembler/dev/parameters/

nvnieuwk · 2025-09-16T13:47:20Z

docs/usage.md

+> [!NOTE]
+> The parameter names will be used in subsequent sections. Since all parameters can be provided per-sample or pipeline wide, no examples will be given.
+
+The list of all parameters that can be provided globally is available [here](params.md), parameters that can be set per sample are provided at the [end of this page](#sample-parameters).


Suggested change

The list of all parameters that can be provided globally is available [here](params.md), parameters that can be set per sample are provided at the [end of this page](#sample-parameters).

The list of all parameters that can be provided globally is available [here](https://nf-co.re/genomeassembler/parameters/), parameters that can be set per sample are provided at the [end of this page](#sample-parameters).

modules/local/collect_reads/main.nf

schema.md

subworkflows/local/assemble/main.nf

nschan and others added 25 commits May 22, 2025 15:49

add prefix to singularity container for report

e4eea9d

Merge branch 'nf-core:dev' into dev

e148a95

Merge branch 'nf-core:dev' into dev

51281c7

prepare pipeline initialization for sample-wise parameterization

5f9eb5b

refactor assemble and assemble subworkflows for sample-wise parameterization

e5e614f

checks, typos

26d09c1

Improve reference input check (nf-core#166)

7a454eb

* add prefix to singularity container for report * add files exist check for references, closes nf-core#165

refactor assemble and assemble subworkflows for sample-wise parameterization

4a8362a

intermediate WIP commit

b47e923

WIP commit

4d10d67

first attemp done

f5ce812

bugfix commit 1

5292f4c

bugfix commit 2

6f84424

bugfix commit 3

649094b

report modifications 1; add groups

3c12920

WIP commit

518d4c7

WIP commit

329d54a

bugfix commit, update configs for sample-wise args

33f7ff5

running up until including medaka

f462800

running up until including pilon

7ec611d

running up until including pilon (fixed)

3508bfb

running up until report, schema update

d705cf9

docs update

2ccba63

working pipeline (no report

ac46b92

nschan and others added 4 commits July 10, 2025 14:57

adding additional control for merqury, nanoq, for reporting; WIP

b80cc44

Awk regex (nf-core#167)

daf62ee

* update awk regex * update snapshot

add mawk to gfa2fa env (nf-core#168)

8ccd707

* add prefix to singularity container for report * include gawk in gfa2fa env * include gawk in gfa2fa env

nschan and others added 16 commits August 28, 2025 14:47

refactor assemble and assemble subworkflows for sample-wise parameterization

b241b83

less blocking

750ead3

Improve reference input check (nf-core#166)

35bc8ab

* add prefix to singularity container for report * add files exist check for references, closes nf-core#165

prepare pipeline initialization for sample-wise parameterization

2d5fd52

refactor assemble and assemble subworkflows for sample-wise parameterization

0fe8560

less blocking

35f7421

revert rebase

73e6d1a

some more conflicts

4a779ac

reverting changes

fb2e7b2

back to working state

e90a6a2

docs update

1a1d219

add check if longreads should go into fastplong

f229e77

bug collection

d44ebb9

nschan changed the title ~~Refactor for sample-wise parameterisation~~ v2.0.0: Refactor for sample-wise parameterisation Sep 4, 2025

remove jellyfish dump

b0418da

nschan added 2 commits September 11, 2025 09:50

report grouping fixes

6580030

no more plotly for merqury plots

4ff6da1

nvnieuwk reviewed Sep 16, 2025


nschan mentioned this pull request Sep 17, 2025

Per-sample assembly strategy & params? #160

Closed

nschan added 7 commits September 19, 2025 15:43

fix report grouping

756fe4b

add comments to main and assemble workflow

f1ec8f7

fix flye-flye scaffolding code

957c1ed

update report format, refactor document, document format

5a33b14

update assemble subworkflow: split flye assembly process by read type

9323e52

remove schema.md

18408c4

minor cleanup

0d22a9c
