v2.0.0: Refactor for sample-wise parameterisation #171
base: dev
* add prefix to singularity container for report
* add files exist check for references, closes nf-core#165

* Template update for nf-core/tools version 3.2.1
* Template update for nf-core/tools version 3.3.1
* merge template 3.3.1 - fix linting
* update pre-commit
* merge template 3.3.1 - fix linting
* pre-commit config?
* pre-commit config?
* reinstall links
* try larger runner
* smaller run, disable bloom filter for hifiasm test
* updated test snapshot
* updated test snapshot
* update nftignore
* update nftignore
* update nftignore
* update nftignore
* update nftignore
* update nftignore
* update nftignore
* update nftignore
* update nftignore
* Update .github/actions/nf-test/action.yml Co-authored-by: Matthias Hörtenhuber <mashehu@users.noreply.github.com>
* Update docs/output.md Co-authored-by: Matthias Hörtenhuber <mashehu@users.noreply.github.com>
* remove .nf-test.log

---------
Co-authored-by: Niklas Schandry <niklas@bio.lmu.de>
Co-authored-by: Matthias Hörtenhuber <mashehu@users.noreply.github.com>
Warning: Newer version of the nf-core template is available. Your pipeline is using an old version of the nf-core template: 3.3.2. For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation.
* update awk regex
* update snapshot

* add prefix to singularity container for report
* include gawk in gfa2fa env
* include gawk in gfa2fa env

* add prefix to singularity container for report
* include gawk in gfa2fa env
* include gawk in gfa2fa env
* mawk version
* mawk version in stub
* update CHANGELOG
* Update CHANGELOG.md Co-authored-by: Simon Pearce <24893913+SPPearce@users.noreply.github.com>
* [automated] Fix code linting

---------
Co-authored-by: Simon Pearce <24893913+SPPearce@users.noreply.github.com>
Co-authored-by: nf-core-bot <core@nf-co.re>
Since the initial start of this refactor, I noticed that when doing multiple assemblies from the same set of reads it is kind of a waste to send those reads through preprocessing multiple times. I also figured that assemblies from the same set of reads are likely to be compared to each other. To reduce redundant work and make comparisons easier, there is now a |
nvnieuwk left a comment
Hi, I've done my best, but I found the code in this pipeline pretty hard to read, so I can't approve this at this point. I've left a few comments. Here are some more tips to help with readability:
- Don't use `it` in closures; try to set a variable name for each item in the channel entry instead (e.g. instead of `.map { it -> ... }` do `.map { meta, file1, file2 -> ... }`). This makes it easier for me to understand what is in the channel at that point and will make it easier for future you (and others) to work on the pipeline later.
- Use clearer variable names.
- Try to put some more comments above big code blocks with a short explanation of what the code is for (especially on harder-to-understand pieces of code).
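To illustrate the first tip, here is a minimal plain-Groovy sketch that runs without Nextflow. A list stands in for a channel, and the sample IDs, file names and the `_reference` parameter are made up for the example. Both closures produce the same result, but the second one documents what each element holds.

```groovy
// Plain Groovy stand-in for a channel: a list of [meta, reads, reference] tuples.
// All sample IDs and file names here are hypothetical.
def items = [
    [[id: 'sampleA'], 'a_reads.fq.gz', 'ref.fa'],
    [[id: 'sampleB'], 'b_reads.fq.gz', 'ref.fa'],
]

// Harder to read: positional access through `it`.
def withIt = items.collect { it -> [it[0].id, it[1]] }

// Easier to read: each element is destructured into named parameters.
// The leading underscore marks a parameter that is intentionally unused.
def withNames = items.collect { meta, reads, _reference -> [meta.id, reads] }

assert withIt == withNames
assert withNames == [['sampleA', 'a_reads.fq.gz'], ['sampleB', 'b_reads.fq.gz']]
```

The same destructuring works in `.map { meta, reads, _reference -> ... }` on a channel whose items are tuples of that shape.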
But anyways, I'm still really impressed with what you've done here and this really will be a massive improvement to the pipeline!
This file is not needed in nf-core pipelines, you can find all parameters on the website: https://nf-co.re/genomeassembler/dev/parameters/
> [!NOTE]
> The parameter names will be used in subsequent sections. Since all parameters can be provided per-sample or pipeline wide, no examples will be given.

The list of all parameters that can be provided globally is available [here](params.md), parameters that can be set per sample are provided at the [end of this page](#sample-parameters).
The list of all parameters that can be provided globally is available [here](params.md), parameters that can be set per sample are provided at the [end of this page](#sample-parameters).
The list of all parameters that can be provided globally is available [here](https://nf-co.re/genomeassembler/parameters/), parameters that can be set per sample are provided at the [end of this page](#sample-parameters).
As suggested here, this is a full refactor of `genomeassembler` to support sample-level parameterisation of everything.

Currently, this PR contains the full pipeline, tested with stub runs of heterogeneous samples in a single sample sheet.
Why?
Often when doing genome assembly, we do not know what works best. With this change, the pipeline can be used to run different settings on the same set of reads and compare the assembly outcomes. Samples that share the same value in `group` will be combined during reporting to facilitate comparisons of strategies on the same input(s). The report process / script will be updated to fit this new design (ongoing).

Details
This was a bit more tricky than I had initially hoped. Essentially, all `params` are stuffed into a main channel, which contains a map. I think a map is the only way to handle this channel safely, since sometimes entries are replaced and I am afraid that positional indexing would be too confusing (for me).

This works fine, but channels containing maps cannot be joined. For this reason, a pattern that looks like:

    map_channel_1
        // Convert to list for join
        .map { it -> it.collect { entry -> [ entry.value, entry ] } }
        .join(
            map_channel_2
                // Convert to list for join
                .map { it -> it.collect { entry -> [ entry.value, entry ] } }
        )
        // After joining re-create the maps from the stored map
        .map { it -> it.collect { _entry, map -> [ (map.key): map.value ] }.collectEntries() }

is used throughout to join map channels and recover the map after joining.
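The two conversion closures in this pattern can be tried outside Nextflow. The following is a minimal plain-Groovy sketch, not the pipeline code. The sample maps and their keys (`meta`, `ontreads`, `assembly`) are hypothetical, and `join()` is only imitated by a list concatenation, assuming it matches on the first element of each item (its default) and appends the remaining elements of the second item.

```groovy
// Two hypothetical per-sample maps that share the same first entry (meta).
def left  = [meta: [id: 'sampleA'], ontreads: 'a_ont.fq.gz']
def right = [meta: [id: 'sampleA'], assembly: 'a_assembly.fa']

// Same logic as the pre-join .map step: map -> list of [value, entry] pairs.
def toPairs = { Map sample -> sample.collect { entry -> [entry.value, entry] } }

// Same logic as the post-join .map step: list of [value, entry] pairs -> map.
def toMap = { List pairs ->
    pairs.collect { _value, entry -> [(entry.key): entry.value] }.collectEntries()
}

def leftPairs  = toPairs(left)
def rightPairs = toPairs(right)

// Stand-in for join(): the first elements must match, then the rest of the
// right-hand item is appended to the left-hand item.
assert leftPairs[0] == rightPairs[0]
def joined = leftPairs + rightPairs.tail()

// Rebuilding the map recovers every key from both inputs.
def merged = toMap(joined)
assert merged == [meta: [id: 'sampleA'], ontreads: 'a_ont.fq.gz', assembly: 'a_assembly.fa']
```

Since the map is rebuilt with `collectEntries()`, a key that appears on both sides ends up with the value that comes later in the joined list.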
Generally, to facilitate asynchronous movement, `mix()` is used, and `join()` is (hopefully) used considerately to avoid blocking while waiting for processes.

The overall sample-wise parameterisation is offloaded to `subworkflows/local/utils_nfcore_genomeassembler_pipeline/main.nf`; currently this does not produce errors (even though it should). This should be a minor fix.

This also does some validation, and consolidates conflicts that may arise from `params` that are incompatible with certain samples, e.g. `medaka` cannot be used if there are no ONT reads.

Currently, there are no tests included, since I would like to get some feedback on whether this is at all reasonable, or if there would be better ways to do things.
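As an illustration of the kind of per-sample conflict check mentioned above (e.g. `medaka` without ONT reads), here is a minimal plain-Groovy sketch. It is not the code in the subworkflow: the helper `findConflicts` and the keys `id`, `ontreads` and `polish_medaka` are assumptions made for the example.

```groovy
// Hypothetical helper: returns a list of problems for one sample map.
// The keys `id`, `ontreads` and `polish_medaka` are assumed for illustration.
def findConflicts = { Map sample ->
    def problems = []
    if (sample.polish_medaka && !sample.ontreads) {
        problems << "${sample.id}: medaka polishing requested but no ONT reads provided".toString()
    }
    return problems
}

def ok  = [id: 'sampleA', ontreads: 'a_ont.fq.gz', polish_medaka: true]
def bad = [id: 'sampleB', ontreads: null, polish_medaka: true]

assert findConflicts(ok) == []
assert findConflicts(bad) == ['sampleB: medaka polishing requested but no ONT reads provided']
```

Whether such a conflict should stop the run or just switch the option off for that sample is a design decision this sketch leaves open.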
I have tested this with a samplesheet that looks like:
This was combined with different params (e.g. `--polish_pilon`, `--scaffold_longstitch`, etc.) in stub runs, so I think the overall logic is fine.