Description
Spun off from #2723
Params can currently be defined in config files (including profiles), params files, CLI options, and the pipeline code itself. This creates the potential for much confusion around how these various sources are resolved (see #2662). Additionally, params are not typed, and while the CLI can cast command line params based on regular expressions, it can also backfire when e.g. a string param is given a value that "looks" like a number.
Instead, params should be defined in a single place with metadata such as type, default value, description, etc. Benefits are:
- single source of truth
- less ambiguity of how params are resolved
- ability to validate params based on type definition instead of regex guessing
The nf-core parameter schema (`nextflow_schema.json`) as well as the nf-validation plugin are excellent steps in this direction, and the solution may be to simply incorporate them into Nextflow.
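To make the "regex guessing" problem concrete, here is a minimal Python sketch rather than Nextflow code. The `guess_type` rules, the `SCHEMA` dictionary, and the param names `sample_id` and `min_reads` are all assumptions made for illustration; they do not reproduce Nextflow's actual casting rules or the nf-core schema format.

```python
import re

def guess_type(raw: str):
    """Cast a CLI string by pattern matching (illustrative rules only)."""
    if re.fullmatch(r"-?\d+", raw):
        return int(raw)
    if re.fullmatch(r"-?\d*\.\d+", raw):
        return float(raw)
    if raw.lower() in ("true", "false"):
        return raw.lower() == "true"
    return raw

# Hypothetical schema: each param declares its type and default exactly once.
SCHEMA = {
    "sample_id": {"type": str, "default": None},
    "min_reads": {"type": int, "default": 1000},
}

def cast_with_schema(name: str, raw: str):
    """Cast a CLI string using the declared type instead of guessing."""
    if name not in SCHEMA:
        raise KeyError(f"unknown param: {name}")
    expected = SCHEMA[name]["type"]
    try:
        return expected(raw)
    except ValueError as err:
        raise ValueError(
            f"param {name!r} expects {expected.__name__}, got {raw!r}"
        ) from err

# A sample ID that merely looks numeric loses its leading zeros when guessed,
# but survives intact when the schema says it is a string.
guessed = guess_type("0012")
typed = cast_with_schema("sample_id", "0012")
```

With a declared type, a value such as `--min_reads abc` can be rejected with a clear error instead of silently staying a string.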
For backwards compatibility, we may allow params to be set in config files and pipeline code, but this would essentially be overriding the default value rather than "defining" the param, and it should be discouraged in favor of putting everything in the parameter schema. That being said, it can be useful to set params from a profile, such as a test profile that provides some test data, so this use case should be supported.
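The "override the default rather than define the param" idea can be sketched as layered resolution. This is a Python illustration under assumed names and an assumed precedence (schema defaults, then profile, then CLI); the param names and values are hypothetical.

```python
from collections import ChainMap

# Defaults come from the single schema; this is the only place params are defined.
schema_defaults = {"input": None, "genome": "GRCh38", "outdir": "results"}

# A test profile and the CLI may only override values of params that already exist.
profile_test = {"input": "test_data/samplesheet.csv"}
cli = {"genome": "GRCm39"}

def resolve(defaults, *overrides):
    """Merge override layers (lowest priority first) on top of schema defaults."""
    known = set(defaults)
    for layer in overrides:
        unknown = sorted(set(layer) - known)
        if unknown:
            raise KeyError(f"params not defined in schema: {unknown}")
    # ChainMap searches its maps left to right, so the highest-priority layer goes first.
    return dict(ChainMap(*reversed(overrides), defaults))

params = resolve(schema_defaults, profile_test, cli)
```

An override that names a param missing from the schema fails immediately, which is the kind of ambiguity the single source of truth is meant to remove.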
The main question that I see is whether the schema should be in a separate JSON/YAML file (as it currently is in nf-core) or in the pipeline code as part of the top-level workflow definition.
- I would like the latter approach because it makes the top-level workflow more of a "unit" and makes it easier for IDE tooling to validate param references in the pipeline code. It would likely be less verbose than a JSON schema.
- On the other hand, a JSON schema can be parsed by external tools written in other languages whereas Nextflow scripts can only be parsed by Nextflow (and any IDE tooling)
- For what it's worth, Nextflow could export the workflow inputs definition to a JSON file for use with other tools, but then we have to keep it in sync with the pipeline code somehow
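As a rough illustration of the export idea in the last bullet, the following Python sketch generates a JSON Schema document from an in-code definition, so the JSON file is derived rather than maintained by hand. The `Param` class, its fields, and the example params are hypothetical, and the output is plain JSON Schema, not necessarily the layout nf-core uses.

```python
import json
from dataclasses import dataclass
from typing import Any

@dataclass
class Param:
    """Hypothetical in-code param definition."""
    name: str
    type: str            # JSON Schema type name, e.g. "string" or "integer"
    description: str
    default: Any = None
    required: bool = False

PARAMS = [
    Param("input", "string", "Path to the samplesheet", required=True),
    Param("outdir", "string", "Output directory", default="results"),
    Param("min_reads", "integer", "Minimum reads per sample", default=1000),
]

def to_json_schema(params):
    """Build a JSON Schema object describing the given params."""
    properties = {}
    for p in params:
        entry = {"type": p.type, "description": p.description}
        if p.default is not None:
            entry["default"] = p.default
        properties[p.name] = entry
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "type": "object",
        "properties": properties,
        "required": [p.name for p in params if p.required],
    }

schema = to_json_schema(PARAMS)
schema_text = json.dumps(schema, indent=2)
```

Regenerating the file as part of a build or lint step would be one way to address the sync concern, since the code stays the only thing edited by hand.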
Activity
jspaezp commented on Feb 12, 2024
Hello there! I wanted to bring this project to your attention!
https://pkl-lang.org
It was released by Apple, so it will have some support, and I feel like it addresses the needs for schema validation and progressive amendment really well (plus it has a Java library).
let me know what you think!
stevekm commented on Mar 5, 2024
#2723 (comment)
I think it's common to keep the params bundled with the config profiles; a lot of params are ultimately just paths to reference files, and you will need different paths to the files if you are on HPC vs cloud.
It seems like this would break that.
fwiw so far the typing of "default values" for the params has been one of the recurring headaches I have had lately, need to have a way to support
It's been my experience that `nextflow_schema.json` has severe issues, especially with the latter two. Here is a common example: I have an R script where `"NA"` is a recognized "unset" CLI arg. I want my users to be able to pass in an arg from the Nextflow pipeline `params`. If the user submits an arg, it must be an int value. However, if a value is not passed, I need to pass `"NA"` to the script instead, which is a string. If my `params.Rscript_val` default is `null`, it does not pass to the R script correctly and I have to hack in Groovy to fix it. If my `params.Rscript_val` default is `"NA"`, it works, but the `nextflow_schema.json` does not support it because, due to the user-input requirement of only `int`, I can only express the compatible input value in terms of `int` from `nextflow_schema.json`.
This is just one example, but it highlights a relatively common type of situation; the limitations with `nextflow_schema.json` have also bled over into things like Nextflow Tower / Seqera Platform, which iirc use it for parsing the input fields for the pipeline run UI.
Similarly, trying to have e.g. SLURM default paths vs. AWS S3 default paths, based on profile, supported in the `nextflow_schema.json`, is something I still have not figured out.
bentsherman commented on Mar 5, 2024
What I've been thinking is that the `nextflow_schema.json` should be the source of truth instead of the config file, but config profiles should still be able to override the default value. I think that would give the best of both worlds.
As for your R example, I think the best practice is to encode that convention in the process that calls the R script like so:
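A minimal Python sketch of that convention, not the snippet from the original comment: the param stays a nullable integer everywhere, and only the code that builds the R command line translates "unset" into the string `"NA"`. The function name `rscript_args` and the script name `my_script.R` are hypothetical.

```python
from typing import List, Optional

def rscript_args(rscript_val: Optional[int]) -> List[str]:
    """Translate a nullable int param into the CLI argument the R script expects."""
    if rscript_val is None:
        # The "NA" convention lives here, at the call site, not in the schema.
        return ["NA"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(rscript_val, bool) or not isinstance(rscript_val, int):
        raise TypeError(f"Rscript_val must be an int or None, got {rscript_val!r}")
    return [str(rscript_val)]

# Command lines for the unset and the set case.
cmd_unset = ["Rscript", "my_script.R", *rscript_args(None)]
cmd_set = ["Rscript", "my_script.R", *rscript_args(5)]
```

Under this arrangement the schema only needs to describe an optional integer, so the `"NA"` string never has to be expressible as a param value.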
Title changed from "Workflow inputs (a.k.a. params) schema" to "Workflow inputs definition and schema"
bentsherman commented on Mar 24, 2024
Thinking more on this, and with some inspiration from the output DSL prototype, here's a sketch for an input DSL for fetchngs:
Notes:
- […] `input`) can be used to validate structure of an input file
- […] `schema` option to `splitCsv`
- […] `params` but only allowed in anonymous workflow and output DSL
- […] `take:` block
I really like the input DSL, but the circular dependency with the config is a problem. I have listed a few ideas to address this, though none of them are complete IMO. Maybe some combination of them will do the trick.
Need to think further on the relationship between params, config, and script
bentsherman commented on Jun 28, 2024
One way to solve the circular dependency might be to restrict the scope of params to only things that are actually workflow inputs. In other words, don't allow the config to reference params at all. Then you could define params in the pipeline code (like the output definition) and generate a YAML schema for use by external tools.
The config file should still be able to set params. Nextflow would be able to validate them at runtime because it could evaluate the params definition before it evaluates the entry workflow and output definition (the only two places where params can be used).
The params that are typically used in the config file tend to be external to the workflow itself, for example:
- `outdir`, `publish_dir_mode`: should become config settings, e.g. `workflow.output.directory` and `workflow.output.mode`
- `max_cpus`, `max_memory`, `max_time`: should be replaced by `resourceLabels` directive
The main consequence is that you would only be able to use params to control workflow inputs and not config settings. Things that might previously be an additional CLI option:
$ nextflow run nf-core/rnaseq --max_cpus 24
Would become config:
I think I like this tradeoff, though I can appreciate why many people might like the power and convenience of params as they currently work. Maybe this would be a good long-term goal to work towards. First we focus on incorporating the param schema, then we can think about adding a params definition alongside the entry workflow.
stale commented on Apr 26, 2025
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.