-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: make enum listings reproducible and improve formatting #49
Closed
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
I'm rebuilding RPM packages to check build reproducibility (https://docs.fedoraproject.org/en-US/reproducible-builds/), and we got the result that rebuilds of the snakemake man page varies in this trivial way: │ │ │ \fB\-\-rerun\-trigger\fR mtime'. (default: │ │ │ -frozenset({<RerunTrigger.INPUT: 2>, │ │ │ -<RerunTrigger.MTIME: 0>, <RerunTrigger.SOFTWARE_ENV: │ │ │ -3>, <RerunTrigger.PARAMS: 1>, <RerunTrigger.CODE: │ │ │ -4>})) │ │ │ +frozenset({<RerunTrigger.CODE: 4>, │ │ │ +<RerunTrigger.PARAMS: 1>, <RerunTrigger.INPUT: 2>, │ │ │ +<RerunTrigger.SOFTWARE_ENV: 3>, <RerunTrigger.MTIME: │ │ │ +0>})) sets/frozensets are hash-ordered, not insertion ordered, so the order is not reproducible. Also, the automatic formatting in --help looks bad, the fact that it's a frozenset is not interesting to the user. If we drop the explicit sorting, the items are sorted in the definition order, which seems reasonable. --help text with the patch: --rerun-triggers {mtime,params,input,software-env,code} [{mtime,params,input,software-env,code} ...] Define what triggers the rerunning of a job. By default, all triggers are used, which guarantees that results are consistent with the workflow code and configuration. If you rather prefer the traditional way of just considering file modification dates, use ' --rerun-trigger mtime'. (default: (<RerunTrigger.MTIME: 0>, <RerunTrigger.PARAMS: 1>, <RerunTrigger.INPUT: 2>, <RerunTrigger.SOFTWARE_ENV: 3>, <RerunTrigger.CODE: 4>))
keszybz
added a commit
to keszybz/snakemake
that referenced
this pull request
Jun 30, 2024
I'm rebuilding RPM packages to check build reproducibility (https://docs.fedoraproject.org/en-US/reproducible-builds/), and we got the result that rebuilds of the snakemake man page varies in this trivial way: │ │ │ -<dataclasses._MISSING_TYPE object at 0x3ff87071a90>) │ │ │ +<dataclasses._MISSING_TYPE object at 0xffff819c5a90>) I think that printing those values is not useful at all (apart from created a problem for reproducible builds efforts), because those values are not interesting at all for the user. So change the argument parser definition to suppress all "boring" values like "set()", "None", "<dataclasses._MISSING_TYPE object at 0x…". Also, suppress the printing of the default for local cores. It depends on the local machine configuration, but it also makes the user think that the value printed is a fixed default. │ │ │ \fB\-\-local\-cores\fR N │ │ │ In cluster/cloud mode, use at most N cores of the host │ │ │ machine in parallel (default: number of CPU cores of │ │ │ the host). The cores are used to execute local rules. │ │ │ This option is ignored when not in cluster/cloud mode. │ │ │ -(default: 3) │ │ │ +(default: 64) This is the total diff of output with the patch: $ diff -U0 <(PYTHONPATH=~/python/snakemake-interface-common snakemake --help) \ <(PYTHONPATH=~/python/snakemake-interface-common:~/python/snakemake snakemake --help) --- /proc/self/fd/11 2024-06-30 17:15:56.388584254 +0200 +++ /proc/self/fd/12 2024-06-30 17:15:56.393584263 +0200 @@ -93,2 +93 @@ - targets Targets to build. May be rules or files. (default: - set()) + targets Targets to build. May be rules or files. @@ -114 +113 @@ - SNAKEMAKE_PROFILE] (default: None) + SNAKEMAKE_PROFILE] @@ -137 +136 @@ - myrule=4'. (default: None) + myrule=4'. @@ -146 +145 @@ - containers) needed to create them. (default: None) + containers) needed to create them. @@ -155 +153,0 @@ - (default: None) @@ -164 +162 @@ - workflow.cores. (default: None) + workflow.cores. @@ -169 +167 @@ - jobs. (default: None) + jobs. @@ -174 +172 @@ - (default: 8) + (default: <available CPU count>) @@ -203 +201 @@ - with --set-threads. (default: None) + with --set-threads. @@ -255 +253 @@ - outlined above. (default: None) + outlined above. @@ -263 +261 @@ - be preemptible. (default: None) + be preemptible. @@ -268 +265,0 @@ - (default: None) @@ -284 +281 @@ - Documentation). (default: None) + Documentation). @@ -286,2 +283 @@ - Environment variables to pass to cloud jobs. (default: - set()) + Environment variables to pass to cloud jobs. @@ -290,2 +286 @@ - snakefile will use this as their origin). (default: - None) + snakefile will use this as their origin). @@ -321 +316 @@ - plugin: snakemake_executor_<name> (default: None) + plugin: snakemake_executor_<name> @@ -329 +323,0 @@ - (default: set()) @@ -333 +326,0 @@ - (default: set()) @@ -345 +338 @@ - samples. (default: None) + samples. @@ -350 +342,0 @@ - (default: set()) @@ -356 +348 @@ - files specified here. (default: set()) + files specified here. @@ -363 +355 @@ - directory. (default: None) + directory. @@ -374 +366 @@ - WMS_MONITOR_TOKEN in the environment. (default: None) + WMS_MONITOR_TOKEN in the environment. @@ -381,2 +373 @@ - endpoint to first interact with the workflow (default: - None) + endpoint to first interact with the workflow @@ -388 +379 @@ - (internal use only). (default: None) + (internal use only). @@ -391 +382 @@ - activate) (internal use only). (default: None) + activate) (internal use only). @@ -398 +389 @@ - remote compute node. (default: None) + remote compute node. @@ -420 +411 @@ - report.html is the default. (default: None) + report.html is the default. @@ -424 +415 @@ - custom logo, see docs. (default: None) + custom logo, see docs. @@ -428 +419 @@ - starting with --report-. (default: None) + starting with --report-. @@ -437 +428 @@ - Snakemake for subsequent jobs. (default: None) + Snakemake for subsequent jobs. @@ -447,2 +438 @@ - present, this will create an empty draft. (default: - None) + present, this will create an empty draft. @@ -459 +449 @@ - is used. (default: None) + is used. @@ -468 +458 @@ - 'pytest TESTPATH'. (default: None) + 'pytest TESTPATH'. @@ -473 +462,0 @@ - (default: None) @@ -543 +532 @@ - .tar.bz2 and .tar.xz. (default: None) + .tar.bz2 and .tar.xz. @@ -547 +536 @@ - marks that files are incomplete. (default: None) + marks that files are incomplete. @@ -558,2 +547 @@ - input, params) have changed since creation. (default: - None) + input, params) have changed since creation. @@ -593 +581 @@ - use only. (default: None) + use only. @@ -607 +595 @@ - being printed at all. (default: None) + being printed at all. @@ -643 +631 @@ - cluster environments. (default: None) + cluster environments. @@ -648,2 +636 @@ - too long to be passed on the commandline. (default: - None) + too long to be passed on the commandline. @@ -664 +650,0 @@ - (default: frozenset()) @@ -675 +661 @@ - results otherwise. (default: None) + results otherwise. @@ -679,2 +665 @@ - for internal use by Snakemake itself only. (default: - set()) + for internal use by Snakemake itself only. @@ -713 +698 @@ - used. (default: None) + used. @@ -716 +701 @@ - bucket name. (default: ) + bucket name. @@ -735 +719,0 @@ - (default: None) @@ -761 +744,0 @@ - (default: None) @@ -768 +751 @@ - requires yappi to be installed. (default: None) + requires yappi to be installed. @@ -779 +762 @@ - e.g. slack messages or emails. (default: None) + e.g. slack messages or emails. @@ -784 +767 @@ - management system (wms) are supported. (default: None) + management system (wms) are supported. @@ -817 +800 @@ - in the installation directory. (default: None) + in the installation directory. @@ -834 +816,0 @@ - (default: set()) @@ -863,2 +845 @@ - software-deployment (see --shared-fs-usage). (default: - None) + software-deployment (see --shared-fs-usage). @@ -872 +853 @@ - tarballs. (default: None) + tarballs. @@ -902 +883 @@ - usage). (default: None) + usage). @@ -905 +885,0 @@ - (default: ) @@ -919,2 +899 @@ - directories with htmlindex as results. (default: - <dataclasses._MISSING_TYPE object at 0x7fc51653e030>) + directories with htmlindex as results. @@ -922,2 +901 @@ - Path to a custom stylesheet for the report. (default: - <dataclasses._MISSING_TYPE object at 0x7fc51653e030>) + Path to a custom stylesheet for the report. snakemake/snakemake-interface-common#49 is a related change. Together, they should make the --help output reproducible.
Merged
2 tasks
johanneskoester
added a commit
to snakemake/snakemake
that referenced
this pull request
Jul 4, 2024
…-help (#2936) I'm rebuilding RPM packages to check build reproducibility (https://docs.fedoraproject.org/en-US/reproducible-builds/), and we got the result that rebuilds of the snakemake man page varies in this trivial way: ``` │ │ │ -<dataclasses._MISSING_TYPE object at 0x3ff87071a90>) │ │ │ +<dataclasses._MISSING_TYPE object at 0xffff819c5a90>) ``` I think that printing those values is not useful at all (apart from created a problem for reproducible builds efforts), because those values are not interesting at all for the user. So change the argument parser definition to suppress all "boring" values like "set()", "None", "<dataclasses._MISSING_TYPE object at 0x…". Also, suppress the printing of the default for local cores. It depends on the local machine configuration, but it also makes the user think that the value printed is a fixed default. ``` │ │ │ \fB\-\-local\-cores\fR N │ │ │ In cluster/cloud mode, use at most N cores of the host │ │ │ machine in parallel (default: number of CPU cores of │ │ │ the host). The cores are used to execute local rules. │ │ │ This option is ignored when not in cluster/cloud mode. │ │ │ -(default: 3) │ │ │ +(default: 64) ``` This is the total diff of output with the patch: ```console $ diff -U0 <(PYTHONPATH=~/python/snakemake-interface-common snakemake --help) \ <(PYTHONPATH=~/python/snakemake-interface-common:~/python/snakemake snakemake --help) --- /proc/self/fd/11 2024-06-30 17:15:56.388584254 +0200 +++ /proc/self/fd/12 2024-06-30 17:15:56.393584263 +0200 @@ -93,2 +93 @@ - targets Targets to build. May be rules or files. (default: - set()) + targets Targets to build. May be rules or files. @@ -114 +113 @@ - SNAKEMAKE_PROFILE] (default: None) + SNAKEMAKE_PROFILE] @@ -137 +136 @@ - myrule=4'. (default: None) + myrule=4'. @@ -146 +145 @@ - containers) needed to create them. (default: None) + containers) needed to create them. @@ -155 +153,0 @@ - (default: None) @@ -164 +162 @@ - workflow.cores. (default: None) + workflow.cores. @@ -169 +167 @@ - jobs. (default: None) + jobs. @@ -174 +172 @@ - (default: 8) + (default: <available CPU count>) @@ -203 +201 @@ - with --set-threads. (default: None) + with --set-threads. @@ -255 +253 @@ - outlined above. (default: None) + outlined above. @@ -263 +261 @@ - be preemptible. (default: None) + be preemptible. @@ -268 +265,0 @@ - (default: None) @@ -284 +281 @@ - Documentation). (default: None) + Documentation). @@ -286,2 +283 @@ - Environment variables to pass to cloud jobs. (default: - set()) + Environment variables to pass to cloud jobs. @@ -290,2 +286 @@ - snakefile will use this as their origin). (default: - None) + snakefile will use this as their origin). @@ -321 +316 @@ - plugin: snakemake_executor_<name> (default: None) + plugin: snakemake_executor_<name> @@ -329 +323,0 @@ - (default: set()) @@ -333 +326,0 @@ - (default: set()) @@ -345 +338 @@ - samples. (default: None) + samples. @@ -350 +342,0 @@ - (default: set()) @@ -356 +348 @@ - files specified here. (default: set()) + files specified here. @@ -363 +355 @@ - directory. (default: None) + directory. @@ -374 +366 @@ - WMS_MONITOR_TOKEN in the environment. (default: None) + WMS_MONITOR_TOKEN in the environment. @@ -381,2 +373 @@ - endpoint to first interact with the workflow (default: - None) + endpoint to first interact with the workflow @@ -388 +379 @@ - (internal use only). (default: None) + (internal use only). @@ -391 +382 @@ - activate) (internal use only). (default: None) + activate) (internal use only). @@ -398 +389 @@ - remote compute node. (default: None) + remote compute node. @@ -420 +411 @@ - report.html is the default. (default: None) + report.html is the default. @@ -424 +415 @@ - custom logo, see docs. (default: None) + custom logo, see docs. @@ -428 +419 @@ - starting with --report-. (default: None) + starting with --report-. @@ -437 +428 @@ - Snakemake for subsequent jobs. (default: None) + Snakemake for subsequent jobs. @@ -447,2 +438 @@ - present, this will create an empty draft. (default: - None) + present, this will create an empty draft. @@ -459 +449 @@ - is used. (default: None) + is used. @@ -468 +458 @@ - 'pytest TESTPATH'. (default: None) + 'pytest TESTPATH'. @@ -473 +462,0 @@ - (default: None) @@ -543 +532 @@ - .tar.bz2 and .tar.xz. (default: None) + .tar.bz2 and .tar.xz. @@ -547 +536 @@ - marks that files are incomplete. (default: None) + marks that files are incomplete. @@ -558,2 +547 @@ - input, params) have changed since creation. (default: - None) + input, params) have changed since creation. @@ -593 +581 @@ - use only. (default: None) + use only. @@ -607 +595 @@ - being printed at all. (default: None) + being printed at all. @@ -643 +631 @@ - cluster environments. (default: None) + cluster environments. @@ -648,2 +636 @@ - too long to be passed on the commandline. (default: - None) + too long to be passed on the commandline. @@ -664 +650,0 @@ - (default: frozenset()) @@ -675 +661 @@ - results otherwise. (default: None) + results otherwise. @@ -679,2 +665 @@ - for internal use by Snakemake itself only. (default: - set()) + for internal use by Snakemake itself only. @@ -713 +698 @@ - used. (default: None) + used. @@ -716 +701 @@ - bucket name. (default: ) + bucket name. @@ -735 +719,0 @@ - (default: None) @@ -761 +744,0 @@ - (default: None) @@ -768 +751 @@ - requires yappi to be installed. (default: None) + requires yappi to be installed. @@ -779 +762 @@ - e.g. slack messages or emails. (default: None) + e.g. slack messages or emails. @@ -784 +767 @@ - management system (wms) are supported. (default: None) + management system (wms) are supported. @@ -817 +800 @@ - in the installation directory. (default: None) + in the installation directory. @@ -834 +816,0 @@ - (default: set()) @@ -863,2 +845 @@ - software-deployment (see --shared-fs-usage). (default: - None) + software-deployment (see --shared-fs-usage). @@ -872 +853 @@ - tarballs. (default: None) + tarballs. @@ -902 +883 @@ - usage). (default: None) + usage). @@ -905 +885,0 @@ - (default: ) @@ -919,2 +899 @@ - directories with htmlindex as results. (default: - <dataclasses._MISSING_TYPE object at 0x7fc51653e030>) + directories with htmlindex as results. @@ -922,2 +901 @@ - Path to a custom stylesheet for the report. (default: - <dataclasses._MISSING_TYPE object at 0x7fc51653e030>) + Path to a custom stylesheet for the report. ``` snakemake/snakemake-interface-common#49 is a related change. Together, they should make the --help output reproducible. <!--Add a description of your PR here--> ### QC <!-- Make sure that you can tick the boxes below. --> * [ ] The PR contains a test case for the changes or the changes are already covered by an existing test case. * [ ] The documentation (`docs/`) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake). --------- Co-authored-by: Johannes Köster <johannes.koester@tu-dortmund.de> Co-authored-by: Johannes Koester <johannes.koester@uni-due.de>
Thanks for pointing us to this problem! I have now fixed this in a slightly different way. Instead of changing the type, we format the help text properly and deterministically: snakemake/snakemake#3081. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
I'm rebuilding RPM packages to check build reproducibility (https://docs.fedoraproject.org/en-US/reproducible-builds/), and we got the result that rebuilds of the snakemake man page varies in this trivial way:
│ │ │ \fB--rerun-trigger\fR mtime'. (default:
│ │ │ -frozenset({<RerunTrigger.INPUT: 2>,
│ │ │ -<RerunTrigger.MTIME: 0>, <RerunTrigger.SOFTWARE_ENV: │ │ │ -3>, <RerunTrigger.PARAMS: 1>, <RerunTrigger.CODE: │ │ │ -4>}))
│ │ │ +frozenset({<RerunTrigger.CODE: 4>,
│ │ │ +<RerunTrigger.PARAMS: 1>, <RerunTrigger.INPUT: 2>, │ │ │ +<RerunTrigger.SOFTWARE_ENV: 3>, <RerunTrigger.MTIME: │ │ │ +0>}))
sets/frozensets are hash-ordered, not insertion ordered, so the order is not reproducible. Also, the automatic formatting in --help looks bad, the fact that it's a frozenset is not interesting to the user.
If we drop the explicit sorting, the items are sorted in the definition order, which seems reasonable.
--help text with the patch:
--rerun-triggers {mtime,params,input,software-env,code} [{mtime,params,input,software-env,code} ...]
Define what triggers the rerunning of a job. By
default, all triggers are used, which guarantees that
results are consistent with the workflow code and
configuration. If you rather prefer the traditional
way of just considering file modification dates, use '
--rerun-trigger mtime'. (default:
(<RerunTrigger.MTIME: 0>, <RerunTrigger.PARAMS: 1>,
<RerunTrigger.INPUT: 2>, <RerunTrigger.SOFTWARE_ENV:
3>, <RerunTrigger.CODE: 4>))