Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: make enum listings reproducible and improve formatting #49

Closed
wants to merge 1 commit into from

Conversation

keszybz
Copy link

@keszybz keszybz commented Jun 30, 2024

I'm rebuilding RPM packages to check build reproducibility (https://docs.fedoraproject.org/en-US/reproducible-builds/), and we got the result that rebuilds of the snakemake man page varies in this trivial way:

│ │ │ \fB--rerun-trigger\fR mtime'. (default:
│ │ │ -frozenset({<RerunTrigger.INPUT: 2>,
│ │ │ -<RerunTrigger.MTIME: 0>, <RerunTrigger.SOFTWARE_ENV: │ │ │ -3>, <RerunTrigger.PARAMS: 1>, <RerunTrigger.CODE: │ │ │ -4>}))
│ │ │ +frozenset({<RerunTrigger.CODE: 4>,
│ │ │ +<RerunTrigger.PARAMS: 1>, <RerunTrigger.INPUT: 2>, │ │ │ +<RerunTrigger.SOFTWARE_ENV: 3>, <RerunTrigger.MTIME: │ │ │ +0>}))

sets/frozensets are hash-ordered, not insertion ordered, so the order is not reproducible. Also, the automatic formatting in --help looks bad, the fact that it's a frozenset is not interesting to the user.

If we drop the explicit sorting, the items are sorted in the definition order, which seems reasonable.

--help text with the patch:
--rerun-triggers {mtime,params,input,software-env,code} [{mtime,params,input,software-env,code} ...]
Define what triggers the rerunning of a job. By
default, all triggers are used, which guarantees that
results are consistent with the workflow code and
configuration. If you rather prefer the traditional
way of just considering file modification dates, use '
--rerun-trigger mtime'. (default:
(<RerunTrigger.MTIME: 0>, <RerunTrigger.PARAMS: 1>,
<RerunTrigger.INPUT: 2>, <RerunTrigger.SOFTWARE_ENV:
3>, <RerunTrigger.CODE: 4>))

I'm rebuilding RPM packages to check build reproducibility
(https://docs.fedoraproject.org/en-US/reproducible-builds/), and we
got the result that rebuilds of the snakemake man page varies in this
trivial way:

│ │ │  \fB\-\-rerun\-trigger\fR mtime'. (default:
│ │ │ -frozenset({<RerunTrigger.INPUT: 2>,
│ │ │ -<RerunTrigger.MTIME: 0>, <RerunTrigger.SOFTWARE_ENV:
│ │ │ -3>, <RerunTrigger.PARAMS: 1>, <RerunTrigger.CODE:
│ │ │ -4>}))
│ │ │ +frozenset({<RerunTrigger.CODE: 4>,
│ │ │ +<RerunTrigger.PARAMS: 1>, <RerunTrigger.INPUT: 2>,
│ │ │ +<RerunTrigger.SOFTWARE_ENV: 3>, <RerunTrigger.MTIME:
│ │ │ +0>}))

sets/frozensets are hash-ordered, not insertion ordered, so the order
is not reproducible. Also, the automatic formatting in --help looks
bad, the fact that it's a frozenset is not interesting to the user.

If we drop the explicit sorting, the items are sorted in the
definition order, which seems reasonable.

--help text with the patch:
  --rerun-triggers {mtime,params,input,software-env,code} [{mtime,params,input,software-env,code} ...]
                        Define what triggers the rerunning of a job. By
                        default, all triggers are used, which guarantees that
                        results are consistent with the workflow code and
                        configuration. If you rather prefer the traditional
                        way of just considering file modification dates, use '
                        --rerun-trigger mtime'. (default:
                        (<RerunTrigger.MTIME: 0>, <RerunTrigger.PARAMS: 1>,
                        <RerunTrigger.INPUT: 2>, <RerunTrigger.SOFTWARE_ENV:
                        3>, <RerunTrigger.CODE: 4>))
keszybz added a commit to keszybz/snakemake that referenced this pull request Jun 30, 2024
I'm rebuilding RPM packages to check build reproducibility
(https://docs.fedoraproject.org/en-US/reproducible-builds/), and we
got the result that rebuilds of the snakemake man page varies in this
trivial way:

│ │ │ -<dataclasses._MISSING_TYPE object at 0x3ff87071a90>)
│ │ │ +<dataclasses._MISSING_TYPE object at 0xffff819c5a90>)

I think that printing those values is not useful at all (apart from
created a problem for reproducible builds efforts), because those
values are not interesting at all for the user. So change the argument
parser definition to suppress all "boring" values like "set()",
"None", "<dataclasses._MISSING_TYPE object at 0x…".

Also, suppress the printing of the default for local cores. It depends
on the local machine configuration, but it also makes the user think
that the value printed is a fixed default.

│ │ │  \fB\-\-local\-cores\fR N
│ │ │  In cluster/cloud mode, use at most N cores of the host
│ │ │  machine in parallel (default: number of CPU cores of
│ │ │  the host). The cores are used to execute local rules.
│ │ │  This option is ignored when not in cluster/cloud mode.
│ │ │ -(default: 3)
│ │ │ +(default: 64)

This is the total diff of output with the patch:

$ diff -U0 <(PYTHONPATH=~/python/snakemake-interface-common snakemake --help) \
           <(PYTHONPATH=~/python/snakemake-interface-common:~/python/snakemake snakemake --help)

--- /proc/self/fd/11	2024-06-30 17:15:56.388584254 +0200
+++ /proc/self/fd/12	2024-06-30 17:15:56.393584263 +0200
@@ -93,2 +93 @@
-  targets               Targets to build. May be rules or files. (default:
-                        set())
+  targets               Targets to build. May be rules or files.
@@ -114 +113 @@
-                        SNAKEMAKE_PROFILE] (default: None)
+                        SNAKEMAKE_PROFILE]
@@ -137 +136 @@
-                        myrule=4'. (default: None)
+                        myrule=4'.
@@ -146 +145 @@
-                        containers) needed to create them. (default: None)
+                        containers) needed to create them.
@@ -155 +153,0 @@
-                        (default: None)
@@ -164 +162 @@
-                        workflow.cores. (default: None)
+                        workflow.cores.
@@ -169 +167 @@
-                        jobs. (default: None)
+                        jobs.
@@ -174 +172 @@
-                        (default: 8)
+                        (default: <available CPU count>)
@@ -203 +201 @@
-                        with --set-threads. (default: None)
+                        with --set-threads.
@@ -255 +253 @@
-                        outlined above. (default: None)
+                        outlined above.
@@ -263 +261 @@
-                        be preemptible. (default: None)
+                        be preemptible.
@@ -268 +265,0 @@
-                        (default: None)
@@ -284 +281 @@
-                        Documentation). (default: None)
+                        Documentation).
@@ -286,2 +283 @@
-                        Environment variables to pass to cloud jobs. (default:
-                        set())
+                        Environment variables to pass to cloud jobs.
@@ -290,2 +286 @@
-                        snakefile will use this as their origin). (default:
-                        None)
+                        snakefile will use this as their origin).
@@ -321 +316 @@
-                        plugin: snakemake_executor_<name> (default: None)
+                        plugin: snakemake_executor_<name>
@@ -329 +323,0 @@
-                        (default: set())
@@ -333 +326,0 @@
-                        (default: set())
@@ -345 +338 @@
-                        samples. (default: None)
+                        samples.
@@ -350 +342,0 @@
-                        (default: set())
@@ -356 +348 @@
-                        files specified here. (default: set())
+                        files specified here.
@@ -363 +355 @@
-                        directory. (default: None)
+                        directory.
@@ -374 +366 @@
-                        WMS_MONITOR_TOKEN in the environment. (default: None)
+                        WMS_MONITOR_TOKEN in the environment.
@@ -381,2 +373 @@
-                        endpoint to first interact with the workflow (default:
-                        None)
+                        endpoint to first interact with the workflow
@@ -388 +379 @@
-                        (internal use only). (default: None)
+                        (internal use only).
@@ -391 +382 @@
-                        activate) (internal use only). (default: None)
+                        activate) (internal use only).
@@ -398 +389 @@
-                        remote compute node. (default: None)
+                        remote compute node.
@@ -420 +411 @@
-                        report.html is the default. (default: None)
+                        report.html is the default.
@@ -424 +415 @@
-                        custom logo, see docs. (default: None)
+                        custom logo, see docs.
@@ -428 +419 @@
-                        starting with --report-. (default: None)
+                        starting with --report-.
@@ -437 +428 @@
-                        Snakemake for subsequent jobs. (default: None)
+                        Snakemake for subsequent jobs.
@@ -447,2 +438 @@
-                        present, this will create an empty draft. (default:
-                        None)
+                        present, this will create an empty draft.
@@ -459 +449 @@
-                        is used. (default: None)
+                        is used.
@@ -468 +458 @@
-                        'pytest TESTPATH'. (default: None)
+                        'pytest TESTPATH'.
@@ -473 +462,0 @@
-                        (default: None)
@@ -543 +532 @@
-                        .tar.bz2 and .tar.xz. (default: None)
+                        .tar.bz2 and .tar.xz.
@@ -547 +536 @@
-                        marks that files are incomplete. (default: None)
+                        marks that files are incomplete.
@@ -558,2 +547 @@
-                        input, params) have changed since creation. (default:
-                        None)
+                        input, params) have changed since creation.
@@ -593 +581 @@
-                        use only. (default: None)
+                        use only.
@@ -607 +595 @@
-                        being printed at all. (default: None)
+                        being printed at all.
@@ -643 +631 @@
-                        cluster environments. (default: None)
+                        cluster environments.
@@ -648,2 +636 @@
-                        too long to be passed on the commandline. (default:
-                        None)
+                        too long to be passed on the commandline.
@@ -664 +650,0 @@
-                        (default: frozenset())
@@ -675 +661 @@
-                        results otherwise. (default: None)
+                        results otherwise.
@@ -679,2 +665 @@
-                        for internal use by Snakemake itself only. (default:
-                        set())
+                        for internal use by Snakemake itself only.
@@ -713 +698 @@
-                        used. (default: None)
+                        used.
@@ -716 +701 @@
-                        bucket name. (default: )
+                        bucket name.
@@ -735 +719,0 @@
-                        (default: None)
@@ -761 +744,0 @@
-                        (default: None)
@@ -768 +751 @@
-                        requires yappi to be installed. (default: None)
+                        requires yappi to be installed.
@@ -779 +762 @@
-                        e.g. slack messages or emails. (default: None)
+                        e.g. slack messages or emails.
@@ -784 +767 @@
-                        management system (wms) are supported. (default: None)
+                        management system (wms) are supported.
@@ -817 +800 @@
-                        in the installation directory. (default: None)
+                        in the installation directory.
@@ -834 +816,0 @@
-                        (default: set())
@@ -863,2 +845 @@
-                        software-deployment (see --shared-fs-usage). (default:
-                        None)
+                        software-deployment (see --shared-fs-usage).
@@ -872 +853 @@
-                        tarballs. (default: None)
+                        tarballs.
@@ -902 +883 @@
-                        usage). (default: None)
+                        usage).
@@ -905 +885,0 @@
-                        (default: )
@@ -919,2 +899 @@
-                        directories with htmlindex as results. (default:
-                        <dataclasses._MISSING_TYPE object at 0x7fc51653e030>)
+                        directories with htmlindex as results.
@@ -922,2 +901 @@
-                        Path to a custom stylesheet for the report. (default:
-                        <dataclasses._MISSING_TYPE object at 0x7fc51653e030>)
+                        Path to a custom stylesheet for the report.

snakemake/snakemake-interface-common#49 is a related
change. Together, they should make the --help output reproducible.
johanneskoester added a commit to snakemake/snakemake that referenced this pull request Jul 4, 2024
…-help (#2936)

I'm rebuilding RPM packages to check build reproducibility
(https://docs.fedoraproject.org/en-US/reproducible-builds/), and we got
the result that rebuilds of the snakemake man page varies in this
trivial way:

```
│ │ │ -<dataclasses._MISSING_TYPE object at 0x3ff87071a90>)
│ │ │ +<dataclasses._MISSING_TYPE object at 0xffff819c5a90>)
```

I think that printing those values is not useful at all (apart from
created a problem for reproducible builds efforts), because those values
are not interesting at all for the user. So change the argument parser
definition to suppress all "boring" values like "set()", "None",
"<dataclasses._MISSING_TYPE object at 0x…".

Also, suppress the printing of the default for local cores. It depends
on the local machine configuration, but it also makes the user think
that the value printed is a fixed default.

```
│ │ │  \fB\-\-local\-cores\fR N
│ │ │  In cluster/cloud mode, use at most N cores of the host
│ │ │  machine in parallel (default: number of CPU cores of
│ │ │  the host). The cores are used to execute local rules. 
│ │ │  This option is ignored when not in cluster/cloud mode.
│ │ │ -(default: 3)
│ │ │ +(default: 64)
```

This is the total diff of output with the patch:

```console
$ diff -U0 <(PYTHONPATH=~/python/snakemake-interface-common snakemake --help) \
           <(PYTHONPATH=~/python/snakemake-interface-common:~/python/snakemake snakemake --help)

--- /proc/self/fd/11	2024-06-30 17:15:56.388584254 +0200 +++ /proc/self/fd/12	2024-06-30 17:15:56.393584263 +0200 @@ -93,2 +93 @@
-  targets               Targets to build. May be rules or files. (default:
-                        set())
+  targets               Targets to build. May be rules or files.
@@ -114 +113 @@
-                        SNAKEMAKE_PROFILE] (default: None)
+                        SNAKEMAKE_PROFILE]
@@ -137 +136 @@
-                        myrule=4'. (default: None)
+                        myrule=4'.
@@ -146 +145 @@
-                        containers) needed to create them. (default: None)
+                        containers) needed to create them.
@@ -155 +153,0 @@
-                        (default: None)
@@ -164 +162 @@
-                        workflow.cores. (default: None)
+                        workflow.cores.
@@ -169 +167 @@
-                        jobs. (default: None)
+                        jobs.
@@ -174 +172 @@
-                        (default: 8)
+                        (default: <available CPU count>)
@@ -203 +201 @@
-                        with --set-threads. (default: None)
+                        with --set-threads.
@@ -255 +253 @@
-                        outlined above. (default: None)
+                        outlined above.
@@ -263 +261 @@
-                        be preemptible. (default: None)
+                        be preemptible.
@@ -268 +265,0 @@
-                        (default: None)
@@ -284 +281 @@
-                        Documentation). (default: None)
+                        Documentation).
@@ -286,2 +283 @@
-                        Environment variables to pass to cloud jobs. (default:
-                        set())
+                        Environment variables to pass to cloud jobs.
@@ -290,2 +286 @@
-                        snakefile will use this as their origin). (default:
-                        None)
+                        snakefile will use this as their origin).
@@ -321 +316 @@
-                        plugin: snakemake_executor_<name> (default: None)
+                        plugin: snakemake_executor_<name>
@@ -329 +323,0 @@
-                        (default: set())
@@ -333 +326,0 @@
-                        (default: set())
@@ -345 +338 @@
-                        samples. (default: None)
+                        samples.
@@ -350 +342,0 @@
-                        (default: set())
@@ -356 +348 @@
-                        files specified here. (default: set())
+                        files specified here.
@@ -363 +355 @@
-                        directory. (default: None)
+                        directory.
@@ -374 +366 @@
-                        WMS_MONITOR_TOKEN in the environment. (default: None)
+                        WMS_MONITOR_TOKEN in the environment.
@@ -381,2 +373 @@
-                        endpoint to first interact with the workflow (default:
-                        None)
+                        endpoint to first interact with the workflow
@@ -388 +379 @@
-                        (internal use only). (default: None)
+                        (internal use only).
@@ -391 +382 @@
-                        activate) (internal use only). (default: None)
+                        activate) (internal use only).
@@ -398 +389 @@
-                        remote compute node. (default: None)
+                        remote compute node.
@@ -420 +411 @@
-                        report.html is the default. (default: None)
+                        report.html is the default.
@@ -424 +415 @@
-                        custom logo, see docs. (default: None)
+                        custom logo, see docs.
@@ -428 +419 @@
-                        starting with --report-. (default: None)
+                        starting with --report-.
@@ -437 +428 @@
-                        Snakemake for subsequent jobs. (default: None)
+                        Snakemake for subsequent jobs.
@@ -447,2 +438 @@
-                        present, this will create an empty draft. (default:
-                        None)
+                        present, this will create an empty draft.
@@ -459 +449 @@
-                        is used. (default: None)
+                        is used.
@@ -468 +458 @@
-                        'pytest TESTPATH'. (default: None)
+                        'pytest TESTPATH'.
@@ -473 +462,0 @@
-                        (default: None)
@@ -543 +532 @@
-                        .tar.bz2 and .tar.xz. (default: None)
+                        .tar.bz2 and .tar.xz.
@@ -547 +536 @@
-                        marks that files are incomplete. (default: None)
+                        marks that files are incomplete.
@@ -558,2 +547 @@
-                        input, params) have changed since creation. (default:
-                        None)
+                        input, params) have changed since creation.
@@ -593 +581 @@
-                        use only. (default: None)
+                        use only.
@@ -607 +595 @@
-                        being printed at all. (default: None)
+                        being printed at all.
@@ -643 +631 @@
-                        cluster environments. (default: None)
+                        cluster environments.
@@ -648,2 +636 @@
-                        too long to be passed on the commandline. (default:
-                        None)
+                        too long to be passed on the commandline.
@@ -664 +650,0 @@
-                        (default: frozenset())
@@ -675 +661 @@
-                        results otherwise. (default: None)
+                        results otherwise.
@@ -679,2 +665 @@
-                        for internal use by Snakemake itself only. (default:
-                        set())
+                        for internal use by Snakemake itself only.
@@ -713 +698 @@
-                        used. (default: None)
+                        used.
@@ -716 +701 @@
-                        bucket name. (default: )
+                        bucket name.
@@ -735 +719,0 @@
-                        (default: None)
@@ -761 +744,0 @@
-                        (default: None)
@@ -768 +751 @@
-                        requires yappi to be installed. (default: None)
+                        requires yappi to be installed.
@@ -779 +762 @@
-                        e.g. slack messages or emails. (default: None)
+                        e.g. slack messages or emails.
@@ -784 +767 @@
-                        management system (wms) are supported. (default: None)
+                        management system (wms) are supported.
@@ -817 +800 @@
-                        in the installation directory. (default: None)
+                        in the installation directory.
@@ -834 +816,0 @@
-                        (default: set())
@@ -863,2 +845 @@
-                        software-deployment (see --shared-fs-usage). (default:
-                        None)
+                        software-deployment (see --shared-fs-usage).
@@ -872 +853 @@
-                        tarballs. (default: None)
+                        tarballs.
@@ -902 +883 @@
-                        usage). (default: None)
+                        usage).
@@ -905 +885,0 @@
-                        (default: )
@@ -919,2 +899 @@
-                        directories with htmlindex as results. (default:
-                        <dataclasses._MISSING_TYPE object at 0x7fc51653e030>)
+                        directories with htmlindex as results.
@@ -922,2 +901 @@
-                        Path to a custom stylesheet for the report. (default:
-                        <dataclasses._MISSING_TYPE object at 0x7fc51653e030>)
+                        Path to a custom stylesheet for the report.
```

snakemake/snakemake-interface-common#49 is a
related change. Together, they should make the --help output
reproducible.

<!--Add a description of your PR here-->

### QC
<!-- Make sure that you can tick the boxes below. -->

* [ ] The PR contains a test case for the changes or the changes are
already covered by an existing test case.
* [ ] The documentation (`docs/`) is updated to reflect the changes or
this is not necessary (e.g. if the change does neither modify the
language nor the behavior or functionalities of Snakemake).

---------

Co-authored-by: Johannes Köster <johannes.koester@tu-dortmund.de>
Co-authored-by: Johannes Koester <johannes.koester@uni-due.de>
@johanneskoester
Copy link
Contributor

Thanks for pointing us to this problem! I have now fixed this in a slightly different way. Instead of changing the type, we format the help text properly and deterministically: snakemake/snakemake#3081.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants