@muenchnerkindl
Collaborator

No description provided.

Signed-off-by: Stephan Merz <stephan.merz@loria.fr>
@muenchnerkindl
Collaborator Author

We again had trouble using the GitHub runners to check some of the proofs. For the moment, I have excluded the EWD998, Quicksort, and LamportMutex proofs, each of which takes around a minute or slightly more on my laptop. Do any of you have a better idea?

@lemmy
Member

lemmy commented Dec 8, 2025

Why are proof times of a few minutes a problem? As far as I recall, the vendor-enforced hard timeout for a GitHub runner is six hours, which leaves plenty of time for proofs to complete. We also aren’t running these workflows multiple times per day.

If needed, we could skip the proving step entirely when neither the proofs nor the modules they depend on have changed. In practice, this could be implemented as a dedicated CI job for TLAPM using GitHub’s paths/paths-ignore filters, or by running git diff during the proof step to detect whether relevant files were modified.
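As a rough illustration of the git diff variant of this idea, here is a minimal sketch that decides whether proofs need re-checking from a list of changed paths. The function name `proofs_need_checking` and the pattern list are hypothetical and not taken from the repository's CI scripts.

```python
from fnmatch import fnmatchcase

# Hypothetical patterns for files whose modification should trigger proof checking.
# fnmatch's "*" also matches "/" so one pattern covers nested directories.
PROOF_RELEVANT_PATTERNS = (
    "specifications/*.tla",
    "specifications/*manifest.json",
    ".github/scripts/*",
    ".github/workflows/*",
)


def proofs_need_checking(changed_paths, patterns=PROOF_RELEVANT_PATTERNS):
    """Return True if any changed path matches a proof-relevant pattern."""
    return any(
        fnmatchcase(path, pattern)
        for path in changed_paths
        for pattern in patterns
    )
```

The list of changed paths could come from something like `git diff --name-only` against the base branch. Such a check only sees files inside the repository.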

@ahelwer
Collaborator

ahelwer commented Dec 8, 2025

Since PR #187 earlier this year, whether a proof is checked in the CI is controlled by timing, as follows: the approximate average proof execution time is recorded in the manifest in each directory in HH:MM:SS form, for example:

    {
      "path": "specifications/LoopInvariance/Quicksort.tla",
      "features": [
        "pluscal"
      ],
      "models": [],
      "proof": {
        "runtime": "00:00:45"
      }
    },

Then in the check_proofs.py script there is an optional command line parameter --runtime_seconds_limit with a default value of 60:

parser.add_argument('--runtime_seconds_limit', help='Only run proofs with expected runtime less than this value', required=False, default=60)

If the proof runtime recorded in the manifest is less than or equal to this value, the proof will be checked. If checking the proof takes more than double the --runtime_seconds_limit time value, an error is reported in the CI.

So if we want to modify whether a given proof is run and whether it fails we have a few possible approaches:

  • In the manifest, change its runtime to be greater or less than 60 seconds
  • In the check_proofs.py script, change the default value of --runtime_seconds_limit to some value greater or less than 60 seconds
  • In the CI file, provide a value larger than 60 seconds for the --runtime_seconds_limit CLI parameter for the check_proofs.py script
  • Change the hard timeout in check_proofs.py to triple the --runtime_seconds_limit value, or perhaps expose another --hard_timeout_seconds_limit or --hard_timeout_multiplier CLI parameter
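The selection rule and the hard timeout described above can be sketched as follows. This is a minimal illustration, not the actual contents of check_proofs.py: `parse_timespan` here is a stand-in for `tla_utils.parse_timespan`, and `select_proofs`, `hard_timeout`, and the `multiplier` parameter are hypothetical names.

```python
from datetime import timedelta


def parse_timespan(text):
    """Parse an HH:MM:SS string into a timedelta (stand-in helper)."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def select_proofs(modules, runtime_seconds_limit=60):
    """Return paths of modules whose recorded proof runtime is within the limit."""
    limit = timedelta(seconds=runtime_seconds_limit)
    return [
        module["path"]
        for module in modules
        if "proof" in module
        and parse_timespan(module["proof"]["runtime"]) <= limit
    ]


def hard_timeout(runtime_seconds_limit=60, multiplier=2):
    """Time after which a running proof check is reported as an error."""
    return timedelta(seconds=runtime_seconds_limit * multiplier)
```

With the manifest entry quoted earlier (runtime "00:00:45") and the default limit of 60, the Quicksort proof would be selected, and its check would be reported as an error after 120 seconds.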

@ahelwer
Collaborator

ahelwer commented Dec 8, 2025

Regarding:

> If needed, we could skip the proving step entirely when neither the proofs nor the modules they depend on have changed. In practice, this could be implemented as a dedicated CI job for TLAPM using GitHub’s paths/paths-ignore filters, or by running git diff during the proof step to detect whether relevant files were modified.

The proofs were originally checked with TLAPM fingerprint files cached by the CI runner for this reason, however: #68 (comment)

Signed-off-by: Stephan Merz <stephan.merz@loria.fr>
@muenchnerkindl
Collaborator Author

Thanks @ahelwer! I tried specifying a longer timeout, but now I get an error from Unicode conversion. Did you change anything there recently that I would have to retrofit to this branch, or is this a library mismatch?

@ahelwer
Collaborator

ahelwer commented Dec 12, 2025

Seems like a transient GitHub issue; there was a 503 error when downloading one of the releases. Probably the CI should fail if that happens.

run: |
# skip proofs that take too long on certain GitHub runners
SKIP=(
"specifications/ewd998/EWD998_proof.tla"
Collaborator

Would you be able to modify all of these to request longer check times?

Collaborator Author

I tried to do that (with expected proof checking time arbitrarily set to 5 minutes), and now the CI indicates no problems. However:

  1. An initial attempt again produced 504 errors for certain versions and reported problems in Unicode conversion for another (both appear to be spurious), while correctly pointing out a syntax error in CI.yml for one version. Even if the errors are spurious, they do not appear to be uncommon.
  2. Looking at the details of the successful runs, I do not see evidence that the proofs for the three problematic examples (LoopInvariance/Quicksort.tla, ewd998/EWD998_proof.tla, and lamport_mutex/LamportMutex_proofs.tla) were indeed checked. Perhaps I again missed something on how proof checking is set up in the CI?

Also, I do not really understand why proof checking requires a global timeout per example: every backend has a timeout anyway (which may be changed in individual steps if necessary), and we already accommodate slow GitHub runners by passing --stretch 5. Each of the three proofs mentioned above takes about 1 minute on my laptop, and it is guesswork how long it may take on a GitHub runner. Even running the scripts locally doesn't tell me what timeout I should choose.

Collaborator

We can certainly remove all timeouts on the proofs and just run it for however long it runs. I know there are some TLAPS proofs out there (not yet in the examples repo) which take many hours to check, but that can be addressed if/when those proofs are added here. What do you think?

Collaborator

As a prelude to removing all time constraints you can add --runtime_seconds_limit 1000 to the arguments to the check_proofs.py script in the CI.yml file. Then after this PR is in I'll do a PR to remove the time limits (and possibly go from python to a bash one-liner).

Collaborator Author

I tried to do that, but both --runtime_seconds_limit 1000 and --runtime_seconds_limit "1000" caused a TypeError:

  File "/home/runner/work/Examples/Examples/.github/scripts/check_proofs.py", line 37, in <module>
    and (runtime := tla_utils.parse_timespan(module['proof']['runtime'])) <= timedelta(seconds = args.runtime_seconds_limit)
                                                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: unsupported type for timedelta seconds component: str

I have no strong objection to setting a timeout per example; it's just guesswork to figure out what value that should be.
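The traceback is consistent with how argparse behaves: without a `type`, a value given on the command line is kept as a string, while the default of 60 stays an integer, so only explicitly passed values break the `timedelta` call. A minimal sketch of one possible fix, assuming the argument definition quoted earlier in this thread (not necessarily the fix that was eventually applied; `build_parser` is a hypothetical wrapper):

```python
import argparse
from datetime import timedelta


def build_parser():
    """Build a parser whose runtime limit is converted to int by argparse."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--runtime_seconds_limit',
        help='Only run proofs with expected runtime less than this value',
        type=int,  # convert the CLI string to an int before it reaches timedelta
        required=False,
        default=60,
    )
    return parser


# Simulate the CI invocation that failed before the conversion was added.
args = build_parser().parse_args(['--runtime_seconds_limit', '1000'])
limit = timedelta(seconds=args.runtime_seconds_limit)
```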

Collaborator

Ach sorry, I can make the change myself then you can rebase; will do so later today.

Collaborator

@ahelwer ahelwer Dec 15, 2025

Makes more sense to merge these changes then I will do a fix in-place. Thanks for fixing the proofs!

Signed-off-by: Stephan Merz <stephan.merz@loria.fr>
@muenchnerkindl
Collaborator Author

> If needed, we could skip the proving step entirely when neither the proofs nor the modules they depend on have changed. In practice, this could be implemented as a dedicated CI job for TLAPM using GitHub’s paths/paths-ignore filters, or by running git diff during the proof step to detect whether relevant files were modified.

But this would also have to include external dependencies, such as changes to TLAPS, including its standard library, which risks introducing even more brittleness.

Signed-off-by: Stephan Merz <stephan.merz@loria.fr>
@ahelwer ahelwer merged commit dca6876 into master Dec 15, 2025
8 checks passed

4 participants