adapting proofs to modified FS_Induction theorem #190

muenchnerkindl · 2025-12-08T10:46:32Z

No description provided.

Signed-off-by: Stephan Merz <stephan.merz@loria.fr>

muenchnerkindl · 2025-12-08T13:05:59Z

So we again had trouble with using the GitHub runners for checking some of the proofs. For the moment, I excluded the EWD998, Quicksort, and LamportMutex proofs, which all take around a minute or slightly more on my laptop. Does any of you have a better idea?

lemmy · 2025-12-08T14:59:16Z

Why are proof times of a few minutes a problem? As far as I recall, the vendor-enforced hard timeout for a GitHub runner is six hours, which leaves plenty of time for proofs to complete. We also aren’t running these workflows multiple times per day.

If needed, we could skip the proving step entirely when neither the proofs nor the modules they depend on have changed. In practice, this could be implemented as a dedicated CI job for TLAPM using GitHub’s paths/paths-ignore filters, or by running git diff during the proof step to detect whether relevant files were modified.
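A minimal sketch of the `git diff` variant, assuming the CI step passes in the output of `git diff --name-only` between the base and head commits. The function name `proofs_need_checking` and the two watch lists are hypothetical, not part of the repository. The sketch only sees files inside the repository, so it cannot notice changes to external tools.

```python
from pathlib import PurePosixPath

# Hypothetical watch lists: any change to a .tla module, or to the CI files
# that drive proof checking, is treated as relevant.
PROOF_RELEVANT_SUFFIXES = {".tla"}
PROOF_RELEVANT_FILES = {
    ".github/scripts/check_proofs.py",
    ".github/workflows/CI.yml",
}


def proofs_need_checking(changed_paths):
    """Return True if any changed path could affect the result of a proof."""
    for raw in changed_paths:
        path = PurePosixPath(raw)
        if path.suffix in PROOF_RELEVANT_SUFFIXES or raw in PROOF_RELEVANT_FILES:
            return True
    return False


# Example input, as it might come from `git diff --name-only`.
changed = ["README.md", "specifications/ewd998/EWD998_proof.tla"]
print(proofs_need_checking(changed))
```

Treating every `.tla` change as relevant is deliberately coarse: it avoids having to compute which modules each proof actually depends on.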

ahelwer · 2025-12-08T19:15:44Z

So the new way of controlling whether proofs are checked or not in the CI (based on timing) after PR #187 earlier this year is as follows: the rough average proof execution time is recorded in the manifest in each directory in HH:MM:SS form, for example

Examples/specifications/LoopInvariance/manifest.json

Lines 52 to 61 in 0e018bc

    
    {
      "path": "specifications/LoopInvariance/Quicksort.tla",
      "features": [
        "pluscal"
      ],
      "models": [],
      "proof": {
        "runtime": "00:00:45"
      }
    },

Then in the check_proofs.py script there is an optional command line parameter --runtime_seconds_limit with a default value of 60:

Examples/.github/scripts/check_proofs.py

Line 16 in 0e018bc

    
           parser.add_argument('--runtime_seconds_limit', help='Only run proofs with expected runtime less than this value', required=False, default=60)

If the proof runtime recorded in the manifest is less than or equal to this value, the proof will be checked. If checking the proof takes more than double the --runtime_seconds_limit time value, an error is reported in the CI.

So if we want to modify whether a given proof is run and whether it fails we have a few possible approaches:

In the manifest, change its runtime to be greater or lesser than 60 seconds
In the check_proofs.py script, change the default value of --runtime_seconds_limit to some value greater or lesser than 60 seconds
In the CI file, provide a value larger than 60 seconds for the --runtime_seconds_limit CLI parameter for the check_proofs.py script
Change the hard timeout in check_proofs.py to triple the --runtime_seconds_limit value, or perhaps expose another --hard_timeout_seconds_limit or --hard_timeout_multiplier CLI parameter
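The gating rule described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual contents of check_proofs.py: `parse_timespan` here is a stand-in for the `tla_utils` helper, and `should_check`, `hard_timeout`, and the `multiplier` parameter are hypothetical names.

```python
from datetime import timedelta


def parse_timespan(text):
    """Parse an HH:MM:SS string such as "00:00:45" into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def should_check(manifest_runtime, runtime_seconds_limit=60):
    """A proof is checked only if its recorded runtime is within the limit."""
    return parse_timespan(manifest_runtime) <= timedelta(seconds=runtime_seconds_limit)


def hard_timeout(runtime_seconds_limit=60, multiplier=2):
    """Time after which a running proof check is reported as an error."""
    return timedelta(seconds=runtime_seconds_limit * multiplier)


print(should_check("00:00:45"))        # recorded runtime within the default limit
print(should_check("00:01:15"))        # recorded runtime above the default limit
print(should_check("00:01:15", 1000))  # same proof with a raised limit
print(hard_timeout(60, multiplier=3))  # the "triple" variant from the last option
```

Each of the four options changes one input of this rule: the recorded runtime, the default limit, the limit passed by the CI, or the multiplier.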

ahelwer · 2025-12-08T19:22:42Z

Regarding:

If needed, we could skip the proving step entirely when neither the proofs nor the modules they depend on have changed. In practice, this could be implemented as a dedicated CI job for TLAPM using GitHub’s paths/paths-ignore filters, or by running git diff during the proof step to detect whether relevant files were modified.

The proofs were originally checked with TLAPM fingerprint files cached by the CI runner for this reason, however: #68 (comment)

Signed-off-by: Stephan Merz <stephan.merz@loria.fr>

muenchnerkindl · 2025-12-11T16:27:26Z

Thanks @ahelwer! I tried indicating a longer timeout, but now I get an error from Unicode conversion. Did you change anything there recently that I would have to retrofit to this branch or is this a library mismatch?

ahelwer · 2025-12-12T06:03:18Z

Seems like a transient GitHub issue; there was a 503 error when downloading one of the releases. Probably the CI should fail if that happens.

ahelwer · 2025-12-12T18:40:19Z

.github/workflows/CI.yml

        run: |
+          # skip proofs that take too long on certain GitHub runners
+          SKIP=(
+            "specifications/ewd998/EWD998_proof.tla"


Would you be able to modify all of these to request longer check times?

muenchnerkindl · 2025-12-13

I tried to do that (with expected proof checking time arbitrarily set to 5 minutes), and now the CI indicates no problems. However:

An initial attempt again produced 504 errors for certain versions, indicated problems in Unicode conversion in another one (both of which appear to be spurious errors), while correctly pointing out a syntax error in CI.yml for one version. Even if the errors are spurious, they do not appear to be uncommon.

Looking at the details of the successful runs, I do not see evidence that the proofs for the three problematic examples (LoopInvariance/Quicksort.tla, ewd998/EWD998_proof.tla, and lamport_mutex/LamportMutex_proofs.tla) were indeed checked. Perhaps I again missed something on how proof checking is set up in the CI?

Also, I do not really understand why proof checking requires a global timeout per example: every backend has a timeout anyway (which may be changed in individual steps if necessary), and we already accommodate slow GitHub runners by indicating --stretch 5. Each of the three proofs mentioned above takes about 1 minute on my laptop, and it is just guesswork how long it may take on a GitHub runner. Even if I ran the scripts locally, that wouldn't tell me what timeout I should choose.

ahelwer · 2025-12-13

We can certainly remove all timeouts on the proofs and just let them run for however long they take. I know there are some TLAPS proofs out there (not yet in the examples repo) which take many hours to check, but that can be addressed if/when those proofs are added here. What do you think?

ahelwer · 2025-12-14

As a prelude to removing all time constraints you can add --runtime_seconds_limit 1000 to the arguments to the check_proofs.py script in the CI.yml file. Then after this PR is in I'll do a PR to remove the time limits (and possibly go from python to a bash one-liner).

muenchnerkindl · 2025-12-14

I tried to do that, but both --runtime_seconds_limit 1000 and --runtime_seconds_limit "1000" caused a type error:

    File "/home/runner/work/Examples/Examples/.github/scripts/check_proofs.py", line 37, in <module>
        and (runtime := tla_utils.parse_timespan(module['proof']['runtime'])) <= timedelta(seconds = args.runtime_seconds_limit)
                                                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    TypeError: unsupported type for timedelta seconds component: str
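The traceback is consistent with argparse handing command-line values through as strings when no `type` is given, while a non-string `default` such as 60 is left untouched. The sketch below reproduces that behaviour and shows one possible fix, `type=int`. The `build_parser` helper and its `coerce_to_int` flag are hypothetical and exist only to compare the two variants; this is a guess at the cause, not a confirmed diagnosis of check_proofs.py.

```python
import argparse
from datetime import timedelta


def build_parser(coerce_to_int):
    """Build a parser with or without type=int on --runtime_seconds_limit."""
    parser = argparse.ArgumentParser()
    extra = {"type": int} if coerce_to_int else {}
    parser.add_argument("--runtime_seconds_limit", required=False, default=60, **extra)
    return parser


argv = ["--runtime_seconds_limit", "1000"]

# Without type=int the value given on the command line stays a string,
# and timedelta rejects it.
raw = build_parser(coerce_to_int=False).parse_args(argv)
try:
    timedelta(seconds=raw.runtime_seconds_limit)
    failed = False
except TypeError:
    failed = True

# With type=int the same command line yields a usable number of seconds.
fixed = build_parser(coerce_to_int=True).parse_args(argv)
limit = timedelta(seconds=fixed.runtime_seconds_limit)

print(failed, limit)
```

This would also explain why the error only appeared once the flag was passed explicitly: the integer default never goes through string conversion.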

I have no strong objection to setting a timeout per example; it's just guesswork to figure out what value that should be.

ahelwer · 2025-12-14

Ach sorry, I can make the change myself then you can rebase; will do so later today.

ahelwer · 2025-12-15

Makes more sense to merge these changes then I will do a fix in-place. Thanks for fixing the proofs!

Signed-off-by: Stephan Merz <stephan.merz@loria.fr>

muenchnerkindl · 2025-12-13T15:24:04Z

If needed, we could skip the proving step entirely when neither the proofs nor the modules they depend on have changed. In practice, this could be implemented as a dedicated CI job for TLAPM using GitHub’s paths/paths-ignore filters, or by running git diff during the proof step to detect whether relevant files were modified.

But this would also have to include external dependencies, such as changes to TLAPS, including its standard library, which risks introducing even more brittleness.

Signed-off-by: Stephan Merz <stephan.merz@loria.fr>

muenchnerkindl added 2 commits December 8, 2025 11:44

adapting proofs to modified FS_Induction theorem

40d883d

Signed-off-by: Stephan Merz <stephan.merz@loria.fr>

skip long-running proofs in CI

57bc626

Signed-off-by: Stephan Merz <stephan.merz@loria.fr>

request longer timeout for CI proof checking

018369b

Signed-off-by: Stephan Merz <stephan.merz@loria.fr>

ahelwer reviewed Dec 12, 2025

muenchnerkindl added 2 commits December 13, 2025 08:18

longer timeouts for proofs

26a6887

Signed-off-by: Stephan Merz <stephan.merz@loria.fr>

fix typo in CI.yml

d6aa38a

Signed-off-by: Stephan Merz <stephan.merz@loria.fr>

muenchnerkindl mentioned this pull request Dec 13, 2025

Protect master branch #188

Open

muenchnerkindl added 3 commits December 14, 2025 09:15

set 1000 seconds limit for proof checking per example

53e79d7

Signed-off-by: Stephan Merz <stephan.merz@loria.fr>

fix typo

20193ba

Signed-off-by: Stephan Merz <stephan.merz@loria.fr>

reverting time limit

6f27db0

Signed-off-by: Stephan Merz <stephan.merz@loria.fr>

ahelwer merged commit dca6876 into master Dec 15, 2025
8 checks passed

ahelwer mentioned this pull request Dec 15, 2025

CI: check all proofs without time limit #191

Merged

4 participants
