-
Notifications
You must be signed in to change notification settings - Fork 212
adapting proofs to modified FS_Induction theorem #190
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Stephan Merz <stephan.merz@loria.fr>
Signed-off-by: Stephan Merz <stephan.merz@loria.fr>
|
So we again had trouble with using the GitHub runners for checking some of the proofs. For the moment, I excluded the EWD998, Quicksort, and LamportMutex proofs, which all take around a minute or slightly more on my laptop. Does any of you have a better idea? |
|
Why are proof times of a few minutes a problem? As far as I recall, the vendor-enforced hard timeout for a GitHub runner is six hours, which leaves plenty of time for proofs to complete. We also aren’t running these workflows multiple times per day. If needed, we could skip the proving step entirely when neither the proofs nor the modules they depend on have changed. In practice, this could be implemented as a dedicated CI job for TLAPM using GitHub’s |
|
So the new way of controlling whether proofs are checked or not in the CI (based on timing) after PR #187 earlier this year is as follows: the rough average proof execution time is recorded in the manifest in each directory in Examples/specifications/LoopInvariance/manifest.json Lines 52 to 61 in 0e018bc
Then in the Examples/.github/scripts/check_proofs.py Line 16 in 0e018bc
If the proof runtime recorded in the manifest is less than or equal to this value, the proof will be checked. If checking the proof takes more than double the So if we want to modify whether a given proof is run and whether it fails we have a few possible approaches:
|
|
Regarding:
The proofs were originally checked with TLAPM fingerprint files cached by the CI runner for this reason, however: #68 (comment) |
Signed-off-by: Stephan Merz <stephan.merz@loria.fr>
|
Thanks @ahelwer! I tried indicating a longer timeout, but now I get an error from Unicode conversion. Did you change anything there recently that I would have to retrofit to this branch or is this a library mismatch? |
|
Seems like a transient github issue; there was a 503 error when downloading one of the releases. Probably the CI should fail if that happens. |
.github/workflows/CI.yml
Outdated
| run: | | ||
| # skip proofs that take too long on certain GitHub runners | ||
| SKIP=( | ||
| "specifications/ewd998/EWD998_proof.tla" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you be able to modify all of these to request longer check times?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to do that (with expected proof checking time arbitrarily set to 5 minutes), and now the CI indicates no problems. However:
- An initial attempt again produced 504 errors for certain versions, indicated problems in Unicode conversion in another one (both of which appear to be spurious errors), while correctly pointing out a syntax error in CI.yml for one version. Even if the errors are spurious, they do not appear to be uncommon.
- Looking at the details of the successful runs, I do not see evidence that the proofs for the three problematic examples (LoopInvariance/Quicksort.tla, ewd998/EWD998_proof.tla, and lamport_mutex/LamportMutex_proofs.tla) were indeed checked. Perhaps I again missed something on how proof checking is set up in the CI?
Also, I do not really understand why proof checking requires a global timeout per example: every backend has a timeout anyway (which may be changed in individual steps if necessary), and we already accommodate for slow GitHub runners by indicating --stretch 5. Each of the three proofs mentioned above takes about 1 minute on my laptop, and it is just guesswork how long it may take on a GitHub runner. Even if I ran the scripts locally, that doesn't tell me what timeout I should choose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can certainly remove all timeouts on the proofs and just run it for however long it runs. I know there are some TLAPS proofs out there (not yet in the examples repo) which take many hours to check, but that can be addressed if/when those proofs are added here. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a prelude to removing all time constraints you can add --runtime_seconds_limit 1000 to the arguments to the check_proofs.py script in the CI.yml file. Then after this PR is in I'll do a PR to remove the time limits (and possibly go from python to a bash one-liner).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to do that, but both --runtime_seconds_limit 1000 and --runtime_seconds_limit "1000" caused a syntax error:
File "/home/runner/work/Examples/Examples/.github/scripts/check_proofs.py", line 37, in <module>
and (runtime := tla_utils.parse_timespan(module['proof']['runtime'])) <= timedelta(seconds = args.runtime_seconds_limit)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: unsupported type for timedelta seconds component: str
I have no strong objecting to setting a timeout per example, it's just guesswork to figure out what value that should be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ach sorry, I can make the change myself then you can rebase; will do so later today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes more sense to merge these changes then I will do a fix in-place. Thanks for fixing the proofs!
Signed-off-by: Stephan Merz <stephan.merz@loria.fr>
Signed-off-by: Stephan Merz <stephan.merz@loria.fr>
But this would also have to include external dependencies, such as changes to TLAPS, including its standard library, which risks introducing even more brittleness. |
Signed-off-by: Stephan Merz <stephan.merz@loria.fr>
Signed-off-by: Stephan Merz <stephan.merz@loria.fr>
No description provided.