HolSmt: add support for Z3 v4.12.4 proof reconstruction #1194

someplaceguy · 2024-02-07T14:02:58Z

This PR adds support for reconstructing Z3 v4.12.4 proof certificates in HOL4.

It contains 16 commits -- a (more) detailed description is available in each of them.

So far it has only been tested with the HolSmt self-tests. All the self-tests that were passing with Z3 v2.19 proof reconstruction now also pass with Z3 v4.12.4 proof reconstruction.

As a consequence, I'm deprecating HolSmt support for Z3 v2.19 (see motivation in commit b539306). The HolSmt documentation was changed appropriately. However, for the moment this PR still retains support for parsing and reconstructing Z3 v2.19 proofs, since these are mostly just a subset of Z3 v4.12.4 proofs.

A few other minor, unrelated improvements to HolSmt were also made.

Note that this is only an initial implementation: it is almost guaranteed that HolSmt is still missing support for some Z3 v4.12.4 proof rules, but it is difficult to implement them until I run across examples where these rules appear in the proof certificates. I expect this gap will decrease over time, as I intent to keep fixing more HolSmt self-tests to work with proof reconstruction and add support for more features (to the extent that I can). It's also worth noting that the existing HolSmt support for Z3 v2.19 proof reconstruction is similarly incomplete, in part due to bugs in Z3 v2.19 which cannot be fixed, since this version is not being actively developed anymore and the source code is not available.

I've also attempted to update the Docker CI files to use Z3 v4.12.4 by default, although I'm not sure these changes are correct (help would be appreciated, esp. @binghe as he added this feature 😄). Related question: @binghe you mentioned in #1179 that for arm64 only Z3 4.12.4 is available: is this enforced anywhere?

The above change to the Docker CI files should be the only change in this PR that affects anything outside HolSmt.

As a side effect of this PR, it should now be possible to reconstruct Z3 proofs in architectures other than x86 and x86-64 (without using an emulator) 🎉

cc @tjark

binghe · 2024-02-07T23:01:32Z

I've also attempted to update the Docker CI files to use Z3 v4.12.4 by default, although I'm not sure these changes are correct (help would be appreciated, esp. @binghe as he added this feature 😄). Related question: @binghe you mentioned in #1179 that for arm64 only Z3 4.12.4 is available: is this enforced anywhere?

Great work. (I ever spent some time searching inside Z3 4.x documentation to try to figure out how it does proof logging but didn't succeed. Now I think I used wrong keywords.)

For your question, it's not enforced anywhere in HOL4's source code. Instead, I simply meant that, only Z3 4.x provided release downloads for arm64 Linux (and it's already installed in the docker CI image). The other two supported SMT solvers (CVC5 and Yices 1.x) seem to have no publicly available source code, nor they have available arm64 Linux binaries.

mn200 · 2024-02-08T03:07:38Z

I'll give @tjark a chance to weigh in before merging, but many thanks for the ongoing work on this!

binghe · 2024-02-08T03:15:52Z

As a side effect of this PR, it should now be possible to reconstruct Z3 proofs in architectures other than x86 and x86-64 (without using an emulator) 🎉

Totally agree on this. Now on macOS (both x86_64 and arm64), HOL4 users can freely use HolSmt (with Z3 4.x as the backend) in their proofs, and there's no worry about the potential mistakes caused by untrusted oracles.

These test cases were verified to work properly for the SMT-LIB translation but I had forgotten to enable cvc5 for them.

Otherwise, when certain tests fail, it's impossible to distinguish them from others that have the exact same formula but have different types, e.g. nums vs ints.

Support for old-style Z3 v2.19 proofs is retained, since the differences are minor. That said, in the future it might make sense to remove support for Z3 v2.19 proofs due to such an old version being obsolete, although it'd be desirable for that to happen only after Z3 v4 proof replay reaches feature parity.

These are used by Z3 v4.12.x (unlike Z3 v2.19) to declare variables of a specified type. Apparently, Z3 expects us to unify these variables with arbitrary terms as part of `rewrite` proof rules. Z3 also assigns formulas / terms to these variables as part of `intro-def` rules.

This rule basically introduces a name for a term, i.e. a definition of the form: ``var = term`` However, Z3 often defines such a variable by providing a term in a more complicated form such as (for example): ``(var \/ ~term) /\ (~var \/ term)`` (All such schematic forms provided by Z3 are tautologies given that `var = term`). We track all such definitions, which accumulate in the set of hypotheses of all theorems that depend on them, and at the end they are all removed from the final theorem. We can remove them from the final theorem because these variables do not appear in the remaining hypotheses nor in the conclusion of the theorem, i.e. all such variable names are invented by Z3 for the purposes of facilitating intermediate proofs and are thus irrelevant with regards to the final theorem.

@x91

While reconstructing a proof of `(if x < y then x else y) <= x`, Z3 v4.12.4 asks us to prove the following rewrite rule as one of the proof steps: (let ((@x91 (rewrite (= (not (>= (+ x (* (- 1) (ite $x36 y x))) 0)) $x86)))) ... corresponding to this term: ~(x + -1 * (if x + -1 * y >= 0 then y else x) >= 0) <=> ~(x + -1 * $var$(z3name!0) >= 0) ... where z3name!0 is a variable declared by Z3 at the beginning of its proof certificate, but which we know nothing about at this point. To make the proof go through, we unify the two sides of the equivalence, such that we obtain: z3name!0 = (if x + -1 * y >= 0 then y else x) We then prove the theorem by substituting the variable(s) in the goal and add the above definition to the same list of Z3-provided definitions as that used by the `z3_intro_def` handler, so that it gets removed at the end of the proof.

Z3 describes it as being the same as `bvsdiv`, "but created in a context where the second operand can be assumed to be non-zero". Currently we process it in the same way as `bvsdiv`.

Some terms, particularly in proofs containing word division, are huge in size (for word32 division, dozens of MB when printed out, due to combinatory explosion), so profiling is useful to figure out where the proof replay is stuck or taking too much time.

They uncovered a bug during development of support for Z3 v4.12.x proof replay, so it's probably useful to add them to the test suite to avoid future regressions.

This proof rule is supposed to handle multiple symmetry and transitivity rules. Z3 provides the following example: A1 |- R a b A2 |- R c b A3 |- R c d --------------------------------------- trans* A1 u A2 u A3 |- R a d Although more generally, the proof rule is supposed to handle any number of theorems passed as arguments and any path between the elements. R must be a symmetric and transitive relation. This rule came up in the Z3 v4.12.4 proof reconstruction of the following self-test: f (f x) = x /\ f (f (f (f (f x)))) = x ==> f x = x The only relation that came up in this proof was for equality. It is unknown if currently Z3 also uses this proof rule for other relations. However, it is worth noting that the related `symm` and `trans` proof rules, as currently implemented in HolSmt, can only handle the equality relation, even though Z3 v4.12.4 also describes them as proof rules for arbitrary relations. For the initial implementation, we rely on metisLib.METIS_TAC to find a proof. This is more than fast enough for the small proof of the above self-test. However, if METIS_TAC becomes a bottleneck, a more specialized proof handler could be implemented to improve performance.

This commit adds new rewrite theorems for: ~(p ==> q) <=> ~(~p \/ q) (~p <=> q) <=> (p <=/=> q) In this commit they were inserted as theorems r032 and r010 (resp.), others being renumbered as appropriate. They come up in the Z3 v4.12.4 proof reconstruction for the following self-tests, respectively: (f (f x) = x) /\ (f (f (f (f (f x)))) = x) ==> (f x = x) ((x, y) = (y, x)) = (x = y)

This file was introduced in the following commit: 18c8b76 However, in that commit I forgot to add a description of the file to the README (which contains a description of the other HolSmt files).

This commit fixes the parsing of the `bvand`, `bvor`, `bvadd`, `bvmul`, `bvxor` and `bvxnor` operators in Z3 v4 proof reconstruction. `bvxor` and `bvxnor` were already implemented but only as binary operators. However, Z3 v4.12.4 declares `bvxor` and `bvxnor` as being "flat associative" and in (at least) one of the Z3 proofs in the self-tests `bvxor` does appear with 3 arguments. Therefore, the parser was changed to parse these operators using a left-associative application of the arguments. The other operators were missing, i.e. they were not being parsed. In z3 v4.12.4 they are similarly declared as being flat associative. Strangely, this version of Z3 does not declare `bvnand` and `bvnor` as being flat associative, so we keep parsing them as binary operators.

HolSmt now supports proof reconstruction for Z3 v4.12.4, so we can now use it by default to run the HolSmt self-tests.

Now that proof reconstruction for Z3 v4.12.4 works, we can deprecate Z3 v2.19. Using a modern version of Z3 has many advantages, since Z3 is now open-source and also, binaries exist (or can be compiled) for platforms other than x86 and x86-64. Also, Z3 v4.12.x is in active development, which means there's a greater chance that bugs that are encountered will be fixed (unlike Z3 v2.19 which is a very old, deprecated version) and other improvements made. Modern versions of Z3 are also easier to install in current Linux distributions, since usually they are already part of their package collection. Furthermore, if we can ignore Z3 v2.19 then we can start enabling proof reconstruction for those self-tests which can only be reconstructed in Z3 v4 (once the reconstruction is fixed, that is).

tjark · 2024-02-08T11:47:52Z

I'll give @tjark a chance to weigh in before merging, but many thanks for the ongoing work on this!

Please go ahead! I haven't reviewed the code, but from the description of the PR, this sounds amazing.

tjark · 2024-02-08T11:55:34Z

Note that this is only an initial implementation: it is almost guaranteed that HolSmt is still missing support for some Z3 v4.12.4 proof rules, but it is difficult to implement them until I run across examples where these rules appear in the proof certificates.

A good way to get more examples would be to run Z3 on the benchmarks in (relevant logics of) SMT-LIB. (Depending on how far you want to push this, you could basically re-do the evaluation from the "Fast LCF-Style Proof Reconstruction for Z3" paper with Z3 v4.12.4.)

someplaceguy · 2024-02-08T11:57:09Z

Minor update:

Fixed some typos and slightly inaccurate terminology.
Fixed trans* handler to perform a union of all the hypotheses of all input theorems, which then becomes the set of assumptions used for proving. Otherwise, if any of the input theorems had any hypotheses, the theorem we are trying to prove would become unprovable. I had already realized that this was necessary (hence the A1 u A2 u A3 in the example I provided in the comment) but I had forgotten to implement it.
Added one more argument for deprecating Z3 v2.19.

Also, I should say that I'm not too happy with the implementation of remove_definitions. I think it should be very fast in most cases, i.e. if the nesting of the definitions isn't too deep, but otherwise it might cause an exponential term blow-up. The problem is that while proving the theorems necessary to remove those definitions from the set of hypotheses, the tactic I implemented will fully expand all definitions. This can be a problem in cases like this, which seem to be quite tricky to handle:

z1 = x + 1
z2 = y + 1

z3 =   z2    + (y + 1) +   z1
z3 = (y + 1) +    z2   + (x + 1)

z4 =          z3         + (z2 + z2 + z1) +   z1
z4 = (z2 + z2 + (x + 1)) +       z3       + (x + 1)

(...)

z64 =          z63          + (z62 + z62 + z1) +   z1
z64 = (z62 + z62 + (x + 1)) +        z63       + (x + 1)

In particular, this example would fail with the tactic I implemented since it would require more than 2^64 expansions, due to the deep nesting.

I'm thinking of doing a different implementation, where first I would gather all the definitions of z64 and would perform term unification between them, to arrive at a minimal expression that defines z64:

z64 = z63 + z63 + z1

... assuming the following:

z63 = z62 + z62 + (x + 1)
z63 = z62 + z62 + z1
z1 = x + 1

I would then somehow remove all other definitions of z64 (I guess by proving with substitution, using these assumptions).
Once I'd have a single minimal definition for z64 it would be trivial to remove it.

Finally, I would perform the exact same thing for z63, z62, z61 ... z1 (in order). It'd be important to do this in order, using a topological sort, to avoid doing a lot of extra work (since e.g. when dealing with z64 we'd be adding new assumptions for z63 ... z1, so if we process them in that order, they would deduplicate automatically when stored in the set of hypotheses).

I think this new alternative implementation could be a lot slower if Z3 provides us many duplicate definitions that are already fully or almost fully expanded (since we'd be doing a lot of unnecessary term unifications with huge terms), but the advantage is that it'd avoid the exponential blow-up for these worst-case scenarios.

I am also suspecting that these worst-case scenarios might actually show up in practice because I suspect that the entire reason why Z3 is inventing these definitions is indeed to avoid this exponential blow-up in proofs!

someplaceguy · 2024-02-08T11:58:48Z

A good way to get more examples would be to run Z3 on the benchmarks in (relevant logics of) SMT-LIB. (Depending on how far you want to push this, you could basically re-do the evaluation from the "Fast LCF-Style Proof Reconstruction for Z3" paper with Z3 v4.12.4.)

That's a great idea, thanks! I would love to do this after I try fixing as many self-tests as possible to work with proof reconstruction!

someplaceguy · 2024-02-08T19:08:43Z

Also, I should say that I'm not too happy with the implementation of remove_definitions.

I'm thinking of doing a different implementation (...)

So, I've been discussing this issue with @tjark, who gave me great suggestions. @tjark's idea is similar to what I was proposing, but @tjark's implementation suggestions lead to a simpler and much faster solution that what I was proposing, I think -- probably as fast as the current implementation in this PR in the straightforward cases, but completely avoiding the exponential blow-up in the hard ones!

I'm hoping I'll be able to do the new implementation within the next few days.

mn200 · 2024-02-08T22:53:21Z

I'll leave this open for the "new implementation" so we can catch all the work in one PR. Thanks!

@tjark

This new implementation was developed to address concerns about exponential term blow-up. Basically, if we have definitions of the form: z1 = 1 z2 = z1 + z1 z3 = z2 + z2 ... z64 = z63 + z63 ... then the old implementation would have fully expanded the definition of `z64` in order to remove it, which would not fit in memory due to the 2^64 variable substitutions which would be required. This new implementation avoids fully expanding definitions. It was developed with valuable suggestions and support from @tjark.

someplaceguy · 2024-02-15T23:47:46Z

Added two commits:

67a8163 Replaces the old implementation of remove_definitions with the new one, which should properly handle the tricky cases that me and @tjark came up with.
e72e7e5 Adds unit tests for remove_definitions. The two tricky unit tests were verified to fail with the old implementation of remove_definitions (i.e. if the previous commit is reverted).

That said, @tjark seems to have come up with another tricky case which I'm not sure the new implementation can handle. However, I did not understand the exact specifics. @tjark, can you provide more details about the example that you mentioned for which unification might not save us? I would like to add a unit test for that case as well, if possible.

Thank you!

someplaceguy · 2024-02-20T10:39:38Z

Update: @tjark came up with a new clever scenario which 1) I hadn't anticipated and 2) is leading to circular definitions (due to term unification) even though they weren't circular to begin with. I have added a unit test for this new case (in my local repository) and I've confirmed that the new implementation is failing due to this issue. @tjark has also suggested a straightforward fix for this new issue which I'm planning to implement soon. Thanks!

someplaceguy · 2024-02-21T17:29:17Z

I've fixed this PR to handle a kind of definitions which might appear and which I wasn't originally anticipating. These are definitions of the form var1 = var2, where at least one of the variables is defined by Z3.

These definitions were problematic for various reasons, especially the issue that @tjark identified which could cause circular definitions to arise even though they weren't circular to begin with.

To fix all the issues in a principled way, I've added code to keep track of which variables Z3 has defined in its proof log, so that we can refer to them in the proof replay procedure. I wasn't originally planning to do this as it didn't seem necessary, but in hindsight I believe it makes the code that handles these definitions clearer and more robust, especially when handling definitions of this kind. The commit messages and comments in the code contain more details.

I've also added a couple of unit tests to exercise the issue that @tjark identified with circular definitions.

Thanks!

These are run as part of the self-tests. Currently only unit tests for `Z3_ProofReplay.remove_definitions` have been added. Two of them are meant to test tricky cases which would cause the previous implemention of `remove_definitions` to fail.

It turns out that there are multiple reasons for why keeping track of Z3-defined variables is desirable (or perhaps even needed) during proof replay. One of the issues is that when removing definitions, term unification may end up creating new definitions of this form: var1 = var2 ... where `var1` was not in the previous set of variables that we were keeping track of for removal. This would cause such definitions to be ignored when calculating the new set of definitions to remove. Another reason is that we want to avoid ending up with circular definitions such as: var1 = var2 var2 = var1 ... where `var1` and `var2` are both Z3-defined variables. To prevent this, we can orient such definitions so that `var2 = var1` is always translated into `var1 = var2` (where `var1` <= `var2`, for some definition of `<=`), i.e. we can always create them in a canonical orientation. Keeping track of Z3-defined variables also allows us to orient definitions created due to term unification (e.g. as part of `rewrite` proof rules) such that they end up as `var = x` instead of `x = var`, where `x` is a user-defined variable and `var` is a Z3-defined variable. Therefore, this commit adds code to keep track of which variables have been defined by Z3. As a side effect, it fixes the first issue mentioned above. A subsequent commit will use this functionality to orient the definitions appropriately during proof replay, which will fix the remaining two issues.

In Z3 proof replay, sometimes we may end up with definitions of the form: var1 = var2 This kind of definition may theoretically arise in three different places: 1. As an explicit Z3-provided definition in `intro-def` proof rules 2. As an implicit Z3-provided definition which we instantiate as part of `rewrite` proof rules (after term unification) 3. When unifying terms to add new definitions in the process of removing the definitions from the final theorem. This can be problematic if `var1` is not a Z3-defined variable (which means `var2` is, and thus the definition should be reversed) or if both `var1` and `var2` are Z3-defined variables. If both vars are Z3-defined variables, we may end up with the following two hypotheses in the final theorem, which `remove_definitions` can't handle due to the circularity: var1 = var2 var2 = var1 To avoid ending up with such problematic definitions, this commit introduces code to change `var1 = var2` into `var2 = var1`, so that the left-hand side is always a Z3-defined variable and, if the right-hand side is also a Z3-defined variable, the former is not "greater" than the latter (for some definition of "greater").

mn200 · 2024-02-21T23:04:27Z

Sounds like the epic story has come to a fittingly successful conclusion! Thanks for all the work on this one.

someplaceguy force-pushed the u/z3-v4 branch from 30cca21 to 7b3938b Compare February 7, 2024 14:29

someplaceguy force-pushed the u/z3-v4 branch from 7b3938b to d601833 Compare February 8, 2024 11:06

someplaceguy added 16 commits February 8, 2024 11:10

HolSmt: enable cvc5 for a few test cases

211ab33

These test cases were verified to work properly for the SMT-LIB translation but I had forgotten to enable cvc5 for them.

HolSmt: enable thm_AUTO in test cases where it works

642504a

HolSmt: show types when a self-test fails

acab511

Otherwise, when certain tests fail, it's impossible to distinguish them from others that have the exact same formula but have different types, e.g. nums vs ints.

HolSmt: parse bvsdiv_i terms in Z3 proofs

8fda4ff

Z3 describes it as being the same as `bvsdiv`, "but created in a context where the second operand can be assumed to be non-zero". Currently we process it in the same way as `bvsdiv`.

HolSmt: add a few new test cases exercising min/max

be9dd74

They uncovered a bug during development of support for Z3 v4.12.x proof replay, so it's probably useful to add them to the test suite to avoid future regressions.

HolSmt: add a description of CVC.sml to the README

3a298ee

This file was introduced in the following commit: 18c8b76 However, in that commit I forgot to add a description of the file to the README (which contains a description of the other HolSmt files).

Update docker-ci{,-for-pr}.yml to use Z3 v4.12.4 by default

a14bc3f

HolSmt now supports proof reconstruction for Z3 v4.12.4, so we can now use it by default to run the HolSmt self-tests.

someplaceguy force-pushed the u/z3-v4 branch from d601833 to b539306 Compare February 8, 2024 11:10

someplaceguy marked this pull request as draft February 20, 2024 10:38

someplaceguy force-pushed the u/z3-v4 branch from e72e7e5 to 29fd902 Compare February 21, 2024 17:08

someplaceguy marked this pull request as ready for review February 21, 2024 17:12

someplaceguy added 4 commits February 21, 2024 18:24

HolSmt: add unit tests for circular definitions

00ac518

someplaceguy force-pushed the u/z3-v4 branch from 29fd902 to 00ac518 Compare February 21, 2024 18:24

mn200 merged commit 525694a into HOL-Theorem-Prover:develop Feb 21, 2024
2 checks passed

binghe mentioned this pull request Feb 21, 2024

Upgraded GitHub actions (Node.js 16 actions are deprecated) #1201

Merged

someplaceguy mentioned this pull request Mar 7, 2024

Proof log has mismatched predicate in quant-inst inference rule Z3Prover/z3#7154

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

HolSmt: add support for Z3 v4.12.4 proof reconstruction #1194

HolSmt: add support for Z3 v4.12.4 proof reconstruction #1194

someplaceguy commented Feb 7, 2024 •

edited

Loading

binghe commented Feb 7, 2024 •

edited

Loading

mn200 commented Feb 8, 2024

binghe commented Feb 8, 2024

tjark commented Feb 8, 2024

tjark commented Feb 8, 2024

someplaceguy commented Feb 8, 2024 •

edited

Loading

someplaceguy commented Feb 8, 2024

someplaceguy commented Feb 8, 2024 •

edited

Loading

mn200 commented Feb 8, 2024

someplaceguy commented Feb 15, 2024 •

edited

Loading

someplaceguy commented Feb 20, 2024

someplaceguy commented Feb 21, 2024

mn200 commented Feb 21, 2024

HolSmt: add support for Z3 v4.12.4 proof reconstruction #1194

HolSmt: add support for Z3 v4.12.4 proof reconstruction #1194

Conversation

someplaceguy commented Feb 7, 2024 • edited Loading

binghe commented Feb 7, 2024 • edited Loading

mn200 commented Feb 8, 2024

binghe commented Feb 8, 2024

tjark commented Feb 8, 2024

tjark commented Feb 8, 2024

someplaceguy commented Feb 8, 2024 • edited Loading

someplaceguy commented Feb 8, 2024

someplaceguy commented Feb 8, 2024 • edited Loading

mn200 commented Feb 8, 2024

someplaceguy commented Feb 15, 2024 • edited Loading

someplaceguy commented Feb 20, 2024

someplaceguy commented Feb 21, 2024

mn200 commented Feb 21, 2024

someplaceguy commented Feb 7, 2024 •

edited

Loading

binghe commented Feb 7, 2024 •

edited

Loading

someplaceguy commented Feb 8, 2024 •

edited

Loading

someplaceguy commented Feb 8, 2024 •

edited

Loading

someplaceguy commented Feb 15, 2024 •

edited

Loading