-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Drastic performance improvement and two new options (WIP!) #35
Conversation
src/reify.ml4
Outdated
Goptions.optkey = ["Template";"Share";"Strings"]; | ||
Goptions.optread = (fun () -> !share_strings); | ||
Goptions.optwrite = (fun a -> share_strings:=a); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason to not do this? Why export it as an option if it is always strictly better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe expose the option to let-bind strings, since that would change something like [quote (quote a)].
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed it should be a net improvement always. What will quote (quote a) do? Let-binding all the string literals is doable but a bit more complicated I think, I'll leave that as another improvement. I'll first finish this by making all Quote Definitions not go through pretyping.
src/reify.ml4
Outdated
with Not_found -> | ||
let qs = quote_string s in | ||
Hashtbl.add string_hash s qs; qs | ||
else quote_string s |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't share substrings? Would that improve things as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not so much I think, at least naturally we would share suffixes not prefixes due to the inductive nature of strings in Coq.
Thanks for looking into this! This might partially address #11. |
This is great! The former method may not result in any advantage because whenever any non-trivial function on I wonder how much of the let-binding advantage can be gained by other methods, such as a more efficient string representation, e.g. one where 1 or more characters take only one word on a machine. It seems that currently, 1 |
"abcd" ++ "abcd" Without hashconsing, this uses a lot of memory. With hashconsing it uses less. However, when this term is type checked, the shared "abcd" is still type checked twice. Using a Gallina let x := "abcd" in x ++ x and (fun x => x ++ x) "abcd" @mattam82 should correct me if I'm wrong, I believe that the second will be (very slightly) more efficient to type check because Coq can check the body of the function with respect to an opaque value
@mattam82 This hash-consing is local to this plugin only. This prevents multiple plugins from sharing the hashconsing cache. Does Coq do internal hashconsing on terms? I had wanted to do this in PluginUtils, but since I made the term construction module into a functor I lost cross-plugin cache sharing.
If we're worried about type checking time, then we could try to play the |
Alternate idea for making |
Personally, I don't like this sort of side effecting and relying on names because it makes things fragile in the (unlikely) event that someone writes a definition for one of those names, but sets it equal to something else (or worse, has a different type). In the later case (different type) if we are inserting our terms directly into the kernel (unsure if this is what @mattam82 s patch does), then we would be breaking Coq in a pretty serious way. |
I agree, we shouldn't rely on the names. I'm suggesting that, if the name already exists, we check to see if it has the right body (and memoize the check, if it's a bottleneck). If it doesn't, either fail, or generate a new name by appending a digit, or just inline the body in that case. |
Really? Not to say that this is a good thing, but Coq already admits definitions which don't type check when you run them through CoInductive Stream := cons : nat -> Stream -> Stream.
Notation zeros := (cofix zeros : Stream := cons 0 zeros).
Definition force (s : Stream) : Stream := match s with cons x y => cons x y end.
Definition eq_force s : s = force s := match s with cons x y => eq_refl end.
Inductive foo := bar (_ : zeros = force zeros).
Definition eq_zeros := bar (eq_force zeros).
Fail Definition eq_zeros' := Eval cbv in eq_zeros.
(* The command has indeed failed with message:
Illegal application:
The term "bar" of type "zeros = force zeros -> foo"
cannot be applied to the term
"eq_refl" : "cons 0 zeros = cons 0 zeros"
This term has type "cons 0 zeros = cons 0 zeros" which should be coercible to "zeros = force zeros". *) I've yet to see anyone spin this into a proof of |
and rediscovered -- the latter often fails. Copied code from Matthieu's gmalecha#35
I think the proper way out of this inefficiency of strings will come in the form of native arrays and ints @maximedenes is working on. Indeed the let-bindings sharing would be too fragile. |
Share string by hashconsing the string objects (reduces memory consumption by half on the Gcd example of certicoq). We do it for whole idents right now, the current Coq string representation does not allow to share prefixes sadly. Also remove the externalisation + elaboration phases when introducing quoted terms, we can bypass it completely and give the term directly for the kernel to check (without bypassing any checks though!). There is still an inefficiency due to retypechecking strings which does not seem so easy to avoid. We went from 240s to 24s when quoting CertiCoq's Gcd example.
I cleaned-up my patch, adapting it to all the Quote vernaculars, rebased and retested against the test-suite and certicoq. Ready to merge now. |
Share Strings and Timing.
Share Strings hashconses the string objects (should be improved to do it
on separate parts of qualified identifiers, still reduces memory
consumption by half on the Gcd example of certicoq)
Timing prints timing information about the different phases of quoting,
helped me figure out that the externalisation + elaboration phases were
used when introducing a quoted term, we can bypass it completely and
give the term directly for the kernel to check. Still there is probably
some room for improvement, like let-binding the strings somehow
(we went from 240s to 24s on CertiCoq's Gcd example).
This PR is still WIP, all quoting functions have to be adapted, and I'm wondering if we should keep the Timing option or not? Incidentally, it fixed a bug in quoting which was using the wrong tInd.