Drastic performance improvement and two new options (WIP!) #35

mattam82 · 2016-12-13T14:28:19Z

Share Strings and Timing.

Share Strings hashconses the string objects (should be improved to do it
on separate parts of qualified identifiers, still reduces memory
consumption by half on the Gcd example of certicoq)
Timing prints timing information about the different phases of quoting,
helped me figure out that the externalisation + elaboration phases were
used when introducing a quoted term, we can bypass it completely and
give the term directly for the kernel to check. Still there is probably
some room for improvement, like let-binding the strings somehow
(we went from 240s to 24s on CertiCoq's Gcd example).

This PR is still WIP, all quoting functions have to be adapted, and I'm wondering if we should keep the Timing option or not? Incidentally, it fixed a bug in quoting which was using the wrong tInd.

gmalecha · 2016-12-13T14:31:36Z

src/reify.ml4

+  Goptions.optkey = ["Template";"Share";"Strings"];
+  Goptions.optread = (fun () -> !share_strings);
+  Goptions.optwrite = (fun a -> share_strings:=a);
+}


Is there any reason to not do this? Why export it as an option if it is always strictly better?

Maybe expose the option to let-bind strings, since that would change something like [quote (quote a)].

Indeed it should be a net improvement always. What will quote (quote a) do? Let-binding all the string literals is doable but a bit more complicated I think, I'll leave that as another improvement. I'll first finish this by making all Quote Definitions not go through pretyping.

gmalecha · 2016-12-13T14:36:44Z

src/reify.ml4

+      with Not_found ->
+        let qs = quote_string s in
+        Hashtbl.add string_hash s qs; qs
+    else quote_string s


This doesn't share substrings? Would that improve things as well?

Not so much I think, at least naturally we would share suffixes not prefixes due to the inductive nature of strings in Coq.

gmalecha · 2016-12-13T14:55:45Z

Thanks for looking into this! This might partially address #11.

aa755 · 2016-12-14T01:14:18Z

This is great!
I am not sure which kind of let-binding you have in mind -- whether the let is a Gallina let, or something reified somehow in TemplateAst.term?.

The former method may not result in any advantage because whenever any non-trivial function on Ast.term executes, all the let bindings will reduce away into bloat. The latter method may make it harder to write functions that process an Ast.term.

I wonder how much of the let-binding advantage can be gained by other methods, such as a more efficient string representation, e.g. one where 1 or more characters take only one word on a machine. It seems that currently, 1 Coq.Strings.Ascii.ascii takes 8 words.
Because the names of constants in Coq are typically not so long, the difference in space complexity may just be in these constant factors.

gmalecha · 2016-12-14T03:11:49Z

I am not sure which kind of let-binding you have in mind -- whether the let is a Gallina let, or something reified somehow in TemplateAst.term?.
The thought was a Gallina let (even possibly a lambda let). The reason that this is beneficial is that it does a little bit more than hash consing. Consider the term:

"abcd" ++ "abcd"

Without hashconsing, this uses a lot of memory. With hashconsing it uses less. However, when this term is type checked, the shared "abcd" is still type checked twice. Using a Gallina let would make this term be type checked once and referenced twice. There are two ways to get the behavior:

let x := "abcd" in x ++ x

and

(fun x => x ++ x) "abcd"

@mattam82 should correct me if I'm wrong, I believe that the second will be (very slightly) more efficient to type check because Coq can check the body of the function with respect to an opaque value x whereas with the former it uses a transparent variable x.

The former method may not result in any advantage because whenever any non-trivial function on Ast.term executes, all the let bindings will reduce away into bloat. The latter method may make it harder to write functions that process an Ast.term.
Agreed. @JasonGross is the only person I know of who has tried to quote a quoted term. This is why it would probably be better to handle this via a flag (but hashconsing should not be flag controlled in the end).

@mattam82 This hash-consing is local to this plugin only. This prevents multiple plugins from sharing the hashconsing cache. Does Coq do internal hashconsing on terms? I had wanted to do this in PluginUtils, but since I made the term construction module into a functor I lost cross-plugin cache sharing.

I wonder how much of the let-binding advantage can be gained by other methods, such as a more efficient string representation, e.g. one where 1 or more characters take only one word on a machine. It seems that currently, 1 Coq.Strings.Ascii.ascii takes 8 words.
Because the names of constants in Coq are typically not so long, the difference in space complexity may just be in these constant factors.
Since ascii is also hashconsed in @mattam82's patch, we will likely only ever have ~50 asciis in memory. That shouldn't be too bad (a packed ascii could be useful but since it wouldn't play nice with anything else, it might be awkward).

If we're worried about type checking time, then we could try to play the let binding trick above (if it is actually an improvement).

JasonGross · 2016-12-14T03:20:19Z

Alternate idea for making quote (quote x) reasonable: how about an option that adds a bunch of constants to the environment if they don't already exist, like Coq_Init_Logic_eq_identifier := "Coq.Init.Logic.eq", etc, so that each use of an identifier is only a constant.

gmalecha · 2016-12-14T03:32:33Z

Personally, I don't like this sort of side effecting and relying on names because it makes things fragile in the (unlikely) event that someone writes a definition for one of those names, but sets it equal to something else (or worse, has a different type). In the later case (different type) if we are inserting our terms directly into the kernel (unsure if this is what @mattam82 s patch does), then we would be breaking Coq in a pretty serious way.

JasonGross · 2016-12-14T03:39:04Z

I agree, we shouldn't rely on the names. I'm suggesting that, if the name already exists, we check to see if it has the right body (and memoize the check, if it's a bottleneck). If it doesn't, either fail, or generate a new name by appending a digit, or just inline the body in that case.

JasonGross · 2016-12-14T03:59:42Z

then we would be breaking Coq in a pretty serious way.

Really? Not to say that this is a good thing, but Coq already admits definitions which don't type check when you run them through cbv (h/t Andreas Abel's talk):

CoInductive Stream := cons : nat -> Stream -> Stream.
Notation zeros := (cofix zeros  : Stream := cons 0 zeros).
Definition force (s : Stream) : Stream := match s with cons x y => cons x y end.
Definition eq_force s : s = force s := match s with cons x y => eq_refl end.
Inductive foo := bar (_ : zeros = force zeros).
Definition eq_zeros := bar (eq_force zeros).
Fail Definition eq_zeros' := Eval cbv in eq_zeros.
(* The command has indeed failed with message:
Illegal application:
The term "bar" of type "zeros = force zeros -> foo"
cannot be applied to the term
 "eq_refl" : "cons 0 zeros = cons 0 zeros"
This term has type "cons 0 zeros = cons 0 zeros" which should be coercible to "zeros = force zeros". *)

I've yet to see anyone spin this into a proof of False (although maybe now that I've said that @maximedenes will come along and prove me wrong).

and rediscovered -- the latter often fails. Copied code from Matthieu's gmalecha#35

mattam82 · 2017-02-13T23:34:40Z

I think the proper way out of this inefficiency of strings will come in the form of native arrays and ints @maximedenes is working on. Indeed the let-bindings sharing would be too fragile.

Share string by hashconsing the string objects (reduces memory consumption by half on the Gcd example of certicoq). We do it for whole idents right now, the current Coq string representation does not allow to share prefixes sadly. Also remove the externalisation + elaboration phases when introducing quoted terms, we can bypass it completely and give the term directly for the kernel to check (without bypassing any checks though!). There is still an inefficiency due to retypechecking strings which does not seem so easy to avoid. We went from 240s to 24s when quoting CertiCoq's Gcd example.

mattam82 · 2017-02-13T23:53:52Z

I cleaned-up my patch, adapting it to all the Quote vernaculars, rebased and retested against the test-suite and certicoq. Ready to merge now.

gmalecha reviewed Dec 13, 2016

View reviewed changes

aa755 added a commit to aa755/template-coq that referenced this pull request Jan 22, 2017

no "externalization" after unquoting. implicits are now not erased

13f25ce

and rediscovered -- the latter often fails. Copied code from Matthieu's gmalecha#35

mattam82 force-pushed the coq-8.5 branch from eaac4eb to 92a2ff7 Compare February 13, 2017 23:46

mattam82 force-pushed the coq-8.5 branch from 92a2ff7 to ddcf94e Compare February 13, 2017 23:48

mattam82 merged commit c70cfe8 into gmalecha:coq-8.5 Apr 10, 2017

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Drastic performance improvement and two new options (WIP!) #35

Drastic performance improvement and two new options (WIP!) #35

mattam82 commented Dec 13, 2016

gmalecha Dec 13, 2016

gmalecha Dec 13, 2016

mattam82 Feb 13, 2017

gmalecha Dec 13, 2016

mattam82 Feb 13, 2017

gmalecha commented Dec 13, 2016

aa755 commented Dec 14, 2016

gmalecha commented Dec 14, 2016

JasonGross commented Dec 14, 2016

gmalecha commented Dec 14, 2016

JasonGross commented Dec 14, 2016

JasonGross commented Dec 14, 2016

mattam82 commented Feb 13, 2017

mattam82 commented Feb 13, 2017

Drastic performance improvement and two new options (WIP!) #35

Drastic performance improvement and two new options (WIP!) #35

Conversation

mattam82 commented Dec 13, 2016

gmalecha Dec 13, 2016

Choose a reason for hiding this comment

gmalecha Dec 13, 2016

Choose a reason for hiding this comment

mattam82 Feb 13, 2017

Choose a reason for hiding this comment

gmalecha Dec 13, 2016

Choose a reason for hiding this comment

mattam82 Feb 13, 2017

Choose a reason for hiding this comment

gmalecha commented Dec 13, 2016

aa755 commented Dec 14, 2016

gmalecha commented Dec 14, 2016

JasonGross commented Dec 14, 2016

gmalecha commented Dec 14, 2016

JasonGross commented Dec 14, 2016

JasonGross commented Dec 14, 2016

mattam82 commented Feb 13, 2017

mattam82 commented Feb 13, 2017