Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drastic performance improvement and two new options (WIP!) #35

Merged
merged 1 commit into from
Apr 10, 2017

Conversation

mattam82
Copy link
Collaborator

Share Strings and Timing.

Share Strings hashconses the string objects (should be improved to do it
on separate parts of qualified identifiers, still reduces memory
consumption by half on the Gcd example of certicoq)
Timing prints timing information about the different phases of quoting,
helped me figure out that the externalisation + elaboration phases were
used when introducing a quoted term, we can bypass it completely and
give the term directly for the kernel to check. Still there is probably
some room for improvement, like let-binding the strings somehow
(we went from 240s to 24s on CertiCoq's Gcd example).

This PR is still WIP, all quoting functions have to be adapted, and I'm wondering if we should keep the Timing option or not? Incidentally, it fixed a bug in quoting which was using the wrong tInd.

src/reify.ml4 Outdated
Goptions.optkey = ["Template";"Share";"Strings"];
Goptions.optread = (fun () -> !share_strings);
Goptions.optwrite = (fun a -> share_strings:=a);
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason to not do this? Why export it as an option if it is always strictly better?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe expose the option to let-bind strings, since that would change something like [quote (quote a)].

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed it should be a net improvement always. What will quote (quote a) do? Let-binding all the string literals is doable but a bit more complicated I think, I'll leave that as another improvement. I'll first finish this by making all Quote Definitions not go through pretyping.

src/reify.ml4 Outdated
with Not_found ->
let qs = quote_string s in
Hashtbl.add string_hash s qs; qs
else quote_string s
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't share substrings? Would that improve things as well?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not so much I think, at least naturally we would share suffixes not prefixes due to the inductive nature of strings in Coq.

@gmalecha
Copy link
Owner

Thanks for looking into this! This might partially address #11.

@aa755
Copy link

aa755 commented Dec 14, 2016

This is great!
I am not sure which kind of let-binding you have in mind -- whether the let is a Gallina let, or something reified somehow in TemplateAst.term?.

The former method may not result in any advantage because whenever any non-trivial function on Ast.term executes, all the let bindings will reduce away into bloat. The latter method may make it harder to write functions that process an Ast.term.

I wonder how much of the let-binding advantage can be gained by other methods, such as a more efficient string representation, e.g. one where 1 or more characters take only one word on a machine. It seems that currently, 1 Coq.Strings.Ascii.ascii takes 8 words.
Because the names of constants in Coq are typically not so long, the difference in space complexity may just be in these constant factors.

@gmalecha
Copy link
Owner

I am not sure which kind of let-binding you have in mind -- whether the let is a Gallina let, or something reified somehow in TemplateAst.term?.
The thought was a Gallina let (even possibly a lambda let). The reason that this is beneficial is that it does a little bit more than hash consing. Consider the term:

"abcd" ++ "abcd"

Without hashconsing, this uses a lot of memory. With hashconsing it uses less. However, when this term is type checked, the shared "abcd" is still type checked twice. Using a Gallina let would make this term be type checked once and referenced twice. There are two ways to get the behavior:

let x := "abcd" in x ++ x

and

(fun x => x ++ x) "abcd"

@mattam82 should correct me if I'm wrong, I believe that the second will be (very slightly) more efficient to type check because Coq can check the body of the function with respect to an opaque value x whereas with the former it uses a transparent variable x.

The former method may not result in any advantage because whenever any non-trivial function on Ast.term executes, all the let bindings will reduce away into bloat. The latter method may make it harder to write functions that process an Ast.term.
Agreed. @JasonGross is the only person I know of who has tried to quote a quoted term. This is why it would probably be better to handle this via a flag (but hashconsing should not be flag controlled in the end).

@mattam82 This hash-consing is local to this plugin only. This prevents multiple plugins from sharing the hashconsing cache. Does Coq do internal hashconsing on terms? I had wanted to do this in PluginUtils, but since I made the term construction module into a functor I lost cross-plugin cache sharing.

I wonder how much of the let-binding advantage can be gained by other methods, such as a more efficient string representation, e.g. one where 1 or more characters take only one word on a machine. It seems that currently, 1 Coq.Strings.Ascii.ascii takes 8 words.
Because the names of constants in Coq are typically not so long, the difference in space complexity may just be in these constant factors.
Since ascii is also hashconsed in @mattam82's patch, we will likely only ever have ~50 asciis in memory. That shouldn't be too bad (a packed ascii could be useful but since it wouldn't play nice with anything else, it might be awkward).

If we're worried about type checking time, then we could try to play the let binding trick above (if it is actually an improvement).

@JasonGross
Copy link
Contributor

Alternate idea for making quote (quote x) reasonable: how about an option that adds a bunch of constants to the environment if they don't already exist, like Coq_Init_Logic_eq_identifier := "Coq.Init.Logic.eq", etc, so that each use of an identifier is only a constant.

@gmalecha
Copy link
Owner

Personally, I don't like this sort of side effecting and relying on names because it makes things fragile in the (unlikely) event that someone writes a definition for one of those names, but sets it equal to something else (or worse, has a different type). In the later case (different type) if we are inserting our terms directly into the kernel (unsure if this is what @mattam82 s patch does), then we would be breaking Coq in a pretty serious way.

@JasonGross
Copy link
Contributor

I agree, we shouldn't rely on the names. I'm suggesting that, if the name already exists, we check to see if it has the right body (and memoize the check, if it's a bottleneck). If it doesn't, either fail, or generate a new name by appending a digit, or just inline the body in that case.

@JasonGross
Copy link
Contributor

then we would be breaking Coq in a pretty serious way.

Really? Not to say that this is a good thing, but Coq already admits definitions which don't type check when you run them through cbv (h/t Andreas Abel's talk):

CoInductive Stream := cons : nat -> Stream -> Stream.
Notation zeros := (cofix zeros  : Stream := cons 0 zeros).
Definition force (s : Stream) : Stream := match s with cons x y => cons x y end.
Definition eq_force s : s = force s := match s with cons x y => eq_refl end.
Inductive foo := bar (_ : zeros = force zeros).
Definition eq_zeros := bar (eq_force zeros).
Fail Definition eq_zeros' := Eval cbv in eq_zeros.
(* The command has indeed failed with message:
Illegal application:
The term "bar" of type "zeros = force zeros -> foo"
cannot be applied to the term
 "eq_refl" : "cons 0 zeros = cons 0 zeros"
This term has type "cons 0 zeros = cons 0 zeros" which should be coercible to "zeros = force zeros". *)

I've yet to see anyone spin this into a proof of False (although maybe now that I've said that @maximedenes will come along and prove me wrong).

aa755 added a commit to aa755/template-coq that referenced this pull request Jan 22, 2017
and rediscovered -- the latter often fails.
Copied code from Matthieu's
gmalecha#35
@mattam82
Copy link
Collaborator Author

I think the proper way out of this inefficiency of strings will come in the form of native arrays and ints @maximedenes is working on. Indeed the let-bindings sharing would be too fragile.

Share string by hashconsing the string objects (reduces memory
consumption by half on the Gcd example of certicoq). We do it for whole
idents right now, the current Coq string representation does not allow
to share prefixes sadly.

Also remove the externalisation + elaboration phases when introducing
quoted terms, we can bypass it completely and give the term directly for
the kernel to check (without bypassing any checks though!).
There is still an inefficiency due to retypechecking strings which does
not seem so easy to avoid.
We went from 240s to 24s when quoting CertiCoq's Gcd example.
@mattam82
Copy link
Collaborator Author

I cleaned-up my patch, adapting it to all the Quote vernaculars, rebased and retested against the test-suite and certicoq. Ready to merge now.

@mattam82 mattam82 merged commit c70cfe8 into gmalecha:coq-8.5 Apr 10, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants