How does ASTactic deal with novel symbols? #24

brando90 · 2021-01-06T16:17:31Z

Hi Kaiyu,

I am not sure whether this is outside the scope of your original work, but I am curious why unseen symbols do not seem to cause a problem on CoqGym's data set.
In mathematics, it is common (as far as I know) to come up with new symbols (and abstractions). For example, someone might introduce a new symbol that is unrelated to anything seen before but comes with a collection of operations involving other symbols. If an unseen symbol like this comes up in the data set, how would ASTactic even create an embedding for it at test time?

My guess is that either it doesn't happen in your data set, or an intrinsic property of Coq is that everything is built from a fixed set of predefined building blocks (similar to how we are only allowed to use the 128 ASCII characters in Coq/CoqGym), or something else...?

In short, why doesn't ASTactic throw an actual error (not just a "bad" embedding) when faced with unknown symbols? Or why does this problem not show up in the test set?

For reference, related issue:

How is the mapping of individual words (or symbols) to vectors is done #15: a question on how words/symbols are mapped to vectors. In short, nonterminals are mapped to one-hot vectors.

yangky11 (Maintainer) · 2021-01-07T04:33:41Z

That's a great question. We don't have to handle out-of-vocabulary tokens because we only use the token's type when encoding a term. For example, if a term is x + y = 3, we encode it in a way analogous to int variable + int variable = int constant. So we can handle novel cases such as P + Q = 4.

This is of course a very rudimentary solution, and it should be possible to do better by taking tokens themselves into account.
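
To make the idea concrete, here is a minimal sketch of type-only encoding. It is not taken from the ASTactic code base: the names `TYPE_VOCAB`, `token_type`, `one_hot`, and `encode` are hypothetical, the classification rules are toy rules, and the term is treated as a flat, space-separated token list for brevity rather than as an abstract syntax tree. It only illustrates why two terms with different symbols can end up with the same encoding.

```python
from typing import List

# Hypothetical, fixed vocabulary of token *types* (an assumption for this sketch).
TYPE_VOCAB = ["int_variable", "int_constant", "operator"]


def token_type(token: str) -> str:
    """Map a surface token to a coarse type label using toy rules."""
    if token.isdigit():
        return "int_constant"
    if token.isidentifier():
        return "int_variable"
    return "operator"


def one_hot(label: str) -> List[int]:
    """One-hot vector over TYPE_VOCAB; the token's spelling is never used."""
    return [int(label == t) for t in TYPE_VOCAB]


def encode(term: str) -> List[List[int]]:
    """Encode a space-separated term by the type of each token only."""
    return [one_hot(token_type(tok)) for tok in term.split()]


seen = encode("x + y = 3")
novel = encode("P + Q = 4")
print(seen == novel)  # True: both terms reduce to the same sequence of types
```

Because the vocabulary is over types rather than symbols, a never-before-seen name such as P cannot fall outside it. The cost, as noted above, is that the encoder cannot tell x from P at all.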
