-
Hi Kaiyu, I am unsure if this is out of scope of your original work but I was curious why this issue does not happen in CoqGym's data set. My guess is that it doesn't happen in your data set, or perhaps an intrinsic property of Coq is that everything is created from selected predefined building blocks or in a similarly we are only allowed to use 128 ascii characters in Coq/CoqGym, or something else...? In short, why doesn't ASTactic throw an actual error (not just a "bad" embedding) when faced with unknown symbols? Or why does this problem not show up in the test set? For reference, related issue:
|
Beta Was this translation helpful? Give feedback.
Replies: 1 comment
-
That's a great question. We don't have to handle out-of-vocabulary tokens because we only use the token's type when encoding a term. For example, if a term is This is of course a very rudimentary solution, and it should be possible to do better by taking tokens themselves into account. |
Beta Was this translation helpful? Give feedback.
That's a great question. We don't have to handle out-of-vocabulary tokens because we only use the token's type when encoding a term. For example, if a term is
x + y = 3
, we encode it in a way analogous toint variable + int variable = int constant
. So we can handle novel cases such asP + Q = 4
.This is of course a very rudimentary solution, and it should be possible to do better by taking tokens themselves into account.