-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
How to print multiple ASTs for GLR parser correctly? #88
Comments
It is true that in the case of GLR parsing Hime maintains a collection of concurrent parse trees (Hime use Shared Packed Parse Forest, SPPF) while there are ambiguities, but only one version is returned in the end, arbitrarily the first found. From a user perspective, the result returned by Hime only represent this single AST at this time. If returning the full SPPF is desirable, we could hammer something out, the data is there, it's a question of exposing it. |
Yes, it is desirable, consider this example: int main() {
int T, a, *z, y;
T * a;
int x = y / *z;
} As you could see I have an idea that we could iterate the "AST"s, maintain another stack to store the type information, and delete unfavorable AST and finally come up with a valid representation. int main() {
int T, a, *z, y; // KNOWN: T, a, y is int, z is int*
T * a; // Two ASTs here: (multiply T a) and (ptr-decl T a), but we know the latter is impossible due to the symbol T being int.
int x = y / *z;
} So until Hime can have runtime feedback to the parsing state, we need the forest so that we can further prune out invalid trees after parsing using GLR, and do so during semantic verification phase. Do note this phase can also be done in parallel. This also seems to be done by Elkhound as well |
I tried the new SPPF production rules, it seems like it does not work well with repeat statement?
Gives only one version (declaration derivation), but this:
Correctly gives two versions. Here's my optimized version of the C grammar if you want:
|
Yes, there still are some issues to iron out. Thing is, the previous code was geared toward the construction of a single AST so the way tree manipulations (promotion primarily with In your example, with |
By the way, here is the code which is able to print the different versions of the SPPF: use hime_redist::{ast::AstNode, sppf::SppfNode};
fn print_crossings(crossings: &[bool]) {
let mut i = 0;
if !crossings.is_empty() {
while i < crossings.len() - 1 {
print!("{:}", if crossings[i] { "| " } else { " " });
i += 1;
}
print!("+-> ");
}
}
fn print_sppf(node: SppfNode, crossings: &[bool]) {
if node.versions_count() == 1 {
print_single_version(node, crossings);
} else {
print_multi_versions(node, crossings);
}
}
fn print_multi_versions(node: SppfNode, crossings: &[bool]) {
print_crossings(crossings);
println!("<versions>");
for (index, version) in node.versions().into_iter().enumerate() {
let mut crossings = crossings.to_owned();
crossings.push(index < node.versions_count() - 1);
print_crossings(&crossings);
println!("{version}");
let mut i = 0;
let children = version.children();
while i < children.len() {
let mut child_crossings = crossings.to_owned();
child_crossings.push(i < children.len() - 1);
print_sppf(children.at(i), &child_crossings);
i += 1;
}
}
}
fn print_single_version(node: SppfNode, crossings: &[bool]) {
print_crossings(crossings);
let version = node.first_version();
println!("{version}");
let mut i = 0;
let children = version.children();
while i < children.len() {
let mut child_crossings = crossings.to_owned();
child_crossings.push(i < children.len() - 1);
print_sppf(children.at(i), &child_crossings);
i += 1;
}
} |
I just used a simple modification by tracking the version number fn print(node: SppfNode, crossings: &[bool]) {
let mut version = 1;
for node in node.versions() {
let mut i = 0;
if !crossings.is_empty() {
while i < crossings.len() - 1 {
print!("{:}", if crossings[i] { "| " } else { " " });
i += 1;
}
print!("+-{version}-> ");
}
println!("{node}");
i = 0;
let children = node.children();
while i < children.len() {
let mut child_crossings = crossings.to_owned();
child_crossings.push(i < children.len() - 1);
print(children.at(i), &child_crossings);
i += 1;
}
version += 1;
}
} |
@woutersl by the way, what need to be done to implement the "forest walker" (in contrast to the AST walker) as well? Are there any paper describing the technique to walk this kind of forest? Not sure if I could help but I have no clues if I'm not having any materials at the deck in the moment (I mean I'm not a student anymore already for a year haha). But of course, the easiest treat is to keep using a depth-first approach to generate a bunch of spanning trees and try each tree and see if anyone of them terminates. This really makes me reminisce the parallel the likes between NFA and DFA over this approach. In fact, I'm not sure if the GLR parser is also inspired by this. You just make the LR parser "forkable" at the current state, just like NFA is also "forkable" to try different transitions that terminates/matches. I have another obscure idea I can't express clearly (I'm not sure if that's novel or not) that if we want to implement semantic checker to multiple "trees"/"branches" of the SPPF at the same time, at the very least, the data structure state holding the information about the current AST walker must be immutable -- which means data is cloned between each forest transition. Then we will need to use persistent data structure/Copy-On-Write mechanism/transactional memory to manage the state data. While I'm pretty sure functional programming people may have enjoyed writing parsers this way because their way to handle data structures are immutable by design, so this is trivial for all kinds of program, but this idea came back to the time when I was still using rust-peg (kevinmehall/rust-peg#301) Oh I also spotted another problem, consider this C++ code: int f() {
return g(123);
}
int g(int x) {
return x + 42;
} This may look like a C code but it is only valid in C++, in C if you do not have forward declarations, you will assume a "bare function prototype" -- which means the function g is treated as So, to solve this problem, should we also admit combinations of the forests? Although I'm pretty sure this naive approach would lead to combinatorial explosion. |
@stevefan1999-personal I just pushed code that should fix the remaining issues with SPPF construction. I checked your example with C code and it indeed produces two versions now.
The reasoning is that doing so, the nodes with multiple versions will be |
I'm trying to implement a C parser using hime based on the grammar here (https://www.quut.com/c/ANSI-C-grammar-y-2011.html) and here (https://www.quut.com/c/ANSI-C-grammar-l-2011.html), and for this particular input:
There should be multiple parsing trees (a primary expression T multiply pointer dereference of "
hello
" or a double pointer declaration of type T named "hello
"), but I realized that the default function to print AST:Just prints the latter which is type definition. This is correct but only part of the truth. Maybe this function is not sufficient?
C grammar:
The text was updated successfully, but these errors were encountered: