Split project into multiple sections.
Gohla committed Dec 22, 2023
1 parent 8578db6 commit a252407
Showing 33 changed files with 525 additions and 516 deletions.
File renamed without changes.
77 changes: 77 additions & 0 deletions src/4_example/1_grammar/index.md
@@ -0,0 +1,77 @@
# Compiling Grammars and Parsing

First we will implement compilation of pest grammars, and parsing text with a compiled grammar.
A [pest grammar](https://pest.rs/book/grammars/peg.html) contains named rules that describe how to parse something.
For example, `number = { ASCII_DIGIT+ }` means that a `number` is parsed by parsing 1 or more `ASCII_DIGIT`, with `ASCII_DIGIT` being a builtin rule that parses the ASCII digits 0-9.
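
To make the meaning of `ASCII_DIGIT+` concrete, here is a minimal hand-written sketch of what the `number` rule accepts. It uses only the standard library and is not how pest works internally; the `parse_number` name is made up for illustration.

```rust
/// Hand-written equivalent of the pest rule `number = { ASCII_DIGIT+ }`.
/// Returns the matched digits and the remaining input, or `None` if the
/// input does not start with at least one ASCII digit.
fn parse_number(input: &str) -> Option<(&str, &str)> {
    // Count the leading ASCII digits. Each one is a single byte, so the
    // count is also a valid byte index to split at.
    let len = input.bytes().take_while(|b| b.is_ascii_digit()).count();
    if len == 0 {
        None // `+` requires one or more repetitions.
    } else {
        Some(input.split_at(len))
    }
}
```

For instance, `parse_number("123abc")` splits the input into the digits `"123"` and the rest `"abc"`, while `parse_number("abc")` fails because there is no leading digit.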

Add the following dev-dependencies to `pie/Cargo.toml`:

```diff2html linebyline
{{#include ../../gen/4_example/1_grammar/a_1_Cargo.toml.diff}}
```

- [pest](https://crates.io/crates/pest) is the library for parsing with pest grammars.
- [pest_meta](https://crates.io/crates/pest_meta) validates, optimises, and compiles pest grammars.
- [pest_vm](https://crates.io/crates/pest_vm) provides parsing with a compiled pest grammar, without having to generate Rust code for grammars, enabling interactive use.

Create the `pie/examples/parser_dev/main.rs` file and add an empty main function to it:

```rust,
{{#include a_2_main.rs}}
```

Confirm the example can be run with `cargo run --example parser_dev`.

Let's implement the pest grammar compiler and parser.
Add `parse` as a public module to `pie/examples/parser_dev/main.rs`:

```diff2html linebyline
{{#include ../../gen/4_example/1_grammar/a_3_main_parse_mod.rs.diff}}
```

We will add larger chunks of code from now on, compared to the rest of the tutorial, to keep things going.
Create the `pie/examples/parser_dev/parse.rs` file and add to it:

```rust,
{{#include a_4_grammar.rs}}
```

The `CompiledGrammar` struct contains a parsed pest grammar, consisting of a `Vec` of optimised parsing rules, and a hash set of rule names.
We will use this struct as an output of a task in the future, so we derive `Clone`, `Eq`, and `Debug`.

The `new` function takes the text of a pest grammar and an optional file path for error reporting, and creates a `CompiledGrammar` or an error in the form of a `String`.
We're using `String`s as errors in this example for simplicity.

We compile the grammar with `pest_meta::parse_and_optimize`.
If successful, we gather the rule names into a hash set and return a `CompiledGrammar`.
If not, multiple errors are returned, which are first preprocessed with `with_path` and `renamed_rules`, and then written to a single `String` with `writeln!`, which is returned as the error.
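
The step of writing several errors into one `String` can be sketched as follows. This is a simplified, standard-library-only illustration; the `join_errors` helper is hypothetical, and it leaves out the pest-specific preprocessing.

```rust
use std::fmt::Write;

/// Joins multiple error messages into one `String`, one message per line.
fn join_errors<E: std::fmt::Display>(errors: &[E]) -> String {
    let mut message = String::new();
    for error in errors {
        // Writing to a `String` cannot fail, so the result is ignored.
        let _ = writeln!(message, "{}", error);
    }
    message
}
```

Note that `writeln!` only works on a `String` when the `std::fmt::Write` trait is in scope.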

Now we implement parsing using a `CompiledGrammar`.
Add the `parse` method to `pie/examples/parser_dev/parse.rs`:

```diff2html linebyline
{{#include ../../gen/4_example/1_grammar/a_5_parse.rs.diff}}
```

`parse` takes the text of the program to parse, the rule name to start parsing with, and an optional file path for error reporting.

We first check whether `rule_name` exists by looking for it in `self.rule_names`, and return an error if it does not exist.
We have to do this because `pest_vm` panics when the rule name does not exist, which would kill the entire program.
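
The guard described above can be sketched with just a hash set lookup. The `Grammar` struct and `check_rule` method below are hypothetical stand-ins, not the tutorial's actual code, and the error message text is an assumption.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for the grammar struct; only the rule names matter here.
struct Grammar {
    rule_names: HashSet<String>,
}

impl Grammar {
    /// Returns an error message for an unknown rule, so that a later
    /// parsing step never sees a rule name that would make it panic.
    fn check_rule(&self, rule_name: &str) -> Result<(), String> {
        if self.rule_names.contains(rule_name) {
            Ok(())
        } else {
            Err(format!("rule '{}' does not exist", rule_name))
        }
    }
}
```

Turning the panic into an `Err` keeps the failure local to one parse call, which matters once parsing runs inside a longer-lived program.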

If the rule name is valid, we create a `pest_vm::Vm` and `parse`.
If successful, we get a `pairs` iterator that describes how the program was parsed.
Such pairs are typically used to [create an Abstract Syntax Tree (AST) in Rust code](https://pest.rs/book/examples/json.html#ast-generation).
However, for simplicity we just format the pairs as a `String` and return that.
If not successful, we handle the error in the same way as in `new`, but for a single error instead of multiple.

Unfortunately, we cannot store `pest_vm::Vm` in `CompiledGrammar`, because `Vm` implements neither `Clone` nor `Eq`.
Therefore, we have to create a new `Vm` every time we parse, which has a small performance overhead, but that is fine for this example.

To check whether this code does what we want, we'll write a test for it (yes, you can add tests to examples in Rust!).
Add to `pie/examples/parser_dev/parse.rs`:

```rust,
{{#include a_6_test.rs:2:}}
```

We test grammar compilation failure and success, and parse failure and success.
Run this test with `cargo test --example parser_dev -- --show-output`, which also shows what the returned `String`s look like.
50 changes: 50 additions & 0 deletions src/4_example/2_task/index.md
@@ -0,0 +1,50 @@
# Task Implementation

Now we'll implement tasks for compiling a grammar and parsing.
Add `task` as a public module to `pie/examples/parser_dev/main.rs`:

```diff2html linebyline
{{#include ../../gen/4_example/2_task/b_1_main_task_mod.rs.diff}}
```

Create the `pie/examples/parser_dev/task.rs` file and add to it:

```rust,
{{#include b_2_tasks_outputs.rs}}
```

We create a `Tasks` enum with:

- A `CompileGrammar` variant for compiling a grammar from a file.
- A `Parse` variant that uses the compiled grammar returned from another task to parse a program in a file, starting parsing with a specific rule given by name.

`compile_grammar` and `parse` are convenience functions for creating these variants.
We derive `Clone`, `Eq`, `Hash` and `Debug` as these are required for tasks.

We create an `Outputs` enum for storing the results of these tasks, and derive the required traits.
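
As a rough picture of the shape described above, here is a standard-library-only sketch of such a task enum. The field names and constructor signatures are assumptions for illustration; the actual code is in the included file.

```rust
use std::path::PathBuf;

/// Sketch of a task enum. The field layout is an assumption.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
enum Tasks {
    CompileGrammar {
        grammar_file_path: PathBuf,
    },
    Parse {
        // Boxed, because an enum cannot contain itself directly.
        compile_grammar_task: Box<Tasks>,
        program_file_path: PathBuf,
        rule_name: String,
    },
}

impl Tasks {
    /// Convenience constructor for the `CompileGrammar` variant.
    fn compile_grammar(grammar_file_path: impl Into<PathBuf>) -> Self {
        Tasks::CompileGrammar { grammar_file_path: grammar_file_path.into() }
    }

    /// Convenience constructor for the `Parse` variant.
    fn parse(
        compile_grammar_task: &Tasks,
        program_file_path: impl Into<PathBuf>,
        rule_name: impl Into<String>,
    ) -> Self {
        Tasks::Parse {
            compile_grammar_task: Box::new(compile_grammar_task.clone()),
            program_file_path: program_file_path.into(),
            rule_name: rule_name.into(),
        }
    }
}
```

Because the enum derives `Eq` and `Hash`, two tasks built from the same inputs compare equal and can serve as keys in a hash-based collection.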

Since both tasks will require a file, and we're using `String`s as errors, we will implement a convenience function for this.
Add to `pie/examples/parser_dev/task.rs`:

```rust,
{{#include b_3_require_file.rs:2:}}
```

`require_file_to_string` is like `context.require_file`, but converts all errors to `String`.
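
The error conversion itself can be sketched with plain `std::fs`. Note that this is only an analogy: the `read_file_to_string` function below is hypothetical and does not go through `context.require_file`, so it shows the `String` error handling but not the dependency on the file.

```rust
use std::fs;
use std::path::Path;

/// Reads a file into a `String`, converting any I/O error into a `String` message.
fn read_file_to_string(path: &Path) -> Result<String, String> {
    fs::read_to_string(path)
        .map_err(|e| format!("Reading file '{}' failed: {}", path.display(), e))
}
```

With every error already a `String`, callers can use the `?` operator in functions that return `Result<_, String>` without further conversion.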

Now we implement `Task` for `Tasks`.
Add to `pie/examples/parser_dev/task.rs`:

```rust,
{{#include b_4_task.rs:2:}}
```

The output is `Result<Outputs, String>`: either an `Outputs` if the task succeeds, or a `String` if not.
In `execute` we match our variant and either compile a grammar or parse, which are mostly straightforward.
In the `Parse` variant, we require the compile grammar task.
If that task failed, we do not propagate its error, and instead return `Ok(Outputs::Parsed(None))`.
We do this to prevent duplicate errors: if we propagated the error, the grammar compilation error would be duplicated into every parse task.
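
The idea of not propagating the grammar error can be sketched as follows. The types are simplified, hypothetical stand-ins (a `String` takes the place of the compiled grammar, and "parsing" only accepts digits), so treat this as an illustration of the control flow rather than the actual task code.

```rust
/// Simplified output type: `None` means "not parsed because the grammar failed".
#[derive(Clone, PartialEq, Eq, Debug)]
enum Outputs {
    Parsed(Option<String>),
}

/// Parses only when the grammar compiled. A grammar error is swallowed here,
/// because the compile grammar task already reports it.
fn parse_task(grammar: &Result<String, String>, program: &str) -> Result<Outputs, String> {
    let grammar = match grammar {
        Ok(grammar) => grammar,
        Err(_) => return Ok(Outputs::Parsed(None)),
    };
    // Stand-in for real parsing: accept programs consisting only of ASCII digits.
    if !program.is_empty() && program.bytes().all(|b| b.is_ascii_digit()) {
        Ok(Outputs::Parsed(Some(format!("parsed '{}' with {}", program, grammar))))
    } else {
        Err(format!("cannot parse '{}'", program))
    }
}
```

With this shape, a broken grammar produces exactly one error (from the compile task), while each parse task reports only its own parse errors.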

Confirm the code compiles with `cargo build --example parser_dev`.
We won't test this code as we'll use these tasks in the `main` function next.

98 changes: 98 additions & 0 deletions src/4_example/3_cli/index.md
@@ -0,0 +1,98 @@
# CLI for Incremental Batch Builds

We have tasks for compiling grammars and parsing files, but we need to pass file paths and a rule name into these tasks.
We will pass this data to the program via command-line arguments.
To parse command-line arguments, we will use [clap](https://docs.rs/clap/latest/clap/), which is an awesome library for easily parsing command-line arguments.
Add clap as a dependency to `pie/Cargo.toml`:

```diff2html linebyline
{{#include ../../gen/4_example/3_cli/c_1_Cargo.toml.diff}}
```

We're using the `derive` feature of clap to automatically derive a full-featured argument parser from a struct.
Modify `pie/examples/parser_dev/main.rs`:

```diff2html
{{#include ../../gen/4_example/3_cli/c_2_cli.rs.diff}}
```

The `Args` struct contains exactly the data we need: the path to the grammar file, the name of the rule to start parsing with, and paths to program files to parse.
We derive an argument parser for `Args` with `#[derive(Parser)]`.
Then we parse command-line arguments in `main` with `Args::parse()`.

Test this program with `cargo run --example parser_dev -- --help`, which should result in usage help for the program.
Note that the names, ordering, and doc-comments of the fields are used to generate this help.
You can test out several more commands:

- `cargo run --example parser_dev --`
- `cargo run --example parser_dev -- foo`
- `cargo run --example parser_dev -- foo bar`
- `cargo run --example parser_dev -- foo bar baz qux`

Now let's use these arguments to actually compile the grammar and parse example program files.
Modify `pie/examples/parser_dev/main.rs`:

```diff2html
{{#include ../../gen/4_example/3_cli/c_3_compile_parse.rs.diff}}
```

In `compile_grammar_and_parse`, we create a new `Pie` instance that writes the build log to stderr, and create a new build session.
Then, we require a compile grammar task using the `grammar_file_path` from `Args`, and write any errors to the `errors` `String`.
We then require a parse task for every path in `args.program_file_paths`, using the previously created `compile_grammar_task` and `args.rule_name`.
Successes are printed to stdout and errors are written to `errors`.
Finally, we print `errors` to stdout if there are any.

To test this out, we need a grammar and some test files. Create `grammar.pest`:

```
{{#include c_4_grammar.pest}}
```

```admonish info title="Pest Grammars"
You don't need to fully understand pest grammars to finish this example.
However, I will explain the basics of this grammar here.
Feel free to learn and experiment more if you are interested.
Grammars are [lists of rules](https://pest.rs/book/grammars/syntax.html#syntax-of-pest-grammars), such as `num` and `main`.
This grammar parses numbers with the `num` rule, matching 1 or more `ASCII_DIGIT` with [repetition](https://pest.rs/book/grammars/syntax.html#repetition).
The `main` rule ensures that there is no additional text before and after a `num` rule, using [`SOI` (start of input) and `EOI` (end of input)](https://pest.rs/book/grammars/syntax.html#start-and-end-of-input), and using the [`~` operator to sequence](https://pest.rs/book/grammars/syntax.html#sequence) these rules.
We set the [`WHITESPACE` builtin rule](https://pest.rs/book/grammars/syntax.html#implicit-whitespace) to `{ " " | "\t" | "\n" | "\r" }` so that spaces, tabs, newlines, and carriage return characters are implicitly allowed between sequenced rules.
The `@` operator before `{` indicates that it is an [atomic rule](https://pest.rs/book/grammars/syntax.html#atomic), disallowing implicit whitespace.
We want this on the `num` rule so that we can't add spaces in between digits of a number (try removing it and see!).
The `_` operator before `{` indicates that it is a [silent rule](https://pest.rs/book/grammars/syntax.html#silent) that does not contribute to the parse result.
This is important when processing the parse result into an [Abstract Syntax Tree (AST)](https://pest.rs/book/examples/json.html#ast-generation).
In this example we just print the parse result, so silent rules are not really needed, but I included it for completeness.
```

Create `test_1.txt` with:

```
{{#include c_4_test_1.txt}}
```

And create `test_2.txt` with:

```
{{#include c_4_test_2.txt}}
```

Run the program with `cargo run --example parser_dev -- grammar.pest main test_1.txt test_2.txt`.
This should result in a build log showing that the grammar is successfully compiled, that one file is successfully parsed, and that one file has a parse error.

Unfortunately, there is no incrementality between different runs of the example, because the `Pie` `Store` is not persisted.
The `Store` only exists in-memory while the program is running, and is then thrown away.
Thus, there cannot be any incrementality.
To get incrementality, we need to serialize the `Store` before the program exits, and deserialize it when the program starts.
This is possible and not actually that hard, but I just never got around to explaining it in this tutorial.
See the [Side Note: Serialization](#side-note-serialization) section at the end for info on how this can be implemented.

```admonish tip title="Hiding the Build Log"
If you are using a bash-like shell on a UNIX-like OS, you can hide the build log by redirecting stderr to `/dev/null` with: `cargo run --example parser_dev -- grammar.pest main test_1.txt test_2.txt 2>/dev/null`.
Otherwise, you can hide the build log by replacing `WritingTracker::with_stderr()` with `NoopTracker`.
```

Feel free to experiment a bit with the grammar, example files, etc. before continuing.
However, we will develop an interactive editor next, which will make experimentation easier!
@@ -13,6 +13,6 @@ pest = "2"
pest_meta = "2"
pest_vm = "2"
clap = { version = "4", features = ["derive"] }
-ratatui = "0.24"
+ratatui = "0.25"
tui-textarea = "0.4"
crossterm = "0.27"