High-performance UDON parser in Rust.
UDON (Universal Document & Object Notation) is a unified notation for documents, data, and configuration. See the specification for details.
Phase 3 complete - Parser rewritten using descent, a parser generator that produces callback-based recursive descent parsers.
| Feature | Syntax | Status |
|---|---|---|
| Elements | `\|element[id].class1.class2?` | ✅ |
| Attributes | `:key value` | ✅ |
| Typed values | integers, floats, rationals, complex, booleans, nil, strings | ✅ |
| Arrays | `[1 2 3]` | ✅ |
| Text/prose | indented content | ✅ |
| Hierarchy | indentation and rightward nesting | ✅ |
| Comments | `; line` and `;{inline}` | ✅ |
| Embedded elements | `\|{em bold text}` | ✅ |
| Interpolation | `!{{expression}}` | ✅ |
| Block directives | `!if condition` | ✅ |
| Inline directives | `!{include partial}` | ✅ |
| Raw blocks | `!:lang:` | ✅ |
| References | `@[name]` | ✅ |
| Freeform blocks | `` ``` `` | ✅ |
Benchmarked on Apple Silicon (M-series):
| Benchmark | Time | Throughput |
|---|---|---|
| Comprehensive (15KB) | 10.5 µs | 1.35 GiB/s |
| Comments | 47 ns | 726 MiB/s |
| Minimal file | 74 ns | 696 MiB/s |
| Text content | 58 ns | 672 MiB/s |
| Dynamic content | 86 ns | 455 MiB/s |
| Nested elements | 185 ns | 336 MiB/s |
| Empty (overhead) | 1.9 ns | — |

Compared with the previous parser on the main branch:

| Benchmark | Old (main) | New (phase-3) | Speedup |
|---|---|---|---|
| comprehensive.udon | 28 µs (516 MiB/s) | 10.5 µs (1.35 GiB/s) | 2.7x |
| minimal.udon | 106 ns (486 MiB/s) | 74 ns (696 MiB/s) | 1.4x |
| comments_only | 123 ns (278 MiB/s) | 47 ns (726 MiB/s) | 2.6x |
| text_only | 122 ns (318 MiB/s) | 58 ns (672 MiB/s) | 2.1x |
| empty (overhead) | 80 ns | 1.9 ns | 42x |
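
The Speedup column is the old time divided by the new time. As a quick sanity check, the sketch below recomputes it from the timings in the table; the helper name `speedup` is illustrative and not part of libudon.

```rust
/// Ratio of old to new wall-clock time. Both arguments must use the same unit.
fn speedup(old_ns: f64, new_ns: f64) -> f64 {
    old_ns / new_ns
}

fn main() {
    // (benchmark, old time in ns, new time in ns), copied from the table above.
    let rows = [
        ("comprehensive.udon", 28_000.0, 10_500.0),
        ("minimal.udon", 106.0, 74.0),
        ("comments_only", 123.0, 47.0),
        ("text_only", 122.0, 58.0),
        ("empty (overhead)", 80.0, 1.9),
    ];
    for (name, old_ns, new_ns) in rows {
        // One decimal place matches the precision used in the table.
        println!("{}: {:.1}x", name, speedup(old_ns, new_ns));
    }
}
```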
Parsing semantically equivalent documents (~50% structure, ~30% short text, ~20% prose):
| Format | Parser | s10 (MB/s) | s10 (El/s) | s50 (MB/s) | s50 (El/s) | s200 (MB/s) | s200 (El/s) | Size |
|---|---|---|---|---|---|---|---|---|
| UDON | libudon | 897 | 9.4M | 744 | 7.7M | 748 | 7.7M | 100% |
| XML | quick-xml | 935 | 7.6M | 983 | 7.9M | 1,003 | 8.0M | 129% |
| JSON | serde_json | 353 | 3.4M | 372 | 3.6M | 335 | 3.2M | 108% |
| Markdown | pulldown-cmark | 199 | 2.2M | 196 | 2.1M | 207 | 2.2M | 98% |
| TOML | toml | 54 | 0.5M | 56 | 0.5M | 55 | 0.5M | 122% |
| YAML | serde_yaml | 41 | 0.3M | 43 | 0.4M | 43 | 0.4M | 126% |
- s10/s50/s200: documents with 10, 50, and 200 items (22, 101, and 401 elements)
- MB/s: Raw byte throughput
- El/s: Semantic elements parsed per second
- Size: Average document size relative to UDON
Run with: cargo bench --bench compare
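
The two metrics answer different questions, which matters because the formats differ in size: a more verbose format can post a higher MB/s while parsing fewer elements per second (compare the XML and UDON rows at s10). The sketch below shows how both figures can be derived from one timing. The function names, the 1 MB = 1,000,000 bytes convention, and the sample measurement are assumptions for illustration, not taken from the benchmark harness.

```rust
use std::time::Duration;

/// Raw byte throughput in MB/s, assuming 1 MB = 1,000,000 bytes.
fn mb_per_sec(bytes: usize, elapsed: Duration) -> f64 {
    bytes as f64 / 1e6 / elapsed.as_secs_f64()
}

/// Semantic elements parsed per second.
fn elements_per_sec(elements: usize, elapsed: Duration) -> f64 {
    elements as f64 / elapsed.as_secs_f64()
}

fn main() {
    // Hypothetical measurement: a 2,000-byte document with 22 elements,
    // parsed in 2 microseconds.
    let elapsed = Duration::from_micros(2);
    println!("{:.0} MB/s", mb_per_sec(2_000, elapsed));
    println!("{:.1}M El/s", elements_per_sec(22, elapsed) / 1e6);
}
```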
libudon/
├── udon-core/             # Core parser library
│   └── src/
│       ├── lib.rs         # Public API
│       ├── parser.rs      # GENERATED by descent
│       ├── tree.rs        # Tree/AST representation
│       └── span.rs        # Source locations
├── generator/             # Parser specification
│   ├── udon.desc          # Main parser grammar
│   └── values.desc        # Value type parsing
└── regenerate-parser      # Script to regenerate parser
cargo build --release
cargo test
cargo bench --bench parse

The parser is generated from .desc specifications using descent:
# Install descent (from ~/src/descent/)
cd ~/src/descent && dx gem install
# Regenerate parser
./regenerate-parser
# Regenerate with tracing (for debugging)
./regenerate-parser --trace

libudon provides two APIs:
Parse into a navigable tree structure:
use udon_core::tree::Document;
let input = b"|article[intro].featured
  :author Joseph
  :tags [rust parsing udon]
  |heading Welcome
  Some introductory text.
";
let doc = Document::parse(input).unwrap();
// Navigate the tree
for node in doc.root().children() {
    if let Some(el) = node.as_element() {
        println!("Element: {}", el.name());
        println!(" id: {:?}", el.id());
        println!(" classes: {:?}", el.classes());
        for (name, value) in el.attrs() {
            println!(" :{} = {:?}", name, value);
        }
    }
}

Output:
Element: article
 id: Some("intro")
 classes: ["featured"]
 :author = Bare("Joseph")
 :tags = Array([Bare("rust"), Bare("parsing"), Bare("udon")])
Tree API features:
- Full parent/child/sibling navigation
- `ElementView` for typed access to element properties
- `Value` enum preserves original representation (Integer, Float, Rational, Complex, Bool, Nil, Array)
- Zero-copy where possible via `Cow<str>`
- ~313 MB/s throughput (2.6x overhead vs streaming)
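
To illustrate what "zero-copy where possible" means with `Cow<str>`, here is a minimal sketch that borrows from the input when the text can be used as-is and allocates only when it has to be rewritten. The `unescape` helper and its backslash escape rule are assumptions made for the example; they are not libudon's actual API or UDON's escape syntax.

```rust
use std::borrow::Cow;

/// Resolve backslash escapes, copying only when an escape is present.
/// Hypothetical rule: the character after a backslash is kept verbatim.
fn unescape(raw: &str) -> Cow<'_, str> {
    if !raw.contains('\\') {
        // Common case: nothing to rewrite, so borrow the input slice.
        return Cow::Borrowed(raw);
    }
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c == '\\' {
            // Drop the backslash and keep the escaped character, if any.
            if let Some(next) = chars.next() {
                out.push(next);
            }
        } else {
            out.push(c);
        }
    }
    Cow::Owned(out)
}

fn main() {
    for raw in ["Joseph", r"semi\;colon"] {
        let value = unescape(raw);
        let kind = match value {
            Cow::Borrowed(_) => "borrowed",
            Cow::Owned(_) => "owned",
        };
        println!("{:?} -> {:?} ({})", raw, value, kind);
    }
}
```

Callers can treat both cases uniformly through `Deref<Target = str>`, so the allocation cost is paid only by values that need it.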
For maximum performance or large documents:
use udon_core::Parser;
let input = b"|article[intro]\n :author Joseph\n Hello, world!\n";
Parser::new(input).parse(|event| {
    println!("{}", event.format_line());
});

Output:
ElementStart @ 1..1
Name "article" @ 1..8
Attr "id" @ 8..15
BareValue "intro" @ 9..14
Attr "author" @ 19..25
BareValue "Joseph" @ 27..33
Text "Hello, world!" @ 36..49
ElementEnd @ 50..50
Streaming API features:
- Zero allocations during parse
- ~800 MB/s throughput
- Callback-based event delivery
- Ideal for large documents or when you only need specific elements
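
The callback style is what makes allocation-free parsing possible: each event is a borrowed slice of the input, handed to the caller and then forgotten. The toy scanner below sketches that pattern on its own, without depending on libudon. The `Event` enum and `scan` function are hypothetical, handle only element lines and text lines, and are far simpler than the generated parser.

```rust
/// Hypothetical event type for a toy line scanner; not libudon's event enum.
#[derive(Debug, PartialEq)]
enum Event<'a> {
    Element { name: &'a [u8], start: usize },
    Text { content: &'a [u8], start: usize },
}

/// Scan `input` line by line and pass borrowed events to `on_event`.
/// Every event is a slice into `input`, so the scan itself never allocates.
fn scan<'a>(input: &'a [u8], mut on_event: impl FnMut(Event<'a>)) {
    let mut offset = 0;
    for line in input.split(|&b| b == b'\n') {
        let indent = line.iter().take_while(|&&b| b == b' ').count();
        let body = &line[indent..];
        let start = offset + indent;
        if body.first() == Some(&b'|') {
            // The element name runs up to the first space or '['.
            let rest = &body[1..];
            let end = rest
                .iter()
                .position(|&b| b == b' ' || b == b'[')
                .unwrap_or(rest.len());
            on_event(Event::Element { name: &rest[..end], start: start + 1 });
        } else if !body.is_empty() {
            on_event(Event::Text { content: body, start });
        }
        offset += line.len() + 1; // +1 accounts for the newline separator
    }
}

fn main() {
    let input = b"|article[intro]\n  Hello, world!\n  |heading Welcome\n";
    // Count elements without building a tree.
    let mut elements = 0;
    scan(input, |event| {
        if let Event::Element { .. } = event {
            elements += 1;
        }
    });
    println!("elements: {}", elements);
}
```

A consumer that only cares about specific elements can ignore every other event in the closure, which is the use case the last bullet above describes.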
MIT