v2-io/libudon

libudon

High-performance UDON parser in Rust.

What is UDON?

UDON (Universal Document & Object Notation) is a unified notation for documents, data, and configuration. See the specification for details.

Status

Phase 3 is complete: the parser has been rewritten using descent, a parser generator that produces callback-based recursive descent parsers.

Features

Feature            Syntax                                                              Status
Elements           |element[id].class1.class2?
Attributes         :key value
Typed values       integers, floats, rationals, complex, booleans, nil, strings
Arrays             [1 2 3]
Text/prose         indented content
Hierarchy          indentation and rightward nesting
Comments           ; line and ;{inline}
Embedded elements  |{em bold text}
Interpolation      !{{expression}}
Block directives   !if condition
Inline directives  !{include partial}
Raw blocks         !:lang:
References         @[name]
Freeform blocks    ```

Performance

Benchmarked on Apple Silicon (M-series):

Benchmark             Time     Throughput
Comprehensive (15KB)  10.5 µs  1.35 GiB/s
Comments              47 ns    726 MiB/s
Minimal file          74 ns    696 MiB/s
Text content          58 ns    672 MiB/s
Dynamic content       86 ns    455 MiB/s
Nested elements       185 ns   336 MiB/s
Empty (overhead)      1.9 ns

Comparison with Previous Parser (main branch)

Benchmark           Old (main)          New (phase-3)         Speedup
comprehensive.udon  28 µs (516 MiB/s)   10.5 µs (1.35 GiB/s)  2.7x
minimal.udon        106 ns (486 MiB/s)  74 ns (696 MiB/s)     1.4x
comments_only       123 ns (278 MiB/s)  47 ns (726 MiB/s)     2.6x
text_only           122 ns (318 MiB/s)  58 ns (672 MiB/s)     2.1x
empty (overhead)    80 ns               1.9 ns                42x
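
The Speedup and Throughput columns follow directly from the measured times. The sketch below reproduces that arithmetic; the helper names `speedup` and `gib_per_sec` are illustrative and not part of libudon, and treating "15KB" as 15 * 1024 bytes is an assumption about the benchmark file size.

```rust
/// Speedup of a new measurement over an old one (both in nanoseconds).
fn speedup(old_ns: f64, new_ns: f64) -> f64 {
    old_ns / new_ns
}

/// Throughput in GiB/s for `bytes` parsed in `ns` nanoseconds.
fn gib_per_sec(bytes: f64, ns: f64) -> f64 {
    let bytes_per_sec = bytes / (ns * 1e-9);
    bytes_per_sec / (1024.0 * 1024.0 * 1024.0)
}

fn main() {
    // comprehensive.udon: 28 µs on main, 10.5 µs on phase-3.
    println!("speedup: {:.1}x", speedup(28_000.0, 10_500.0));
    // empty input: 80 ns on main, 1.9 ns on phase-3.
    println!("speedup: {:.0}x", speedup(80.0, 1.9));
    // Assumption: "15KB" is taken as 15 * 1024 bytes.
    println!("throughput: {:.2} GiB/s", gib_per_sec(15.0 * 1024.0, 10_500.0));
}
```

Under that size assumption the computed throughput lands close to the reported 1.35 GiB/s; the exact figure depends on the real byte count of the benchmark file.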

Cross-Format Comparison

Parsing semantically equivalent documents (~50% structure, ~30% short text, ~20% prose):

Format    Parser          s10 (MB/s)  s10 (El/s)  s50 (MB/s)  s50 (El/s)  s200 (MB/s)  s200 (El/s)  Size
UDON      libudon         897         9.4M        744         7.7M        748          7.7M         100%
XML       quick-xml       935         7.6M        983         7.9M        1,003        8.0M         129%
JSON      serde_json      353         3.4M        372         3.6M        335          3.2M         108%
Markdown  pulldown-cmark  199         2.2M        196         2.1M        207          2.2M         98%
TOML      toml            54          0.5M        56          0.5M        55           0.5M         122%
YAML      serde_yaml      41          0.3M        43          0.4M        43           0.4M         126%
  • s10/s50/s200: 10, 50, 200 item documents (22, 101, 401 elements)
  • MB/s: Raw byte throughput
  • El/s: Semantic elements parsed per second
  • Size: Average document size relative to UDON
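
The MB/s and El/s columns are linked through document size: dividing byte throughput by element throughput gives the average bytes spent per semantic element, which is why a more verbose format can post higher MB/s while parsing fewer elements per second. A small sketch of that relationship, using the s10 figures from the table (the function name `bytes_per_element` is illustrative only):

```rust
/// Average bytes per semantic element, from MB/s and millions of elements/s.
/// The "mega" factors cancel, so the result is plain bytes per element.
fn bytes_per_element(mb_per_sec: f64, million_el_per_sec: f64) -> f64 {
    mb_per_sec / million_el_per_sec
}

fn main() {
    // s10 column values from the comparison table.
    let udon = bytes_per_element(897.0, 9.4);
    let xml = bytes_per_element(935.0, 7.6);

    println!("UDON: {:.0} bytes/element", udon);
    println!("XML:  {:.0} bytes/element", xml);
    // Ratio of bytes per element approximates the Size column.
    println!("XML size relative to UDON: {:.0}%", 100.0 * xml / udon);
}
```

For the s10 row the ratio comes out near the 129% listed in the Size column for XML, so the two throughput columns and the size column are mutually consistent.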

Run with: cargo bench --bench compare

Structure

libudon/
├── udon-core/           # Core parser library
│   └── src/
│       ├── lib.rs       # Public API
│       ├── parser.rs    # GENERATED by descent
│       ├── tree.rs      # Tree/AST representation
│       └── span.rs      # Source locations
├── generator/           # Parser specification
│   ├── udon.desc        # Main parser grammar
│   └── values.desc      # Value type parsing
└── regenerate-parser    # Script to regenerate parser

Building

cargo build --release

Testing

cargo test

Benchmarking

cargo bench --bench parse

Regenerating the Parser

The parser is generated from .desc specifications using descent:

# Install descent (from ~/src/descent/)
cd ~/src/descent && dx gem install

# Regenerate parser
./regenerate-parser

# Regenerate with tracing (for debugging)
./regenerate-parser --trace

Usage

libudon provides two APIs:

Tree API (DOM-like)

Parse into a navigable tree structure:

use udon_core::tree::Document;

let input = b"|article[intro].featured
  :author Joseph
  :tags [rust parsing udon]

  |heading Welcome

  Some introductory text.
";

let doc = Document::parse(input).unwrap();

// Navigate the tree
for node in doc.root().children() {
    if let Some(el) = node.as_element() {
        println!("Element: {}", el.name());
        println!("  id: {:?}", el.id());
        println!("  classes: {:?}", el.classes());

        for (name, value) in el.attrs() {
            println!("  :{} = {:?}", name, value);
        }
    }
}

Output:

Element: article
  id: Some("intro")
  classes: ["featured"]
  :author = Bare("Joseph")
  :tags = Array([Bare("rust"), Bare("parsing"), Bare("udon")])

Tree API features:

  • Full parent/child/sibling navigation
  • ElementView for typed access to element properties
  • Value enum preserves original representation (Integer, Float, Rational, Complex, Bool, Nil, Array)
  • Zero-copy where possible via Cow<str>
  • ~313 MB/s throughput (2.6x overhead vs streaming)
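
To illustrate what "zero-copy where possible via Cow<str>" means in practice, here is a minimal, self-contained sketch. It is not libudon's code: the `unescape` function and its backslash escape rule are assumptions made up for this example. The point is only that a value is borrowed from the input when it can be used as-is and allocated only when it has to be rewritten.

```rust
use std::borrow::Cow;

/// Hypothetical helper (not part of libudon): returns the input unchanged
/// and borrowed when it contains no backslash, otherwise builds an owned
/// string with each backslash removed and the following character kept.
fn unescape(raw: &str) -> Cow<'_, str> {
    if !raw.contains('\\') {
        // Fast path: no allocation, the result points into the input.
        return Cow::Borrowed(raw);
    }
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c == '\\' {
            // Keep the escaped character literally; a trailing backslash is dropped.
            if let Some(next) = chars.next() {
                out.push(next);
            }
        } else {
            out.push(c);
        }
    }
    Cow::Owned(out)
}

fn main() {
    let plain = unescape("Joseph");
    let escaped = unescape(r"a\;b");
    println!("{:?} borrowed: {}", plain, matches!(plain, Cow::Borrowed(_)));
    println!("{:?} borrowed: {}", escaped, matches!(escaped, Cow::Borrowed(_)));
}
```

Callers can treat both cases uniformly as `&str` through deref, which is what makes `Cow<str>` a convenient return type for tree nodes that usually, but not always, alias the source buffer.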

Streaming API (SAX-like)

For maximum performance or large documents:

use udon_core::Parser;

let input = b"|article[intro]\n  :author Joseph\n  Hello, world!\n";

Parser::new(input).parse(|event| {
    println!("{}", event.format_line());
});

Output:

ElementStart @ 1..1
Name "article" @ 1..8
Attr "id" @ 8..15
BareValue "intro" @ 9..14
Attr "author" @ 19..25
BareValue "Joseph" @ 27..33
Text "Hello, world!" @ 36..49
ElementEnd @ 50..50

Streaming API features:

  • Zero allocations during parse
  • ~800 MB/s throughput
  • Callback-based event delivery
  • Ideal for large documents or when you only need specific elements
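
The last point, consuming only the elements you care about, is the main reason to prefer callback-based delivery. The sketch below shows the pattern in isolation. Everything in it is hypothetical: the `Event` enum and the toy `emit_events` producer stand in for a real parser and do not reflect libudon's actual event types or UDON's grammar.

```rust
/// Hypothetical event type for illustration; libudon's real events differ.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Event<'a> {
    ElementStart,
    Name(&'a str),
    Text(&'a str),
    ElementEnd,
}

/// Toy producer, not a UDON parser: a trimmed line starting with '|' is
/// reported as an element whose name is the rest of the line; any other
/// non-empty line is reported as text.
fn emit_events<'a>(input: &'a str, mut on_event: impl FnMut(Event<'a>)) {
    for line in input.lines() {
        let line = line.trim();
        if let Some(name) = line.strip_prefix('|') {
            on_event(Event::ElementStart);
            on_event(Event::Name(name));
            on_event(Event::ElementEnd);
        } else if !line.is_empty() {
            on_event(Event::Text(line));
        }
    }
}

/// Consumer that keeps only element names and ignores every other event.
fn element_names(input: &str) -> Vec<&str> {
    let mut names = Vec::new();
    emit_events(input, |event| {
        if let Event::Name(name) = event {
            names.push(name);
        }
    });
    names
}

fn main() {
    let names = element_names("|article\n  Hello\n  |heading\n");
    println!("{:?}", names);
}
```

Because the consumer decides what to retain, nothing is stored for events it ignores, and the collected names are slices of the input rather than copies.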

Related Repositories

License

MIT
