A modern, type-safe lexer and parser library for Scala 3, featuring compile-time validation and elegant DSL syntax.
- 🔍 Type-safe lexer and parser - Catch errors at compile time with Scala 3's powerful type system
- 🎯 Elegant DSL - Define lexers and parsers using intuitive pattern matching syntax
- ⚡ Compile-time validation - Regex patterns and grammar rules are validated during compilation
- 🧪 Macro-based - Leverages Scala 3 macros for efficient code generation
- 📚 Context-aware - Support for lexical and parsing contexts with type-safe state management
- 🛠️ LR Parsing - Uses an LR parsing algorithm with automatic parse-table generation
Add Alpaca as a dependency in your build.mill:
//| mill-version: 1.0.6
//| mill-jvm-version: 21
import mill._
import mill.scalalib._
object myproject extends ScalaModule {
  def scalaVersion = "3.7.4"
  def mvnDeps = Seq(
    mvn"io.github.halotukozak::alpaca:0.0.2"
  )
}
Add Alpaca to your build.sbt:
libraryDependencies += "io.github.halotukozak" %% "alpaca" % "0.0.2"

Make sure you're using Scala 3.7.4 or later:

scalaVersion := "3.7.4"

Use Alpaca directly in your Scala CLI scripts:
//> using scala "3.7.4"
//> using dep "io.github.halotukozak::alpaca:0.0.2"
import alpaca.*
// Your code here

Define a lexer using pattern matching with regex patterns:
import alpaca.*
val MyLexer = lexer:
  case num @ "[0-9]+" => Token["NUM"](num.toDouble)
  case "\\+" => Token["PLUS"]
  case "-" => Token["MINUS"]
  case "\\*" => Token["STAR"]
  case "/" => Token["SLASH"]
  case "\\(" => Token["LP"]
  case "\\)" => Token["RP"]
  case "\\s+" => Token.Ignored

Define a parser by extending the Parser class and defining grammar rules:
import alpaca.*
object MyParser extends Parser:
  val root: Rule[Double] = rule { case Expr(e) => e }

  val Expr: Rule[Double] = rule(
    { case (Expr(l), MyLexer.PLUS(_), Term(r)) => l + r },
    { case (Expr(l), MyLexer.MINUS(_), Term(r)) => l - r },
    { case Term(t) => t }
  )

  val Term: Rule[Double] = rule(
    { case (Term(l), MyLexer.STAR(_), Factor(r)) => l * r },
    { case (Term(l), MyLexer.SLASH(_), Factor(r)) => l / r },
    { case Factor(f) => f }
  )

  val Factor: Rule[Double] = rule(
    { case MyLexer.NUM(n) => n.value },
    { case (MyLexer.LP(_), Expr(e), MyLexer.RP(_)) => e }
  )

val input = "2 + 3 * 4"
val (_, lexemes) = MyLexer.tokenize(input)
val (_, result) = MyParser.parse(lexemes)
println(result) // 14.0

alpaca/
├── src/alpaca/
│ ├── internal/ # Internal implementation
│ │ ├── lexer/ # Lexer internals (Token, Lexem, Tokenization, etc.)
│ │ ├── parser/ # Parser internals (ParseTable, State, Item, etc.)
│ │ ├── Empty.scala # Empty type class utilities
│ │ ├── Copyable.scala # Copyable type class
│ │ ├── Showable.scala # Showable type class for debugging
│ │ └── ... # Other core utilities
│ ├── lexer.scala # Public lexer DSL and API
│ ├── parser.scala # Public parser DSL and API
│ └── local.scala # Local utilities
├── test/src/alpaca/ # Test suite
├── example/ # Example projects
├── docs/ # Documentation
└── build.mill # Mill build configuration
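For intuition, here is a plain-Scala sketch of what the quick-start lexer and parser compute, with no alpaca dependency. All names here (`tokenize`, `parse`, the token labels) are illustrative, not alpaca's API, and the hand-written recursive descent stands in for the table-driven LR code the library actually generates:

```scala
import scala.util.matching.Regex

// Illustrative only -- not alpaca's API. The token labels mirror the
// quick-start lexer; pattern order matters (first match at each offset wins).
val patterns: List[(String, Regex)] = List(
  "NUM"   -> "[0-9]+".r,
  "PLUS"  -> "\\+".r,
  "MINUS" -> "-".r,
  "STAR"  -> "\\*".r,
  "SLASH" -> "/".r,
  "LP"    -> "\\(".r,
  "RP"    -> "\\)".r,
  "WS"    -> "\\s+".r
)

def tokenize(input: String): List[(String, String)] =
  def loop(pos: Int, acc: List[(String, String)]): List[(String, String)] =
    if pos >= input.length then acc.reverse
    else
      val rest = input.substring(pos)
      patterns.iterator
        .flatMap { case (name, re) => re.findPrefixOf(rest).map(text => name -> text) }
        .nextOption() match
        case Some((name, text)) =>
          val nextPos = pos + text.length
          if name == "WS" then loop(nextPos, acc) // skipped, like Token.Ignored
          else loop(nextPos, (name -> text) :: acc)
        case None => throw RuntimeException(s"no token matches at offset $pos")
  loop(0, Nil)

// Same Expr/Term/Factor precedence as the grammar in the quick start.
def parse(tokens: List[(String, String)]): Double =
  var rest = tokens
  def peek = rest.headOption.map(_._1)
  def advance(): (String, String) = { val t = rest.head; rest = rest.tail; t }

  def factor(): Double = advance() match
    case ("NUM", text) => text.toDouble
    case ("LP", _)     => val v = expr(); advance(); v // consume RP
    case (other, _)    => throw RuntimeException(s"unexpected token $other")

  def term(): Double =
    var v = factor()
    while peek.contains("STAR") || peek.contains("SLASH") do
      if advance()._1 == "STAR" then v *= factor() else v /= factor()
    v

  def expr(): Double =
    var v = term()
    while peek.contains("PLUS") || peek.contains("MINUS") do
      if advance()._1 == "PLUS" then v += term() else v -= term()
    v

  expr()

println(parse(tokenize("2 + 3 * 4"))) // 14.0
```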
Alpaca supports context-aware lexing and parsing, allowing you to maintain state during tokenization and parsing. Here's an example that tracks brace matching:
import alpaca.*
import scala.collection.mutable.Stack
case class BraceContext(
  var text: CharSequence = "",
  val braces: Stack[Char] = Stack()
) extends LexerCtx

val braceLexer = lexer[BraceContext]:
  case "\\(" =>
    ctx.braces.push('(')
    Token["LPAREN"]
  case "\\)" =>
    if ctx.braces.isEmpty || ctx.braces.pop() != '(' then
      throw RuntimeException("Mismatched parenthesis")
    Token["RPAREN"]
  case "\\{" =>
    ctx.braces.push('{')
    Token["LBRACE"]
  case "\\}" =>
    if ctx.braces.isEmpty || ctx.braces.pop() != '{' then
      throw RuntimeException("Mismatched brace")
    Token["RBRACE"]
  case "\\s+" => Token.Ignored
  case "[a-zA-Z]+" => Token["ID"]
// Usage
val input = "{ foo ( bar ) }"
val (finalCtx, lexemes) = braceLexer.tokenize(input)
if finalCtx.braces.nonEmpty then
  throw RuntimeException("Unclosed braces: " + finalCtx.braces.mkString)

Tokens can carry values extracted from the input:
case num @ "[0-9]+" => Token["NUM"](num.toInt)
case id @ "[a-zA-Z][a-zA-Z0-9]*" => Token["ID"](id)

Use Token.Ignored for whitespace and comments that should be skipped:
case "\\s+" => Token.Ignored
case "#.*" => Token.Ignored

- JDK 21 or later
- Mill 1.0.6 or later
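The stack discipline from the brace-matching context example above can be exercised standalone. This is a plain-Scala sketch with illustrative names, nothing alpaca-specific:

```scala
import scala.collection.mutable.Stack

// Mirrors the push/pop checks in the BraceContext example: openers are
// pushed, closers must pop their matching opener, and nothing may remain.
def checkBalanced(input: String): Boolean =
  val braces = Stack[Char]()
  val pairs = Map(')' -> '(', '}' -> '{')
  input.forall {
    case c @ ('(' | '{') => braces.push(c); true
    case c @ (')' | '}') => braces.nonEmpty && braces.pop() == pairs(c)
    case _               => true
  } && braces.isEmpty

println(checkBalanced("{ foo ( bar ) }")) // true
println(checkBalanced("{ foo ( bar }"))   // false
```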
# Compile the project
./mill compile
# Run tests
./mill test
# Generate documentation
./mill docJar
# Run test coverage
./mill test.scoverage.htmlReport

- 📖 Full Documentation
- 🚀 Getting Started - Installation, quick start, and first steps
- 🔤 Lexer - Token definitions, regex patterns, and tokenization
- 🗂️ Lexer Context - Stateful lexing with LexerCtx and tracking traits
- ⚠️ Lexer Error Recovery - ShadowException, pattern ordering, and error handling
- 🔀 Between Stages - Lexeme structure and the lexer-to-parser data pipeline
- 📐 Parser - Grammar rules, EBNF operators, and parsing tokenized input
- 🧩 Parser Context - Custom ParserCtx, shared state across reductions, and the parse() return value
- ⚖️ Conflict Resolution - Shift/reduce and reduce/reduce conflicts, the before/after DSL, and named productions
- 🎯 Extractors - Terminal and non-terminal matching, EBNF extractors, and Lexeme field access
- 🐛 Debug Settings - Configure compile-time debugging and logging
This project was developed as a Bachelor's Thesis. The full text of the thesis is available in
the thesis.pdf file. The LaTeX source files can be found
on the thesis branch. Note that the thesis is written in Polish
and does not represent the current state of the project.
Contributions are welcome! Please feel free to submit a Pull Request.
Created by halotukozak and Corvette653
Made with ❤️ and coffee