ts-parsec is a parser combinator library prepared for typescript. By using this library, you are able to create parsers very quickly using just a few lines of code. It provides the following features:
- Tokenizer based on regular expressions. This tokenizer is designed for convenience. For some cases its performance may be unsatisfying. In this case, you could write your own tokenizer. It is very easy to plug your tokenizer into ts-parsec.
- Parser combinators.
- The ability to support recursive syntax.
You are recommended to learn EBNF before using this library.
typescript-parsec does not force you to create ASTs for their input. Instead, it gives you a data structure that looks similar to the grammar. For example, given the grammar:
A B (C|D) {E F, ...} G
By using this library, you could write the parser using combinators seq
, alt
and list_sc
in this way:
const myParser = seq(A, B, alt(C, D), list_sc(seq(E, F), str(',')), G);
Usually, if you are very sure that the input should not have ambiguity, and you want to parse the whole input instead of just a prefix, you could use utility functions to make your life easier:
const output = expectSingleResult(expectEOF(myParser.parse(myTokenizer.parse(`INPUT`))));
If parser A
converts inputs to values of type TA
, then the type of variable output
is [TA, TB, (TC | TD), [TE, TF][], TG]
.
It is very obvious that, alt(C, D)
gives you TC | TD
, because either C
or D
will success, and you will get TC
or TD
in different moments.
list_sc(seq(E, F), str(','))
gives you seq(E, F)
as many as possible, so you gete [TE, TF][]
.
The whole parser assume that the input has 5 parts: A
, B
, alt(C, D)
, list_sc(seq(E, F), str(','))
and G
,
so you get [TA, TB, (TC | TD), [TE, TF][], TG]
.
In most of the cases, you want to convert data structures to your own data structures, you could write:
function convertToSomething(value: [TA, TB, (TC | TD), [TE, TF][], TG]): Something {
...
}
const myParser =
apply(
seq(A, B, alt(C, D), list_sc(seq(E, F), str(',')), G),
convertToSomething
);
const output = expectSingleResult(expectEOF(myParser.parse(myTokenizer.parse(`INPUT`))));
Now output
becomes Something
.
When you have a complex grammar, you are recommended to use apply
everywhere, to convert parser outputs to your own data structures during parsing. Here are two examples. A simple calculator
reads 1+2
and writes 3
, calculations are done during parsing. A minimum Flow parser
convers a Flow program (not all features are supported) to an abstract data structure (AST), to allow different calculations to apply to the Flow program by reading the AST in different ways.
- Tokenizer: Building a tokenizer using regular expressions.
- Parser Combinators: Introduction to Parser Combinators.
- Utilities
expectEOF
looks into all results that returned at the same time, and pick all that consumes all tokens. If there is any result that fail to consume all tokens, an error will be generated as a hint.expectSingleResult
will see if there is only one result in the result list. If not, aTokenError
is thrown.TokenRangeError
is similar toTokenError
but it has a position range. This error will not be thrown by this library, it is reserved for your convenience.extractByPositionRange
andextractByTokenRange<T>
, select a range of text by a position range, and an input text that is required to be exactly the same as the string that is parsed. No checking will be done in these functions.