You define an LL parser in EBNF, and then translate it using parser combinators. All parser combinators can return multiple results, except those ending with `_sc`, which stands for "short cut". Short-cut combinators return only the best result, even if that choice could fail the whole parser, so you need to choose them wisely.
ts-parsec provides the following combinators. When I say "`P` returns `T`", it doesn't mean that the function `P` returns a value of type `T`; it means that `returnValue.candidates[something].result` is of type `T`. If a parser combinator involves other parser combinators, like `seq(a, b)`, then `a` returns `Ta`, `b` returns `Tb`, so `seq(a, b)` returns `[Ta, Tb]`.

In most cases, you don't need to deal with multiple results by yourself; this is why you want to call `expectSingleResult` and `expectEOF`. They are explained here.
- Token Filters
  - nil():
    - Consumes no token.
    - Returns `undefined`.
    - Never fails.
  - succ(value):
    - Consumes no token.
    - Returns `value`.
    - Never fails.
  - fail(errorMessage):
    - Consumes no token.
    - Always fails.
  - str('x'):
    - Consumes a token that is `'x'`.
    - Returns `Token<T>`.
    - Fails if the current token cannot be consumed.
  - tok(x):
    - Consumes a token whose `kind` is `x`. If you use `buildLexer`, these values of `x` are put in the 3rd place in each line.
    - Returns `Token<T>`.
    - Fails if the current token cannot be consumed.
- Sequential
  - seq(a, b, c):
    - Consumes tokens that match `a`, `b` and then `c` in order. You could put 2–16 arguments in `seq`.
    - Returns `[Ta, Tb, Tc]`.
    - Fails if any of `a`, `b` and `c` fails.
  - combine(a, (x)=>b, (x)=>c):
    - Consumes tokens that match `a`, `b` and then `c` in order, where each parser after `a` depends on the result of the previous parser. You could put 2–16 arguments in `combine`.
    - Returns `Tc`.
    - Fails if any of `a`, `b` and `c` fails.
  - kleft(a, b):
    - Equivalent to `seq(a, b)`.
    - Returns `Ta`.
    - Fails if either `a` or `b` fails.
  - kmid(a, b, c):
    - Equivalent to `seq(a, b, c)`.
    - Returns `Tb`.
    - Fails if any of `a`, `b` and `c` fails.
  - kright(a, b):
    - Equivalent to `seq(a, b)`.
    - Returns `Tb`.
    - Fails if either `a` or `b` fails.
- Alternative
  - alt(a, b, c):
    - Consumes tokens that match `a`, `b` or `c`. You could put 2–16 arguments in `alt`.
    - Returns `Ta | Tb | Tc`, with all succeeded attempts.
    - Fails if all of `a`, `b` and `c` fail.
  - alt_sc(a, b, c):
    - Consumes tokens that match `a`, `b` or `c`. You could put 2–16 arguments in `alt_sc`.
    - Returns `Ta | Tb | Tc`, with the first succeeded attempt.
    - Fails if all of `a`, `b` and `c` fail.
  - opt_sc(x):
    - Consumes `x`.
    - Returns `Tx | undefined`. It returns `undefined` only if `x` fails.
    - Never fails.
  - opt(x):
    - Equivalent to `x | nil<T>()`.
    - Returns `Tx | undefined`.
    - Never fails.
- Repetitive
  - rep_sc(x):
    - Consumes as many `x` as possible.
    - Returns `Tx[]`. If no `x` could be consumed, it returns `[]`.
    - Never fails.
  - rep(x):
    - Consumes `x` zero to multiple times.
    - Returns `Tx[]`. If 3 `x` are consumed, it returns multiple results in the order `[x1, x2, x3]`, `[x1, x2]`, `[x1]` and `[]`.
    - Never fails.
  - repr(x):
    - Consumes `x` zero to multiple times.
    - Returns `Tx[]`. If 3 `x` are consumed, it returns multiple results in the order `[]`, `[x1]`, `[x1, x2]` and `[x1, x2, x3]`.
    - Never fails.
  - rep_n(x, n):
    - Consumes `x` exactly `n` times.
    - Returns `Tx[]`. If 3 `x` are consumed, it returns `[x1, x2, x3]`.
    - Fails if `x` succeeds fewer than `n` times.
  - list(x, d):
    - Equivalent to `seq(x, rep(kright(d, x)))`.
    - Returns `Tx[]`, instead of `[Tx, Tx[]]`.
    - Fails if `x` succeeds zero times.
  - list_sc(x, d):
    - Equivalent to `seq(x, rep_sc(kright(d, x)))`.
    - Returns `Tx[]`, instead of `[Tx, Tx[]]`.
    - Fails if `x` succeeds zero times.
  - list_n(x, d, n):
    - Equivalent to `opt_sc(seq(x, rep_n(kright(d, x), n - 1)))`.
    - Returns `Tx[]`, instead of `[Tx, Tx[]] | undefined`.
    - Fails if `x` succeeds fewer than `n` times.
- Left Recursive
  - lrec(a, b, f):
    - Equivalent to `seq(a, rep(b))`.
    - Returns the result of `f(f(f(a, b1), b2), b3) ...`. If no `b` succeeds, it returns `a`, which requires `Ta` to be compatible with the return type of `f`.
    - Fails if `a` fails.
  - lrec_sc(a, b, f):
    - Equivalent to `seq(a, rep_sc(b))`.
    - Returns the result of `f(f(f(a, b1), b2), b3) ...`. If no `b` succeeds, it returns `a`, which requires `Ta` to be compatible with the return type of `f`.
    - Fails if `a` fails.
- Ambiguity Resolving
  - amb(x):
    - Consumes `x` and groups results by consumed tokens.
    - Returns `Tx[]`.
    - Fails if `x` fails.
- Error Recovering
  - err(x, e):
    - Consumes `x`.
    - When `x` fails, the error message is replaced by `e`.
  - errd(x, e, v):
    - Consumes `x`.
    - When `x` fails, the error message is replaced by `e`.
    - Returns `Tx`. It returns `v` only if `x` fails.
    - Never fails.
- Others
  - apply(x, f):
    - Equivalent to `x`.
    - Returns the result of function `f`. The function is called against the result of `x`, if `x` succeeds.
    - Fails if `x` fails.
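The multiple-result behavior of the non-`_sc` combinators can be sketched with a toy model (an illustration of the semantics above, not the library's implementation): a parser maps a token position to a list of `[result, nextPosition]` candidates, `alt` keeps every succeeded attempt, and `alt_sc` keeps only the first.

```typescript
// Toy model: a parser takes tokens and a position, and returns every
// candidate as [result, nextPosition].
type ToyParser<T> = (tokens: string[], pos: number) => Array<[T, number]>;

// Like str('x'): consume one token equal to `text`.
function strTok(text: string): ToyParser<string> {
    return (tokens, pos) =>
        tokens[pos] === text ? [[text, pos + 1]] : [];
}

// Like alt(a, b): all succeeded attempts survive.
function altToy<T>(a: ToyParser<T>, b: ToyParser<T>): ToyParser<T> {
    return (tokens, pos) => [...a(tokens, pos), ...b(tokens, pos)];
}

// Like alt_sc(a, b): only the first succeeded attempt survives.
function altScToy<T>(a: ToyParser<T>, b: ToyParser<T>): ToyParser<T> {
    return (tokens, pos) => {
        const ra = a(tokens, pos);
        return ra.length > 0 ? ra : b(tokens, pos);
    };
}

// Like seq(a, b): run b from every position where a succeeded, which is
// how candidate counts can multiply.
function seqToy<A, B>(a: ToyParser<A>, b: ToyParser<B>): ToyParser<[A, B]> {
    return (tokens, pos) =>
        a(tokens, pos).flatMap(([ra, next]) =>
            b(tokens, next).map(([rb, end]) => [[ra, rb], end] as [[A, B], number]));
}
```

Running `altToy(p, p)` over matching input yields two identical candidates while `altScToy(p, p)` yields one, which is the trade-off the `_sc` suffix encodes.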
Ambiguity is very common, and we may not be able to design a syntax without context-free ambiguity (e.g. the C programming language). `amb` allows you to limit the ambiguity inside an AST, so that you can resolve it after parsing. You don't need to mix parsing and semantic analysis like traditional C compilers do.
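The grouping that `amb` performs can be sketched as follows (a toy illustration of the idea, not the library's code): candidates that consumed the same span of tokens are merged into one candidate whose result is the array of all their results.

```typescript
// A candidate pairs a result with how many tokens it consumed.
interface AmbCandidate<T> {
    result: T;
    consumed: number;
}

// Sketch of amb: merge candidates that consumed the same tokens into a
// single candidate carrying every interpretation of that span.
function ambToy<T>(candidates: AmbCandidate<T>[]): AmbCandidate<T[]>[] {
    const groups = new Map<number, T[]>();
    for (const c of candidates) {
        const group = groups.get(c.consumed);
        if (group === undefined) {
            groups.set(c.consumed, [c.result]);
        } else {
            group.push(c.result);
        }
    }
    return [...groups.entries()].map(([consumed, results]) =>
        ({ result: results, consumed }));
}
```

Two ambiguous parses of the same five tokens collapse into one candidate holding both interpretations, which semantic analysis can pick between later.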
Recursive syntax is also very common. Sometimes you need to parse a tree, not just a list, so `lrec`, `lrec_sc` and `rule` are your friends. You are recommended to read A simple calculator if you have never learned EBNF.
`lrec` and `lrec_sc` mean Left Recursive (with Short Cut). They work in the same way, except that `lrec_sc` returns a single result that consumes as many tokens as possible, while `lrec` returns multiple possible results.
Left recursion is a special kind of recursion; it usually appears in left-associative binary operators. In `test/TestRecursiveParser.ts`, there are `TERM`, `FACTOR` and `EXP`.
`TERM` is very straightforward. It parses `1`, `+1`, `-1`, or `(something)`.
`FACTOR` is a left recursive parser. It consumes `1`, `1*2` or `1*2*3`. For `1*2*3...`, it is very clear that the structure is `1`, `*2`, `*3` and so on. Let's look at how `FACTOR` is implemented:
```typescript
lrec_sc(
    TERM,
    seq(
        alt(str('*'), str('/')),
        TERM
    ),
    applyBinary
)
```
According to the description above, `lrec_sc(a, b, f)` is `apply(seq(a, rep_sc(b)), F(f))`. So `FACTOR` first parses `1`, and then it parses `*2`, `*3`, until it fails. At this moment, you get a result like `['1', ['*2', '*3']]`, and then `f` is called in this way: `f(f('1', '*2'), '*3')`. This is what `F(f)` does here.
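The `F(f)` step described above is just a left fold. Here is a self-contained sketch (illustrative, not the library's code) of turning `['1', ['*2', '*3']]` into `f(f('1', '*2'), '*3')`:

```typescript
// Sketch of F(f): fold the repeated parts into the head, left-associatively,
// exactly as lrec_sc does with its callback.
function foldLeftRec<A, B>(head: A, rest: B[], f: (acc: A, b: B) => A): A {
    let acc = head;
    for (const b of rest) {
        acc = f(acc, b);
    }
    return acc;
}

// A callback in the spirit of applyBinary (a hypothetical stand-in for the
// one in the test file): apply '*2' or '/2' to the accumulated value.
function applyOp(acc: number, op: string): number {
    const operand = Number(op.slice(1));
    return op[0] === '*' ? acc * operand : acc / operand;
}
```

`foldLeftRec(1, ['*2', '*3'], applyOp)` evaluates as `applyOp(applyOp(1, '*2'), '*3')`, giving `6`; with an empty rest it simply returns the head, which is why `Ta` must be compatible with `f`'s return type.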
It is very clear that the way it calls `f` is left-associative, so this is a left recursive parser. There is no need for a right recursive parser function, because you could just call the parser inside itself, using `rule`, which will be explained later.
Splitting `+`, `-` and `*`, `/` into `EXP` and `FACTOR`, and letting `EXP` call `FACTOR`, is a common trick to specify operator precedence. If you need to parse C++, which has about 20 levels of operator precedence, you will need 20 `EXP`s calling each other.
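The same precedence-splitting trick can be sketched as a plain recursive-descent evaluator (a standalone illustration, not the code from `test/TestRecursiveParser.ts`): each level parses the next-tighter level first, then folds its own operators left-associatively.

```typescript
// A precedence ladder over a pre-split token array: EXP handles + and -,
// FACTOR handles * and /, TERM handles plain numbers.
function evaluate(tokens: string[]): number {
    let pos = 0;

    function term(): number {       // TERM: a number
        return Number(tokens[pos++]);
    }

    function factor(): number {     // FACTOR: term (('*' | '/') term)*
        let value = term();
        while (tokens[pos] === '*' || tokens[pos] === '/') {
            const op = tokens[pos++];
            value = op === '*' ? value * term() : value / term();
        }
        return value;
    }

    function exp(): number {        // EXP: factor (('+' | '-') factor)*
        let value = factor();
        while (tokens[pos] === '+' || tokens[pos] === '-') {
            const op = tokens[pos++];
            value = op === '+' ? value + factor() : value - factor();
        }
        return value;
    }

    return exp();
}
```

`evaluate(['1', '+', '2', '*', '3'])` gives `7`, because `EXP` defers to `FACTOR` before folding `+`, so `*` binds tighter.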
`EXP` calls `FACTOR`, `FACTOR` calls `TERM`, and `TERM` finally calls `EXP` again. This is recursive. You are not able to do this in one parser combinator expression, and this is why you need `rule`. `rule` returns a parser, and you can assign another parser to it later by calling its `setPattern` member function.
When `TERM` is being defined, `EXP` is not yet defined. No problem! When `EXP.setPattern` is called, `TERM` will see the updated `EXP` immediately.
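The late binding that `rule` provides can be sketched like this (a simplified illustration of the idea, not the library's `rule` implementation): the returned parser holds a mutable pattern and delegates to it only when it actually runs, so mutually recursive parsers can reference each other before all of them are defined.

```typescript
// A toy parser: consume tokens from a position, return [result, nextPos],
// or undefined on failure.
type LateParser = (tokens: string[], pos: number) => [string, number] | undefined;

// Sketch of rule(): a parser whose pattern can be assigned later.
function makeRule(): { parse: LateParser; setPattern: (p: LateParser) => void } {
    let pattern: LateParser | undefined;
    return {
        parse: (tokens, pos) => {
            if (pattern === undefined) {
                throw new Error('Pattern has not been set yet.');
            }
            // Delegate at call time, so later setPattern calls take effect.
            return pattern(tokens, pos);
        },
        setPattern: (p) => { pattern = p; }
    };
}

// EXP is referenced by TERM before its pattern exists.
const EXP = makeRule();
// Toy TERM: a '(' recurses into EXP (the ')' is ignored here for brevity);
// anything else is taken as a literal.
const TERM: LateParser = (tokens, pos) =>
    tokens[pos] === '(' ? EXP.parse(tokens, pos + 1) : [tokens[pos], pos + 1];

// Now close the recursion: from this point on, TERM's calls into EXP work.
EXP.setPattern((tokens, pos) => TERM(tokens, pos));
```

Because `parse` looks the pattern up at call time rather than capture time, `TERM` automatically sees whatever `setPattern` installed, which is exactly the property the walkthrough above relies on.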