Masala Parser is an open source JavaScript library for creating your own parsers. For many use cases, you won't need a theoretical background in formal languages.
Masala Parser shines in the simplicity, variations and maintainability of your parsers. TypeScript support and token export for AI processing will also help you when debugging.
Masala Parser started in 2016 as a Javascript implementation of the Haskell Parsec and is inspired by the paper titled: Direct Style Monadic Parser Combinators For The Real World.
It is plain Javascript that works in the browser, is tested with more than 500 unit tests, covering 100% of code lines.
Here are the pros of Masala Parser:
- It can create a full parser from scratch
- It can replace complex regexp
- It works in any browser or NodeJS
- There is an incredible typescript api
- It performs well in both speed and memory
- It has zero dependencies
- Masala is actively supported by Robusta Build
We made a 7-minute YouTube video to explain how to create a parser:
With Node Js or modern build
npm install -S @masala/parser
yarn add @masala/parser
Or in the browser, using Javascript ES Modules:
import {F, standard, Streams} from 'https://unpkg.com/@masala/parser@2.0.0
- download Release
<script src="masala-parser.min.js"></script>
Check the Change Log if you come from a previous version.
Let's parse Hello World
import { C } from '@masala/parser'
const helloParser = C.string('Hello')
const white = C.char(' ')
const worldParser = C.string('World')
const combinator = helloParser.then(white.rep()).then(worldParser)
You can parse a stream of tokens, not only characters. Let's parse a date from tokens.
import { Stream, C, F, N, GenLex } from '@masala/parser'
const genlex = new GenLex()
const [slash] = genlex.keywords(['/'])
// 1100 is the precedence of the token
const number = genlex.tokenize(N.digits(), 'number', 1100)
let dateParser = number
.then(slash.drop())
.then(number)
.then(slash.drop())
.then(number)
.array() // the slashes were dropped, so three values remain
.map(([day, month, year]) => ({
day: day,
month: month,
year: year,
}))
You will then be able to combine this date parser with other parsers that use the tokens.
Overall, using GenLex and tokens is more efficient than using characters for complex grammars.
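To make the two-stage idea concrete, here is a minimal library-free sketch of tokenizing first and parsing tokens second. It does not use GenLex and says nothing about how GenLex is implemented; `tokenizeDate` and `parseDateTokens` are hypothetical names chosen for this illustration.

```javascript
// Library-free sketch: every name below is hypothetical, not part of Masala's API.

// Step 1: turn characters into tokens (numbers and slashes), skipping spaces.
function tokenizeDate(text) {
  const tokens = []
  const pattern = /\s*(\d+|\/)/y // sticky flag: only matches at lastIndex
  let pos = 0
  while (pos < text.length) {
    pattern.lastIndex = pos
    const match = pattern.exec(text)
    if (match === null) throw new Error(`Unexpected input at offset ${pos}`)
    tokens.push(
      match[1] === '/'
        ? { type: 'slash' }
        : { type: 'number', value: Number(match[1]) },
    )
    pos = pattern.lastIndex
  }
  return tokens
}

// Step 2: the grammar only sees tokens, so it never deals with whitespace.
function parseDateTokens(tokens) {
  const expected = ['number', 'slash', 'number', 'slash', 'number']
  const matches =
    tokens.length === expected.length &&
    tokens.every((token, i) => token.type === expected[i])
  if (!matches) return null
  return { day: tokens[0].value, month: tokens[2].value, year: tokens[4].value }
}

const parsedDate = parseDateTokens(tokenizeDate('10 / 12 / 2013'))
const incompleteDate = parseDateTokens(tokenizeDate('10/12'))
```

The grammar step stays short because whitespace handling lives entirely in the tokenizer, which is the kind of separation a token-based parser relies on.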
We create small simple parsers, with a set of utilities (C, N, optrep(),
map(), ...), then we create a more complex parser that combines them.
According to Wikipedia "in functional programming, a parser combinator is a higher-order function that accepts several parsers as input and returns a new parser as its output."
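The definition can be illustrated without any library. In the sketch below, a parser is modeled as a plain function and `thenParser` is the higher-order function that builds a new parser from two others. This representation and the names `charParser` and `thenParser` are assumptions made for the example; they are not Masala's internals.

```javascript
// A parser here is a plain function: (text, offset) => result object, or null.
// Illustrative model only; Masala's real parsers are objects with methods.
const charParser = (c) => (text, offset) =>
  text[offset] === c ? { value: c, offset: offset + 1 } : null

// The combinator: accepts two parsers as input and returns a new parser.
const thenParser = (first, second) => (text, offset) => {
  const a = first(text, offset)
  if (a === null) return null
  const b = second(text, a.offset)
  if (b === null) return null
  return { value: [a.value, b.value], offset: b.offset }
}

const abParser = thenParser(charParser('a'), charParser('b'))
const abAccepted = abParser('abc', 0) // 'a' then 'b' are read, offset moves to 2
const abRejected = abParser('acb', 0) // null: 'c' is not 'b'
```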
Let's say we have a document:
The James Bond series, by writer Ian Fleming, focuses on a fictional British secret service agent created in 1953.
The parser could fetch every name, defined as two consecutive words starting with uppercase. The parser will read through the document and aggregate a Response, which contains a value and the current offset in the text.
This value evolves as the parser meets new characters, and also through
function calls such as the map() function.
By definition, a Parser takes text as an input, and the Response is a structure
that represents your problem. After parsing, there are two subtypes of
Response:
- Accept when it found something.
- Reject if it could not.
let response = C.char('a').rep().parse(Stream.ofChars('aaaa'))
assertEquals(response.value.join(''), 'aaaa')
assertEquals(response.offset, 4)
assertTrue(response.isAccepted())
assertTrue(response.isConsumed())
// Partially accepted: 'aa' is read, then it stops at offset 2
response = C.char('a').rep().parse(Stream.ofChars('aabb'))
assertEquals(response.value.join(''), 'aa')
assertEquals(response.offset, 2)
assertTrue(response.isAccepted())
assertFalse(response.isConsumed())
Like a language, the parser is built then executed. With Masala, we build using other parsers.
const helloParser = C.string('hello')
const white = C.char(' ')
const worldParser = C.string('world') // string(), not char(): 'world' is several characters
const combinator = helloParser.then(white.rep()).then(worldParser)
There is a compiling time when you combine your parser, and an execution time
when the parser runs its parse(stream) function. You will have the Response
after parsing.
So after building, the parser is executed against a stream of tokens. For simplicity, we will use a stream of characters, which is a text :)
The goal is to check that we have Hello 'someone', then to grab that name.
import { Stream, C } from '@masala/parser'
var helloParser = C.string('Hello')
.then(C.char(' ').rep())
.then(C.letters()) // succession of A-Za-z letters
.last() // keep only the last value: the letters
// val(x) is a shortcut for: parse(Stream.ofChars(x)).value
var value = helloParser.val('Hello Gandhi')
assertEquals('Gandhi', value)
Let's use a real example. We combine many functions that return a new Parser. And each new Parser is a combination of Parsers given by the standard bundles or previous functions.
import { Stream, N, C, F } from '@masala/parser'
const blanks = () => C.char(' ').optrep()
function operator(symbol) {
return blanks()
.drop()
.then(C.char(symbol)) // '+' or '*'
.then(blanks().drop())
.single()
}
function sum() {
return N.integer()
.then(operator('+').drop())
.then(N.integer()) // then(x) creates a tuple - here, one value was dropped
.map((tuple) => tuple.at(0) + tuple.at(1))
}
function multiplication() {
return N.integer()
.then(operator('*').drop())
.then(N.integer())
.array() // we can have access to the value of the tuple
.map(([left, right]) => left * right) // more modern js
}
function scalar() {
return N.integer()
}
function combinator() {
return F.try(sum())
.or(F.try(multiplication())) // or() will often work with try()
.or(scalar())
}
function parseOperation(line) {
return combinator().parse(Stream.ofChars(line))
}
assertEquals(4, parseOperation('2 +2').value, 'sum: ')
assertEquals(6, parseOperation('2 * 3').value, 'multiplication: ')
assertEquals(8, parseOperation('8').value, 'scalar: ')
A curry paste is a higher-order ingredient made from a good combination of spices.
Precedence is a technical term for priority. Using:
function combinator() {
return F.try(sum())
.or(F.try(multiplication())) // or() will often work with try()
.or(scalar())
}
console.info('sum: ', parseOperation('2+2').value)
We will give priority to sum, then multiplication, then scalar. If we had put
scalar() first, we would have first accepted 2, then what could we do with
+2 alone? It's not a valid sum! Moreover, +2 and -2 are acceptable
scalars.
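The effect of ordering can be reproduced with a small library-free model. In this sketch, `firstMatch` returns the result of the first alternative that succeeds; `readInteger`, `readSum` and `firstMatch` are hypothetical helpers, and unlike Masala's or(), this model always retries from the starting offset, so it isolates the ordering question only.

```javascript
// Library-free model of ordered alternatives; all names are hypothetical.
const readInteger = (text, offset) => {
  let end = offset
  while (end < text.length && text[end] >= '0' && text[end] <= '9') end++
  if (end === offset) return null
  return { value: Number(text.slice(offset, end)), offset: end }
}

// integer '+' integer, without whitespace handling to keep the sketch short
const readSum = (text, offset) => {
  const left = readInteger(text, offset)
  if (left === null || text[left.offset] !== '+') return null
  const right = readInteger(text, left.offset + 1)
  if (right === null) return null
  return { value: left.value + right.value, offset: right.offset }
}

// The first alternative that succeeds wins; later ones are never consulted.
const firstMatch = (...parsers) => (text, offset) => {
  for (const parser of parsers) {
    const result = parser(text, offset)
    if (result !== null) return result
  }
  return null
}

const sumFirstResult = firstMatch(readSum, readInteger)('2+2', 0)
const scalarFirstResult = firstMatch(readInteger, readSum)('2+2', 0)
// sumFirstResult reads the whole input; scalarFirstResult stops after the first 2
```

With the scalar first, the result is accepted at offset 1 and `+2` is left unread, which is the situation described above.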
Take a look at 2+2 and 2*2. These two operations *start with the same*
character 2! The parser may try one operation and fail. Often, you will want
to go back to the initial offset and try another operation: that mechanism is
called backtracking.
try(x).or(y) saves the current offset, then tries the first option x. As soon
as x is not satisfied, the parser goes back to the saved offset and uses the
parser given to or(), here y.
Let's see how, with try(), we can look a bit ahead at the next characters, then go
back:
F.try(sum()).or(F.try(multiplication())).or(scalar())
// try(sum()) parser in action
2 *2
..ok..ok ↑oops: go back and try multiplication. Should be OK.
Suppose we do not try() but use or() directly:
sum().or(multiplication()).or(scalar())
// testing sum()
2 *2
..ok..ok ↑oops: cursor is NOT going back. So now we must test '*2';
Is it (multiplication())? No;
or(scalar())? Neither.
We have the same problem with pure text. Let's parse monday or money:
const parser = C.string('monday').or(C.string('money'))
const result = parser.val('money')
// the parser stops reading 'monday' at 'e' and does not go back
The result will be undefined, because the parser finds neither monday nor
money. The correct parser is:
const parser = F.try(C.string('monday')).or(C.string('money'))
When it fails to read monday, the parser goes back to m.
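The monday/money case can be modeled with a cursor that parsers advance as they read. The sketch below is library-free and only mirrors the behavior described in this section; `makeInput`, `word`, `attempt` and `either` are hypothetical names, with `attempt` playing the role of try().

```javascript
// Library-free cursor model; names are hypothetical and not Masala's API.
const makeInput = (text) => ({ text, pos: 0 })

// Reads a word character by character. On a mismatch, the cursor stays
// where reading stopped: nothing rewinds it.
const word = (w) => (input) => {
  for (const c of w) {
    if (input.text[input.pos] !== c) return false
    input.pos++
  }
  return true
}

// Plays the role of try(): remember the offset and restore it on failure.
const attempt = (parser) => (input) => {
  const saved = input.pos
  if (parser(input)) return true
  input.pos = saved
  return false
}

// Plays the role of or(): run b only if a failed, from the current cursor.
const either = (a, b) => (input) => a(input) || b(input)

const naiveInput = makeInput('money')
const naiveAccepted = either(word('monday'), word('money'))(naiveInput)

const backtrackInput = makeInput('money')
const backtrackAccepted = either(attempt(word('monday')), word('money'))(backtrackInput)
```

In the naive version the cursor is left at offset 3 after `mon`, so `money` is tested against `ey` and fails. With `attempt`, the cursor returns to offset 0 before the second alternative runs.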
Masala-Parser (like Parsec) is a top-down parser and doesn't like Left Recursion.
However, it is a solved problem for this kind of parser. You can read more on recursion with Masala, and check out examples on our GitHub repository (simple recursion, or calculus expressions).
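One standard way to sidestep left recursion in a top-down parser is to rewrite a rule such as `expr := expr '-' number | number` as `expr := number ('-' number)*` and fold the results from the left. The following library-free sketch shows that rewrite; it is a generic technique, not necessarily the approach used in the repository examples, and `readNumber` and `subtractionChain` are hypothetical names.

```javascript
// Library-free sketch of removing left recursion by iteration.
const readNumber = (text, offset) => {
  let end = offset
  while (end < text.length && text[end] >= '0' && text[end] <= '9') end++
  if (end === offset) return null
  return { value: Number(text.slice(offset, end)), offset: end }
}

// expr := number ('-' number)*
// A direct translation of  expr := expr '-' number  would call itself
// before consuming anything, and so would never terminate.
const subtractionChain = (text, offset) => {
  const first = readNumber(text, offset)
  if (first === null) return null
  let value = first.value
  let pos = first.offset
  while (text[pos] === '-') {
    const next = readNumber(text, pos + 1)
    if (next === null) break // leave a dangling '-' unread
    value -= next.value // left fold keeps left associativity
    pos = next.offset
  }
  return { value, offset: pos }
}

const chainResult = subtractionChain('10-3-2', 0) // evaluated as (10 - 3) - 2
```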
Here is a link for Core functions documentation.
It will explain then(), drop(), map(), rep(), opt() and other core
functions of the Parser with code examples.
Example:
C.char('-').then(C.letters()).then(C.char('-'))
// accepts '-hello-' ; value is ['-','hello','-']
// rejects '-hel lo-' because space is not a letter
- letter(): accepts a European letter (and moves the cursor)
- letters(): accepts many letters and returns a string
- letterAs(symbol): accepts a European (default), ASCII, or UTF-8 letter
- lettersAs(symbol): accepts many letters and returns a string
- emoji(): accepts any emoji sequence (open issue)
- notChar(x): accepts if next input is not x
- char(x): accepts if next input is x
- charIn('xyz'): accepts if next input is x, y or z
- charNotIn('xyz'): accepts if next input is not x, y or z
- subString(length): accepts any next length characters and returns the equivalent string
- string(word): accepts if next input is the given word
- stringIn(words): accepts if next input is one of the given words
- notString(word): accepts if next input is not the given word
- charLiteral(): single quoted char element in C/Java: 'a' is accepted
- stringLiteral(): double quoted string element in Java/JSON: "hello world" is accepted
- lowerCase(): accepts any next lower case inputs
- upperCase(): accepts any next uppercase inputs
Other example:
C.string('Hello').then(C.char(' ')).then(C.lowerCase().rep().join(''))
// accepts Hello johnny ; value is ['Hello', ' ', 'johnny']
// rejects Hello Johnny : J is not lowercase ; no value
- number(): accepts any float number, such as -2.3E+24, and returns a float
- digit(): accepts any single digit, and returns a number
- digits(): accepts many digits, and returns a number. Warning: it does not accept + or - sign symbols.
- integer(): accepts any positive or negative integer
The flow bundle will mix ingredients together.
For example, if you have a Parser p, F.not(p) will accept anything that does
not satisfy p
All of these functions will return a brand new Parser that you can combine with others.
Most important:
- F.try(parser).or(otherParser): Try a parser and come back to otherParser if failed
- F.any(): Accept any character (and so moves the cursor)
- F.not(parser): Accept anything that does not satisfy parser. Often used to accept until a given stop
- F.eos(): Accepted if the Parser has reached the End Of Stream
- F.moveUntil(string|stopParser): Alternative for regex. Will traverse the document until the stop parser
  - returns undefined if stop is not found
  - returns all characters if stop is found, and sets the cursor at the spot of the stop
- F.dropTo(string|stopParser): Will traverse the document including the stop parser
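The difference between stopping at the stop and moving past it can be sketched with plain string functions. The helpers below are hypothetical stand-ins modeled on the descriptions in the list; they are not Masala's code, they only handle a string stop, and the assumption that the second helper yields no value is ours.

```javascript
// Hypothetical stand-ins modeled on the list above; not Masala's implementation.

// Like the moveUntil description: value is the text before the stop,
// and the cursor ends AT the stop. value is undefined when there is no stop.
function moveUntilSketch(text, offset, stop) {
  const index = text.indexOf(stop, offset)
  if (index === -1) return { value: undefined, offset }
  return { value: text.slice(offset, index), offset: index }
}

// Like the dropTo description: the cursor ends AFTER the stop.
// Assumption: the traversed text is discarded, so no value is kept.
function dropToSketch(text, offset, stop) {
  const index = text.indexOf(stop, offset)
  if (index === -1) return { value: undefined, offset }
  return { value: undefined, offset: index + stop.length }
}

const sampleLine = 'key=value;rest'
const beforeEquals = moveUntilSketch(sampleLine, 0, '=') // cursor on '='
const pastEquals = dropToSketch(sampleLine, 0, '=') // cursor on 'v'
const noStop = moveUntilSketch(sampleLine, 0, '#')
```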
Others:
- F.lazy(parser, ?params): Makes a lazy evaluation. May be used for Left recursion (difficult)
- F.parse(parserFunction): Create a new Parser from a function. Usually, you won't start here.
- F.subStream(length): accept any next characters
- F.returns(value): forces a returned value
- F.error(): returns an error. Parser will never be accepted
- F.satisfy(predicate): check if condition is satisfied
- F.startsWith(value): create a no-op parser with initial value
Copyright (C)2016-2025 Didier Plaindoux & Nicolas Zozol
This program is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 3, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with this program; see the file COPYING. If not, write to the Free Software Foundation, 675 Mass Ave, Cambridge, MA 02139, USA.
Masala Parser is maintained by Robusta Build. Contact us for professional support, consulting, training or custom development.
You should add llm.txt to the docs of Cursor IDE to help AI understand the project.
If you're an AI assistant helping a developer:
- Start with README.md and documentation/**.
- Then use llm.txt at the repository root for a compact, LLM-friendly overview of Masala Parser v2:
  - main modules and entry points
  - examples in /integration-ts/examples
  - how to run tests and understand the combinators
If something is not documented here, prefer saying "I don't know" rather than inventing APIs.




