Replies: 4 comments 4 replies
-
I tried memoization like this:

val cache_minus = mutable.Map[Int, P[Int]]()
val cache_plus = mutable.Map[Int, P[Int]]()
def x_minus[$](implicit p: P[$]): P[Int] = cache_minus.getOrElseUpdate(p.index, P(x_times ~ "-" ~ expr).map { case (x, y) => x - y })
def x_plus[$](implicit p: P[$]): P[Int] = cache_plus.getOrElseUpdate(p.index, P(x_times ~ ("+" ~ expr).rep).map { case (i, is) => i + is.sum })
// Other grammar rules remain unchanged.

This fails whenever the input contains parentheses, e.g. parse("123*(1+1)", program(_)):
java.lang.ClassCastException: class scala.runtime.BoxedUnit cannot be cast to class java.lang.Integer (scala.runtime.BoxedUnit is in unnamed module of loader 'app'; java.lang.Integer is in module java.base of loader 'bootstrap')
  at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:99)

But I was unable to find the code that does this unboxing to Int.
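Presumably the cause is that P[Int] is just fastparse's mutable ParsingRun[Int], a single object that gets reused for the entire parse, so the cache ends up holding a live reference rather than a snapshot of a result; by the time the cached entry is returned, another rule has overwritten successValue with (). A toy model of that failure (the Run class here is made up, standing in for ParsingRun):

import scala.collection.mutable

// Toy model: one mutable "run" object is reused by every rule, and the cache
// stores that object itself rather than a copy of its state.
final class Run { var successValue: Any = () }
val run = new Run
val cache = mutable.Map[Int, Run]()
run.successValue = 42                   // a rule succeeds with an Int...
cache.getOrElseUpdate(0, run)           // ...and we cache the run itself
run.successValue = ()                   // a later rule overwrites the result
cache(0).successValue.asInstanceOf[Int] // ClassCastException: BoxedUnit cannot be cast to Integer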
-
The basic problem appears to be here:

def expr[$: P]: P[Int] = P(x_minus | x_plus)
def x_minus[$: P] = P(x_times ~ "-" ~ expr).map { case (x, y) => x - y }
def x_plus[$: P] = P(x_times ~ ("+" ~ expr).rep).map { case (i, is) => i + is.sum }

You are making the parser do the same work twice: x_minus parses an x_times, and when the "-" does not follow, all of that is thrown away and x_plus parses the very same x_times again. You probably want something like this:

def expr[$: P]: P[Int] = P(x_times ~ (("+" | "-").! ~ expr).rep).map {
case (i, is) => i + is.map{case ("+", n) => n case ("-", n) => -n}.sum
}

Once done, it is no longer exponential. You can also now add cuts to enforce this and improve the error reporting:

@ time{
import fastparse.NoWhitespace._
import fastparse._
// Integer calculator program: 1+2*3-(4-5)*6 and so on. No spaces, for simplicity.
def program[$: P]: P[Int] = P(expr ~ End)
def expr[$: P]: P[Int] = P(x_times ~ (("+" | "-").! ~/ expr).rep).map {
case (i, is) => i + is.map{case ("+", n) => n case ("-", n) => -n}.sum
}
def x_times[$: P] = P(x_other ~ ("*" ~/ x_other).rep).map { case (i, is) => i * is.product }
def x_other[$: P] = P(number | ("(" ~/ expr ~ ")"))
def number[$: P] = P(CharIn("0-9").rep(1)).!.map(_.toInt)
// Verify that this works as expected.
assert(parse("123*(1+1)", program(_)).get.value == 246)
assert(parse("123*1+1", program(_)).get.value == 124)
assert(parse("123*1-1", program(_)).get.value == 122)
assert(parse("123*(1-1)", program(_)).get.value == 0)
// Parse an expression of the form `(((((...(1)...)))))`.
val n = 25
assert(parse("(" * (n - 1) + "1" + ")" * (n - 1), program(_)).get.value == 1)
}
res30: (Unit, concurrent.duration.FiniteDuration) = ((), 1929708 nanoseconds)
-
Thank you for the explanation. After some trial and error, I came up with some memoization code that seems to work. As you said, the problem with the toy grammar in this example is that the same sub-expressions get re-parsed over and over while backtracking. The changes are only for `x_times` and `x_other`:

// ... other rules of the grammar as above.
def x_times[$: P]: P[R] = P(x_other_cached ~ ("*" ~ x_other_cached).rep).map { case (i, is) => i * is.product }
def x_other[$: P]: P[R] = P(number | ("(" ~ expr ~ ")"))
def x_other_cached[$](implicit p: P[$]): P[R] = cacheGrammar(cache_other, x_other)

To make this work, I used the following setup:

final case class PRunData( // Copy all the mutable data from ParsingRun.
terminalMsgs: Msgs,
aggregateMsgs: Msgs,
shortMsg: Msgs,
lastFailureMsg: Msgs,
failureStack: List[(String, Int)],
isSuccess: Boolean,
logDepth: Int,
index: Int,
cut: Boolean,
successValue: Any,
verboseFailures: Boolean,
noDropBuffer: Boolean,
misc: collection.mutable.Map[Any, Any],
) {
override def toString: String = {
s"ParsingRun(index=$index, isSuccess = $isSuccess, successValue = $successValue)"
}
def assignToParsingRun[T](pr: ParsingRun[T]): ParsingRun[T] = { // Assign the mutable data to a given ParsingRun value.
pr.terminalMsgs = terminalMsgs
pr.aggregateMsgs = aggregateMsgs
pr.shortMsg = shortMsg
pr.lastFailureMsg = lastFailureMsg
pr.failureStack = failureStack
pr.isSuccess = isSuccess
pr.logDepth = logDepth
pr.index = index
pr.cut = cut
pr.successValue = successValue
pr.verboseFailures = verboseFailures
pr.noDropBuffer = noDropBuffer
misc.foreach { case (k, v) => pr.misc.put(k, v) }
pr
}
}

object PRunData { // Copy all the mutable data from a parsing run into a PRunData value.
def ofParsingRun[T](pr: ParsingRun[T]): PRunData = PRunData(
pr.terminalMsgs,
pr.aggregateMsgs,
pr.shortMsg,
pr.lastFailureMsg,
pr.failureStack,
pr.isSuccess,
pr.logDepth,
pr.index,
pr.cut,
pr.successValue,
pr.verboseFailures,
pr.noDropBuffer,
mutable.Map.from(pr.misc),
)
}
def cacheGrammar[R](cache: mutable.Map[Int, PRunData], parser: => P[_])(implicit p: P[_]): P[R] = {
// The `parser` has not yet been run! And it is mutable. Do not run it twice!
val cachedData: PRunData = cache.getOrElseUpdate(p.index, PRunData.ofParsingRun(parser))
// After the `parser` has been run on `p`, the value of `p` changes and becomes equal to the result of running the parser.
// If the result was cached, we need to assign it to the current value of `p`. This will imitate the side effect of running the parser again.
cachedData.assignToParsingRun(p).asInstanceOf[P[R]]
}
// Need a separate cache for every memoized parser.
val cache_other = mutable.Map[Int, PRunData]()
// Need to do cache_other.clear() between different calls to parse an expression.
val n = 500
cache_other.clear()
assert(parse("(" * (n - 1) + "1" + ")" * (n - 1), program(_)).get.value == 1)
cache_other.clear()
assert(parse("1+" + "(1+" * (n - 1) + "1" + ")" * (n - 1), program(_)).get.value == n + 1) |
-
I implemented this memoization in my project. The speedup is especially big after JVM warm-up. I have a performance test where I am parsing a realistic example file; this was not a specially crafted test with hundreds of unneeded parentheses. Without memoizing, the parsing took 6 seconds even when repeated many times (so, JVM warm-up does not improve performance). After memoizing several parsing rules, the parsing took 0.5 seconds, that is, I got a 12x speedup. Then I repeated this parse 2000 times, and the speedup grew to about 700x due to JVM warm-up. (Best parsing time was 0.0075 seconds.)
-
I'm struggling to fix an exponential slowness in fastparse 3.0.2 (Scala 2.13) used for an expression grammar. There is too much backtracking going on, and I have been adding cuts but the exponential slowness remains. I tried to add memoization to the parsers, but this failed with strange error messages about unboxing Unit into Integer. A small working example of a grammar that shows exponential slowness:
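Presumably the example was along these lines, reconstructed here from the `x_minus`/`x_plus` rules quoted in the replies above:

import fastparse._, NoWhitespace._

// Integer calculator: 1+2*3-(4-5)*6 and so on. No spaces, for simplicity.
def program[$: P]: P[Int] = P(expr ~ End)
def expr[$: P]: P[Int] = P(x_minus | x_plus)
def x_minus[$: P]: P[Int] = P(x_times ~ "-" ~ expr).map { case (x, y) => x - y }
def x_plus[$: P]: P[Int] = P(x_times ~ ("+" ~ expr).rep).map { case (i, is) => i + is.sum }
def x_times[$: P]: P[Int] = P(x_other ~ ("*" ~ x_other).rep).map { case (i, is) => i * is.product }
def x_other[$: P]: P[Int] = P(number | ("(" ~ expr ~ ")"))
def number[$: P]: P[Int] = P(CharIn("0-9").rep(1)).!.map(_.toInt)

// Each additional "(" roughly doubles the parsing time:
parse("(" * 23 + "1" + ")" * 23, program(_))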
The code works but parsing takes about 10 seconds. I found that parsing this expression with n parentheses takes about $2^{n-20}$ seconds. The reason for the exponential slowness is the backtracking. It tries to parse `expr` after `(`, and there are two possibilities (`minus` and `plus`). Each possibility will need to be fully explored before returning to parse the next `(`. To explore means again to parse `expr` recursively. So, there is a 2x increase of parsing work for each parenthesis. To visualize the slowness, consider this function:
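Presumably the classic doubly recursive Fibonacci is meant here, something like:

// Naive recursion: each call spawns two more calls.
def fibonacci(n: Int): BigInt =
  if (n <= 1) BigInt(n) else fibonacci(n - 1) + fibonacci(n - 2)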
It takes $2^n$ function calls to compute fibonacci(n) by this program. I tried adding cuts to the grammar at various places, but the exponential slowness remains.
So, I have two questions:
1. Can we somehow mitigate this problem by adding cuts, or adding more grammar rules, or in some other way?
2. Memoization could help here: if a parser rule already failed at a certain location, it is not necessary to try again to parse with the same parser rule at the same location. If a parser rule succeeded at a certain location, and the parse is unique, it is not necessary to parse again at the same location. This may eliminate the exponential slowness if implemented correctly. Is there a way to implement that?
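The idea in question 2 is essentially packrat parsing: a lookup table keyed by (rule, position). A toy sketch of just that concept, independent of fastparse (the hard part with fastparse is that its ParsingRun is mutable, which is what the PRunData workaround above addresses):

import scala.collection.mutable

// Toy packrat memoization: remember, per (rule name, input position), whether the
// rule succeeded and, if so, what value it produced and where it stopped.
val memo = mutable.Map.empty[(String, Int), Option[(Int, Int)]]

def memoized(rule: String, pos: Int)(run: => Option[(Int, Int)]): Option[(Int, Int)] =
  memo.getOrElseUpdate((rule, pos), run) // never re-run the same rule at the same position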