Significant parser slowdown for Latex with complex parameters #2445

fberlakovich · 2022-06-15T07:18:08Z

If I open the source of the tkz-tab package, the parser seems to get stuck (might be an infinite loop) on parsing some strict-keyval-pair. Perhaps you can have a look? You can find the file in your local latex distribution or at https://github.com/tkz-sty/tkz-tab/blob/master/latex/tkz-tab.sty

Originally posted by @PHPirates in #2388 (comment)

The text was updated successfully, but these errors were encountered:

PHPirates · 2022-06-15T07:35:11Z

I should note that the user story right now is that because we have sty files included in the LatexIndexableSetContributor, all these files will be parsed and hence if the parser gets stuck on one of them, the indexing gets stuck.

Parsing all these files was never the intention (though perhaps a nice test for the parser...) because it's very slow and unnecessary (since our indexers are regex-based) and clutters the stub indices (#2433), so I will try to fix that before the next release anyway.

PHPirates · 2022-10-02T10:14:39Z

Looks like the problem has come back in wargame/tikzlibrarywargame.natoapp6c.code (#2687, #2689)

Given #2684 as well, we should probably take another careful look at the keyval things in the parser to avoid more headaches in the future.
[Edit] See #2690

fberlakovich · 2022-10-02T10:49:39Z

Given #2684 as well, we should probably take another careful look at the keyval things in the parser to avoid more headaches in the future.
[Edit] See #2690

I agree that the current solution is less than ideal. I only took a quick glance at the changes you made, so I am not 100% sure yet whether my explanation is correct. I think that by reusing certain rules in the parser you gave the key/val rule a path to the parameter_text again which in turn leads to the issue I described here: #2447 (comment). The major problem is deciding in the parser whether parameters can/should be key/val pairs.
Currently, I see two possible ways forward:
a) keep the processing in the parser, close the path from key/val to parameter_text again and maybe devise a unit test that asserts it stays closed. It still has the downside that we have to artificially constrain the key/val rule to make sure it only parses things that might actually be key/val pairs and it still only a heuristic.
b) use a Grammarkit external expression (?). That is, try to parse key/val pairs in code which would give us the flexibility to decide whether something can or should be a key/val pair based on things not directly accessible to the parser (e.g. only do ke/val parsing for certain commands where we actually want the optional parameters to be represented as key/val pairs).
What do you think?

PHPirates · 2022-10-02T11:08:10Z

Well, I don't think I've worked with external rules yet so I don't know how convenient that is.
If it is any similar to a token remapper (LatexParserUtil) it might introduce more headaches than it solves, when I worked on that it was very confusing. But if you can figure out how it works, it might be a lot more maintainable :)

But wasn't the problem you described in #2447 (comment) caused by the keyval key being too restrictive, which is why we had | optional_param_content in the optional_param? If we didn't give the parser that second option for optional_param and everything must be a keyval_pair, wouldn't that solve the catastrophic backtracking because there are no other options for the parser to branch to in recursion?

In any case, I prefer whatever is not too slow and is safe (I mean, small probability of it breaking everything like now)

fberlakovich · 2022-10-02T15:55:59Z

But wasn't the problem you described in #2447 (comment) caused by the keyval key being too restrictive, which is why we had | optional_param_content in the optional_param? If we didn't give the parser that second option for optional_param and everything must be a keyval_pair, wouldn't that solve the catastrophic backtracking because there are no other options for the parser to branch to in recursion?

The problem I described was a consequence of having the choice in the parser which I considered a cleaner alternative to parsing everything - including things that clearly are no key/val pairs - as key/val pairs. Unfortunately, my fix in #2447 only restricted keys to avoid the recursion, but judging from the stack trace the issue reoccurred in values as well. Why your changes (unintentionally) fixed the issue is puzzling though.
I agree that parsing everything as key/val would probably solve the issue. But I think it would create an unintuitive API and would probably require a lot of changes to the existing clients.
I have no experience with Grammarkit's external rules and they could be a complete dead end. I would be happy to look into it though, considering I already suggested it in #2388. If, however, you prefer parsing everything as key/val pairs I am happy to implement that as well. Either way, it might take a while though (probably not before 20.10).

PHPirates · 2022-10-03T06:53:41Z

I don't know what would be best, but I would appreciate if you could have a look into external rules.
Yeah no hurries, I have to do a quick fix for this particular issue anyway, it's just a matter of trying to prevent a stuck parser in the future which can be triggered by any update of any texlive package.

I think the unintuitive API could be alleviated somewhat by then parsing the key-value pairs at application level manually and then providing all the extension functions to replace the parser information - I think this would be the fallback solution if external rules don't work out, as it's probably the same code but then less nicely integrated with the parser.

fberlakovich · 2022-10-03T07:09:35Z

I will keep you updated about any progress.

PHPirates · 2022-10-04T18:51:51Z

As it turned out, the performance issue was something completely unrelated so that's good. And all of the discussion still stands, because while researching that I found some more characters that are not accepted by the parser where they should be. One example is #2692 but there are more like that.
So this is something that I tried to fix for one particular case with #2690 which maybe won't work (?) and for which external parser rules could be a solution, as we then are sure we're not missing anything in the parser (as it will be 'parsed' at application level).
I probably should go over the parser to try to figure out if there are more missing characters.

fberlakovich · 2022-10-05T16:42:49Z

As it turned out, the performance issue was something completely unrelated so that's good. And all of the discussion still stands, because while researching that I found some more characters that are not accepted by the parser where they should be. One example is #2692 but there are more like that.

Interesting, it looked very much like the original parser issue. I hope for the external rules not just to be more robust, but also for the possibility to "opt-in" to key/val parsing instead of a catch-all approach because the list of commands where we actually want key/vals isn't that long at the moment.

fberlakovich · 2023-11-18T10:19:31Z

@PHPirates I just realized I never wrote back on this discussion, so here goes my answer, only one year late. 🤦 I did actually look into external rules, but he problem was in the hand written parsing code I couldn't find a way to get the necessary context information. So for example, I was trying to "look back" and figure out the command that was parsed previously, so I could decide whether I want to parse the parameters as key/vals or just regular parameters. I will try to ressurect my notes and post a question in the intellij support/grammarkit repo, but I have little hope that external rules are the solution to the problem

fberlakovich · 2023-11-19T15:20:01Z

I found a support question that looks similar to what I suggested: https://intellij-support.jetbrains.com/hc/en-us/community/posts/206123179-How-to-produce-a-slightly-different-psiTree-from-astTree-. Instead of remembering the ID though, we would have to remember the command name and then predicate the parameter parsing on that name. If it is a name known to have keyval parameters, we would parse keyval params, otherwise the regular parameters. The question is quite old though, so I will have to test whether it actually works.

The topic of dynamic parsing pops up once in a while and there seem to be ways to do it, but the GrammarKit documentation leaves a lot to be desired.

fberlakovich · 2023-11-19T16:47:45Z

@PHPirates I saw a bit too late that the optional param key/val parsing now just parses everything as key/value pairs. So I guess there is no immediate need for the predicated parsing. I have an experiment working nonetheless that parses the optional parameters either as key/value pair or directly as optional_param_content, depending on the previously parsed command. Maybe such an infrastructure comes in useful at some point in the future.

PHPirates · 2023-11-21T08:10:14Z

Very interesting, thanks for the link.
See #3112 for why the change was made, I forgot the details but I remember at the time it made sense to me. I didn't see any need for the distinction made, I think, since it's just semantics which complicates the parsing. Not sure if the same holds for requirement parameters though.

That does sound very interesting, though not necessarily for key-value pairs parsing. I can't think of a direct application but I have the feeling I was looking for something like that in the past. Maybe you want to share your experiment?

fberlakovich · 2023-11-21T09:16:43Z

Very interesting, thanks for the link.
See #3112 for why the change was made, I forgot the details but I remember at the time it made sense to me. I didn't see any need for the distinction made, I think, since it's just semantics which complicates the parsing. Not sure if the same holds for requirement parameters though.

Thanks, I found the PR after writing the experiment. I think the small amount of changes neccessary in @slideclimb's PR justifies the approach, especially when it works well.

That does sound very interesting, though not necessarily for key-value pairs parsing. I can't think of a direct application but I have the feeling I was looking for something like that in the past. Maybe you want to share your experiment?

Sure, I pushed the experiment to a branch in my fork: master...fberlakovich:TeXiFy-IDEA:parser-experiment. Obviously the experiment is ugly and ignores several edge cases, but you get the idea. I also think it could be useful in the future, given the high amount of dynamism in Latex.

PHPirates · 2023-11-22T07:14:30Z

Thanks! I made note of it so I can find it back when needed.

fberlakovich mentioned this issue Jun 15, 2022

Glossaries support #2388

Merged

1 task

fberlakovich mentioned this issue Jun 15, 2022

Improve parser performance #2447

Merged

PHPirates added this to the b0.7.19 milestone Jun 15, 2022

PHPirates assigned fberlakovich Jun 15, 2022

PHPirates closed this as completed in #2447 Jun 15, 2022

PHPirates mentioned this issue Jun 18, 2022

Indexing not finishing #2460

Closed

PHPirates added the bug Deficiencies in TeXiFy behaviour. label Jun 18, 2022

PHPirates mentioned this issue Oct 2, 2022

About key-value pairs in optional parameters #2690

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Significant parser slowdown for Latex with complex parameters #2445

Significant parser slowdown for Latex with complex parameters #2445

fberlakovich commented Jun 15, 2022

PHPirates commented Jun 15, 2022

PHPirates commented Oct 2, 2022 •

edited

Loading

fberlakovich commented Oct 2, 2022

PHPirates commented Oct 2, 2022

fberlakovich commented Oct 2, 2022

PHPirates commented Oct 3, 2022

fberlakovich commented Oct 3, 2022

PHPirates commented Oct 4, 2022

fberlakovich commented Oct 5, 2022

fberlakovich commented Nov 18, 2023

fberlakovich commented Nov 19, 2023 •

edited

Loading

fberlakovich commented Nov 19, 2023

PHPirates commented Nov 21, 2023

fberlakovich commented Nov 21, 2023

PHPirates commented Nov 22, 2023

Significant parser slowdown for Latex with complex parameters #2445

Significant parser slowdown for Latex with complex parameters #2445

Comments

fberlakovich commented Jun 15, 2022

PHPirates commented Jun 15, 2022

PHPirates commented Oct 2, 2022 • edited Loading

fberlakovich commented Oct 2, 2022

PHPirates commented Oct 2, 2022

fberlakovich commented Oct 2, 2022

PHPirates commented Oct 3, 2022

fberlakovich commented Oct 3, 2022

PHPirates commented Oct 4, 2022

fberlakovich commented Oct 5, 2022

fberlakovich commented Nov 18, 2023

fberlakovich commented Nov 19, 2023 • edited Loading

fberlakovich commented Nov 19, 2023

PHPirates commented Nov 21, 2023

fberlakovich commented Nov 21, 2023

PHPirates commented Nov 22, 2023

PHPirates commented Oct 2, 2022 •

edited

Loading

fberlakovich commented Nov 19, 2023 •

edited

Loading