Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Significant parser slowdown for Latex with complex parameters #2445

Closed
fberlakovich opened this issue Jun 15, 2022 · 15 comments · Fixed by #2447
Closed

Significant parser slowdown for Latex with complex parameters #2445

fberlakovich opened this issue Jun 15, 2022 · 15 comments · Fixed by #2447
Assignees
Labels
bug Deficiencies in TeXiFy behaviour.
Milestone

Comments

@fberlakovich
Copy link
Contributor

If I open the source of the tkz-tab package, the parser seems to get stuck (might be an infinite loop) on parsing some strict-keyval-pair. Perhaps you can have a look? You can find the file in your local latex distribution or at https://github.com/tkz-sty/tkz-tab/blob/master/latex/tkz-tab.sty

Screenshot_2022-06-14_17-00-14

Originally posted by @PHPirates in #2388 (comment)

@fberlakovich fberlakovich mentioned this issue Jun 15, 2022
1 task
@PHPirates
Copy link
Collaborator

I should note that the user story right now is that because we have sty files included in the LatexIndexableSetContributor, all these files will be parsed and hence if the parser gets stuck on one of them, the indexing gets stuck.

Parsing all these files was never the intention (though perhaps a nice test for the parser...) because it's very slow and unnecessary (since our indexers are regex-based) and clutters the stub indices (#2433), so I will try to fix that before the next release anyway.

@PHPirates PHPirates added this to the b0.7.19 milestone Jun 15, 2022
@PHPirates PHPirates added the bug Deficiencies in TeXiFy behaviour. label Jun 18, 2022
@PHPirates
Copy link
Collaborator

PHPirates commented Oct 2, 2022

Looks like the problem has come back in wargame/tikzlibrarywargame.natoapp6c.code (#2687, #2689)

image

Given #2684 as well, we should probably take another careful look at the keyval things in the parser to avoid more headaches in the future.
[Edit] See #2690

@fberlakovich
Copy link
Contributor Author

Given #2684 as well, we should probably take another careful look at the keyval things in the parser to avoid more headaches in the future.
[Edit] See #2690

I agree that the current solution is less than ideal. I only took a quick glance at the changes you made, so I am not 100% sure yet whether my explanation is correct. I think that by reusing certain rules in the parser you gave the key/val rule a path to the parameter_text again which in turn leads to the issue I described here: #2447 (comment). The major problem is deciding in the parser whether parameters can/should be key/val pairs.
Currently, I see two possible ways forward:
a) keep the processing in the parser, close the path from key/val to parameter_text again and maybe devise a unit test that asserts it stays closed. It still has the downside that we have to artificially constrain the key/val rule to make sure it only parses things that might actually be key/val pairs and it still only a heuristic.
b) use a Grammarkit external expression (?). That is, try to parse key/val pairs in code which would give us the flexibility to decide whether something can or should be a key/val pair based on things not directly accessible to the parser (e.g. only do ke/val parsing for certain commands where we actually want the optional parameters to be represented as key/val pairs).
What do you think?

@PHPirates
Copy link
Collaborator

Well, I don't think I've worked with external rules yet so I don't know how convenient that is.
If it is any similar to a token remapper (LatexParserUtil) it might introduce more headaches than it solves, when I worked on that it was very confusing. But if you can figure out how it works, it might be a lot more maintainable :)

But wasn't the problem you described in #2447 (comment) caused by the keyval key being too restrictive, which is why we had | optional_param_content in the optional_param? If we didn't give the parser that second option for optional_param and everything must be a keyval_pair, wouldn't that solve the catastrophic backtracking because there are no other options for the parser to branch to in recursion?

In any case, I prefer whatever is not too slow and is safe (I mean, small probability of it breaking everything like now)

@fberlakovich
Copy link
Contributor Author

But wasn't the problem you described in #2447 (comment) caused by the keyval key being too restrictive, which is why we had | optional_param_content in the optional_param? If we didn't give the parser that second option for optional_param and everything must be a keyval_pair, wouldn't that solve the catastrophic backtracking because there are no other options for the parser to branch to in recursion?

The problem I described was a consequence of having the choice in the parser which I considered a cleaner alternative to parsing everything - including things that clearly are no key/val pairs - as key/val pairs. Unfortunately, my fix in #2447 only restricted keys to avoid the recursion, but judging from the stack trace the issue reoccurred in values as well. Why your changes (unintentionally) fixed the issue is puzzling though.
I agree that parsing everything as key/val would probably solve the issue. But I think it would create an unintuitive API and would probably require a lot of changes to the existing clients.
I have no experience with Grammarkit's external rules and they could be a complete dead end. I would be happy to look into it though, considering I already suggested it in #2388. If, however, you prefer parsing everything as key/val pairs I am happy to implement that as well. Either way, it might take a while though (probably not before 20.10).

@PHPirates
Copy link
Collaborator

I don't know what would be best, but I would appreciate if you could have a look into external rules.
Yeah no hurries, I have to do a quick fix for this particular issue anyway, it's just a matter of trying to prevent a stuck parser in the future which can be triggered by any update of any texlive package.

I think the unintuitive API could be alleviated somewhat by then parsing the key-value pairs at application level manually and then providing all the extension functions to replace the parser information - I think this would be the fallback solution if external rules don't work out, as it's probably the same code but then less nicely integrated with the parser.

@fberlakovich
Copy link
Contributor Author

I will keep you updated about any progress.

@PHPirates
Copy link
Collaborator

As it turned out, the performance issue was something completely unrelated so that's good. And all of the discussion still stands, because while researching that I found some more characters that are not accepted by the parser where they should be. One example is #2692 but there are more like that.
So this is something that I tried to fix for one particular case with #2690 which maybe won't work (?) and for which external parser rules could be a solution, as we then are sure we're not missing anything in the parser (as it will be 'parsed' at application level).
I probably should go over the parser to try to figure out if there are more missing characters.

@fberlakovich
Copy link
Contributor Author

As it turned out, the performance issue was something completely unrelated so that's good. And all of the discussion still stands, because while researching that I found some more characters that are not accepted by the parser where they should be. One example is #2692 but there are more like that.

Interesting, it looked very much like the original parser issue. I hope for the external rules not just to be more robust, but also for the possibility to "opt-in" to key/val parsing instead of a catch-all approach because the list of commands where we actually want key/vals isn't that long at the moment.

@fberlakovich
Copy link
Contributor Author

@PHPirates I just realized I never wrote back on this discussion, so here goes my answer, only one year late. 🤦 I did actually look into external rules, but he problem was in the hand written parsing code I couldn't find a way to get the necessary context information. So for example, I was trying to "look back" and figure out the command that was parsed previously, so I could decide whether I want to parse the parameters as key/vals or just regular parameters. I will try to ressurect my notes and post a question in the intellij support/grammarkit repo, but I have little hope that external rules are the solution to the problem

@fberlakovich
Copy link
Contributor Author

fberlakovich commented Nov 19, 2023

I found a support question that looks similar to what I suggested: https://intellij-support.jetbrains.com/hc/en-us/community/posts/206123179-How-to-produce-a-slightly-different-psiTree-from-astTree-. Instead of remembering the ID though, we would have to remember the command name and then predicate the parameter parsing on that name. If it is a name known to have keyval parameters, we would parse keyval params, otherwise the regular parameters. The question is quite old though, so I will have to test whether it actually works.

The topic of dynamic parsing pops up once in a while and there seem to be ways to do it, but the GrammarKit documentation leaves a lot to be desired.

@fberlakovich
Copy link
Contributor Author

@PHPirates I saw a bit too late that the optional param key/val parsing now just parses everything as key/value pairs. So I guess there is no immediate need for the predicated parsing. I have an experiment working nonetheless that parses the optional parameters either as key/value pair or directly as optional_param_content, depending on the previously parsed command. Maybe such an infrastructure comes in useful at some point in the future.

@PHPirates
Copy link
Collaborator

Very interesting, thanks for the link.
See #3112 for why the change was made, I forgot the details but I remember at the time it made sense to me. I didn't see any need for the distinction made, I think, since it's just semantics which complicates the parsing. Not sure if the same holds for requirement parameters though.

That does sound very interesting, though not necessarily for key-value pairs parsing. I can't think of a direct application but I have the feeling I was looking for something like that in the past. Maybe you want to share your experiment?

@fberlakovich
Copy link
Contributor Author

Very interesting, thanks for the link.
See #3112 for why the change was made, I forgot the details but I remember at the time it made sense to me. I didn't see any need for the distinction made, I think, since it's just semantics which complicates the parsing. Not sure if the same holds for requirement parameters though.

Thanks, I found the PR after writing the experiment. I think the small amount of changes neccessary in @slideclimb's PR justifies the approach, especially when it works well.

That does sound very interesting, though not necessarily for key-value pairs parsing. I can't think of a direct application but I have the feeling I was looking for something like that in the past. Maybe you want to share your experiment?

Sure, I pushed the experiment to a branch in my fork: master...fberlakovich:TeXiFy-IDEA:parser-experiment. Obviously the experiment is ugly and ignores several edge cases, but you get the idea. I also think it could be useful in the future, given the high amount of dynamism in Latex.

@PHPirates
Copy link
Collaborator

Thanks! I made note of it so I can find it back when needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Deficiencies in TeXiFy behaviour.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants