[question] Plans to resync the lexers with Scintillua? #908

Closed
moesasji opened this issue Dec 10, 2020 · 13 comments

@moesasji
Contributor

Looking through the outstanding issues a lot appear related to outdated lexers and the most logical approach seems to be to resync the lexers with upstream if at all possible.

However I see that the format used by vis is now the legacy format for Scintillua with a migration guide here: https://orbitalquark.github.io/scintillua/api.html#migrating-legacy-lexers.

Is it realistic to move to the new format of the lexers as used by Scintillua seeing that those have seen considerable updates ( https://github.com/orbitalquark/scintillua/tree/default/lexers ) or is there a different plan in place?

@martanne
Owner

Thanks for bringing this up.

I initially chose Scintillua because it is flexible, relatively lightweight, easy to integrate and had a lot of existing syntax definitions. Basically I was looking for a simple highlighting scheme where I can outsource the actual maintenance work. Towards that goal I also submitted changes from our community back upstream (see e.g. here, here and here).

Then upstream decided to embrace an object-oriented API. I am not sure whether I like the new approach. At the time I remarked that the existing _rules table emphasizes the importance of rule ordering for PEGs by grouping them into a single table. Ever since, the two implementations have diverged.

Mostly because I didn't spend time to properly look at the upstream changes and the existing code worked "good enough". As an aside, for me personally syntax highlighting isn't of utmost importance; it is useful to indicate runaway strings etc., but I don't need much more than that. However, as indicated by the number of filed issues, the user base probably thinks differently. Also a lot (the majority?) of external contributions are related to it.

The API changes should mostly be mechanical. However, conceptually our integration is probably a bit more performance sensitive than in the upstream case. We currently do not maintain a token cache, meaning we (re)start lexing from scratch after every redraw. As a result we added an upper bound on lexing time (see 15d213e) and removed some especially slow rules (e.g. #797, #726).

To get a scope for the work, one would have to go through both repositories, compare them, and list:

  1. lexers only present in our tree
  2. lexers only present upstream
  3. lexers with diverging rule changes

Points 1) and 2) should be fairly simple based on the source file names and file type association. 3) is more complex: the rather mechanical API changes that preserve the existing rules aren't particularly important. Of interest are the logical modifications, i.e. everything which would need to be reapplied on top of upstream. Some issues might already be fixed upstream, others should be fixed differently etc. This would need some coordination. One approach is to go through our commit history up to the last synchronization point:

git log --stat --oneline dc5f5a45a2315011ebeeb0a56a7434ead292dc96...HEAD lua/lexers/
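For points 1) and 2), the file-name comparison could be sketched roughly as below. This is only an illustration: `only_in` is a hypothetical helper, not part of vis or Scintillua, and the directory paths in the example calls are assumptions about where the two trees are checked out. It compares file names only, so a renamed lexer would show up in both lists.

```shell
# Hypothetical helper (not part of vis or Scintillua):
# print the *.lua file names found in directory $1 but not in directory $2.
only_in() {
    left=$(mktemp) && right=$(mktemp) || return 1
    # comm needs both inputs sorted in the same collation order.
    (cd "$1" && ls -- *.lua 2>/dev/null) | LC_ALL=C sort > "$left"
    (cd "$2" && ls -- *.lua 2>/dev/null) | LC_ALL=C sort > "$right"
    # -23 suppresses lines unique to the second file and lines common to both.
    LC_ALL=C comm -23 "$left" "$right"
    rm -f "$left" "$right"
}

# Example calls; both paths are assumptions:
#   only_in vis/lua/lexers scintillua/lexers   # point 1: only in our tree
#   only_in scintillua/lexers vis/lua/lexers   # point 2: only upstream
```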

To summarize, the goal is still the same: have a simple, flexible highlighting scheme with the least amount of (long term) work for myself. If this can be accomplished by sharing resources with upstream even better.

@moesasji
Contributor Author

moesasji commented Dec 10, 2020

> To summarize, the goal is still the same: have a simple, flexible highlighting scheme with the least amount of (long term) work for myself. If this can be accomplished by sharing resources with upstream even better.

Thanks for the extensive response and thoughts on this. If I understand it correctly tracking the new upstream format used by Scintillua isn't an option for performance reasons, which unfortunately automatically implies additional maintenance effort compared to just tracking upstream code. Or am I misunderstanding your response?

btw) In particular the third point sounds very painful if it isn't a priority for your own usage, as it requires some insight into the reason/motivation behind changes made in either code-base. The lexers already look substantially different, even if the changes made might be only mechanical in nature. (I had a quick look at the ansi_c one to get an impression of the differences.)

answer to point 1 - lexers only present in vis tree

clojure.lua
dsv.lua
elm.lua
fantom.lua
fstab.lua
gemini.lua
git-rebase.lua
julia.lua
meson.lua
networkd.lua
pony.lua
reason.lua
routeros.lua
spin.lua
strace.lua
systemd.lua
xs.lua
zig.lua

answer to point 2 - lexers only present in upstream tree

jq.lua
mediawiki.lua
txt2tags.lua

@martanne
Owner

> If I understand it correctly tracking the new upstream format used by Scintillua isn't an option for performance reasons

No, that is not what I intended to express. I meant we should rebase our (performance sensitive) changes on top of upstream.

> In particular the third point sounds very painful

Yes, it needs some effort and, more importantly, discussion with upstream. For each lexer one would have to: import the current Scintillua version, go through our git history for the file in question and apply the same changes, then create a patch/pull request for upstream.
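The "go through our git history for the file in question" step might look something like the following sketch. `lexer_history` is a hypothetical helper introduced here for illustration; the default commit hash is the synchronization point from the `git log` command earlier in this thread, and the `lua/lexers/` path is taken from that same command. It uses a two-dot range, which lists the commits reachable from HEAD but not from the sync point.

```shell
# Synchronization point taken from the git log command earlier in this thread.
SYNC_POINT=dc5f5a45a2315011ebeeb0a56a7434ead292dc96

# Hypothetical helper: list commits since the sync point that touched one lexer.
#   $1: lexer name without the .lua extension
#   $2: optional sync commit (defaults to SYNC_POINT)
lexer_history() {
    git log --oneline "${2:-$SYNC_POINT}..HEAD" -- "lua/lexers/$1.lua"
}

# Example call, run from the root of a vis checkout:
#   lexer_history ansi_c
```

Each commit listed this way would then be reviewed and, where still relevant, reapplied on top of the imported upstream file.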

I am not sure how much the actual lexer rules have diverged. ansi_c is probably a bad example in this regard, because it is one of the most used and changed ones.

@moesasji
Contributor Author

Thanks for the clarification; this now makes sense to me. The first step appears to be to make vis understand the new lexer format, as just swapping out a lexer loses the highlighting.

@moesasji
Contributor Author

> First step in this appears to be to make vis understand the new lexer format as just swapping out a lexer loses the highlighting.

Looking through the code, I suspect the following calls to lexer._TOKENSTYLE need adapting before the boring work of gradually switching away from the legacy lexers can start:

https://github.com/martanne/vis/blob/master/lua/vis.lua#L266
https://github.com/martanne/vis/blob/master/lua/vis-std.lua#L66

Unfortunately Lua is new to me....

@martanne
Owner

I rebased our changes on top of the most recent upstream lexer.lua and pushed the result to the scintillua branch. It isn't really tested, but at least in theory should understand both lexer formats.

Based on your list above, it seems like our community is at least as active as upstream? Albeit our lexers might be of more dubious quality. I tend to generously merge changes in this area because they are self-contained and somewhat hard to test without example files and familiarity with the format. Also we have some fairly obscure stuff which might not be of general interest. I still haven't looked at the modifications of individual lexers and what kind of improvements upstream developed. But I guess if we wanted, we could also do it the other way around and backport those ...

Maybe some past contributors would like to comment on their preferred format? We also have a mailing list for those of you who prefer that.

@moesasji
Contributor Author

> I rebased our changes on top of the most recent upstream lexer.lua and pushed the result to the scintillua branch. It isn't really tested, but at least in theory should understand both lexer formats.

A quick try in swapping out a legacy lexer for an upstream one seems to work as expected, so for me your change does the job.

> Based on your list above, it seems like our community is at least as active as upstream?

Looks like it, but that might just be a reflection of the type of user that is attracted by vis, i.e. users more willing or able to commit a lexer they need. It could very well be that upstream has more people actually using the lexers, as SciTE used to be pretty popular with Windows users and their mailing list appears pretty active (far more than the vis one). Also, on that mailing list there are a lot of questions and activity around lexers. Note that both Geany and Anjuta actually use Scintilla as well (https://texteditors.org/cgi-bin/wiki.pl?ScintillaEditorFamily)

> But I guess if we wanted, we could also do it the other way around and backport those ...
>
> Maybe some past contributors would like to comment on their preferred format? We also have a mailing list for those of you who prefer that.

I think this is really your call, but it goes a bit against the idea of trying to minimize work for you.

Whichever way you go: to help with some of the lexer quality issues, it might be worth starting to use the tests that are part of the upstream lexer tests, see: https://github.com/orbitalquark/scintillua/blob/default/tests.lua

btw) a quick look at the book on Lua did manage to put me off. It really isn't a language I want to touch in my spare time. Sorry!

@martanne
Owner

> Note that both Geany and Anjuta actually use Scintilla as well

I am not really familiar with either environment, but don't they typically use the C++ lexers?

> goes a bit against the idea of trying to minimize work for you.

That would indeed be undesirable. Generally fragmentation of already small communities should be avoided. Anyway, somebody would have to go through the respective changes and merge one into the other. I am still hoping somebody else will step up and do the actual work.

> btw) a quick look at the book on Lua did manage to put me off. It really isn't a language I want to touch in my spare time. Sorry!

That is a pity. I think it is well suited for what we are using it for and an easy way to contribute something.

@moesasji
Contributor Author

> I am not really familiar with either environment, but don't they typically use the C++ lexers?

Both are editors that are popular in the GNOME/Xfce community, but a quick check shows that both indeed use the C++ lexers.

> That would indeed be undesirable. Generally fragmentation of already small communities should be avoided. Anyway, somebody would have to go through the respective changes and merge one into the other. I am still hoping somebody else will step up and do the actual work.

With the changes you've made I'll at least try to make a start, as keeping things more in sync makes sense to avoid fragmenting more than needed.

> > btw) a quick look at the book on Lua did manage to put me off. It really isn't a language I want to touch in my spare time. Sorry!
>
> That is a pity. I think it is well suited for what we are using it for and an easy way to contribute something.

It is indeed well suited for what you need; it is just that some of the choices they've made in the grammar and use of symbols would drive me nuts.

@martanne
Owner

The current effort can be tracked in the scintillua branch. The remaining TODO items are:

  • theme review: some lexers use pre-defined styles which our (default) themes should provide, zenburn in particular is missing some of these. That is for example the reason why some markdown elements are not properly highlighted, even though the lexer matches them.
  • performance patches in html, xml and wsf lexers: those have been rebased, but are not entirely correct. They are also not all identical, the original patch 7e9e0a2 only changed the in_tag definition while the subsequent baa51e9 also uncommented its use, but only in the html lexer.
  • bash here document patches, those are worked on upstream.
  • dsv, in legacy format, not picked up by upstream, is of limited use, can probably be removed? @eworm-de you contributed that initially, do you use it?
  • gemini, in legacy format, not picked up by upstream, seems mostly copied from markdown lexer, could probably be improved, the style section is useless for vis. I would suggest to remove it for now, @lanodan can contribute it upstream and it will eventually trickle down to us.
  • strace and git-rebase, in new format, not picked up by upstream, they are a bit special in that they are not typical file formats, but program output. Check whether upstream is interested in them, otherwise maintain them ourselves.

@moesasji thanks again for the initial work, maybe you could check whether I missed anything?

@eworm-de
Collaborator

> dsv, in legacy format, not picked up by upstream, is of limited use, can probably be removed? @eworm-de you contributed that initially, do you use it?

We use this for user and group files (passwd, shadow, group and gshadow). Wondering if adding the file extension .csv makes sense.

So I use it whenever editing one of the above files.
Guess I could live without...

@lanodan
Contributor

lanodan commented Aug 25, 2022 via email

@mcepl
Contributor

mcepl commented Nov 29, 2022

@ninewise, #1018 has been merged, so this is probably obsolete.
