URL regex is not considering punctuation #76
Comments
It turned out that the problem might not come from the regex itself but from the fact that the regex is applied to the non-encoded URL. The encoded form is parsed correctly by Galène: https://example.com/Lettre%C3%80%C3%89lise
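A minimal sketch of that observation, assuming the workaround is to percent-encode a candidate before it reaches an ASCII-only URL regex. `encodeURI` is standard JavaScript; the helper name `encodeCandidate` is hypothetical and not taken from Galène's code.

```javascript
// Hypothetical helper, not part of Galène: percent-encode non-ASCII
// characters so that an ASCII-only URL regex can match the whole URL.
function encodeCandidate(text) {
    try {
        // encodeURI leaves reserved characters (: / ? # & =) untouched.
        return encodeURI(text);
    } catch (e) {
        // encodeURI throws URIError on lone surrogates; keep the input as-is.
        return text;
    }
}

console.log(encodeCandidate('https://example.com/LettreÀÉlise'));
// prints "https://example.com/Lettre%C3%80%C3%89lise"
```

One caveat with this approach: `encodeURI` also encodes `%`, so a URL that is already percent-encoded would be encoded twice.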
There's the encoding issue, which is due to the fact that I don't know how to write Unicode regexps in JavaScript. There's also the issue of punctuation, but this one needs to preserve punctuation at the end of URLs:
But
I need help with this.
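As a starting point for the two issues above, here is a rough sketch, assuming a modern JavaScript engine. The `u` flag enables Unicode property escapes such as `\p{L}` (any letter), and a second pass strips sentence punctuation from the end of each match. The names `urlRe`, `trailingRe` and `findUrls` are hypothetical, and this is not Galène's actual regex.

```javascript
// Characters allowed in a URL candidate: Unicode letters, digits and
// combining marks, plus the usual URL punctuation. Requires the `u` flag.
const urlRe = /https?:\/\/[\p{L}\p{N}\p{M}\-._~:/?#\[\]@!$&'()*+,;=%]+/gu;

// Punctuation that more plausibly ends the sentence than the URL.
const trailingRe = /[.,;:!?]+$/u;

// Hypothetical helper: return every URL found in `text`,
// with trailing sentence punctuation removed.
function findUrls(text) {
    const urls = [];
    for (const m of text.matchAll(urlRe)) {
        urls.push(m[0].replace(trailingRe, ''));
    }
    return urls;
}

console.log(findUrls('See https://example.com/LettreÀÉlise.'));
console.log(findUrls('<https://example.com/a?b=1>, ok'));
```

This sketch always strips trailing punctuation, so a URL that legitimately ends with one of those characters is truncated. Telling the two cases apart is the open question here.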
Found this Stack Overflow post with links to some interesting libraries: https://stackoverflow.com/questions/37684/how-to-replace-plain-urls-with-links/21925491#21925491 We could use a library such as anchorme.js, which seems to be rather accurate, but it adds a lot of code. Maybe we would rather want something smaller with lower accuracy? For example, do we need to check URLs against the IANA list? Do we need the list of all existing TLDs (https://github.com/alexcorvi/anchorme.js/blob/gh-pages/src/tlds.ts)? For Unicode support, this library seems to do this: https://github.com/alexcorvi/anchorme.js/blob/gh-pages/src/dictionary.ts#L29 If we don't need all this extra verification, I might try to make a stripped-down, simpler fork of anchorme.js for Galène, as the code seems rather clean.
I just noticed that my terminal emulator (Alacritty) matches URLs quite well. Looking at the code, it uses https://github.com/chrisduerr/rfind_url/ which consists of a single Rust file for matching URLs. It does not look that complex, but it's definitely more than just a simple regex.
This regex does not seem to always work. For example, this link is handled correctly by GitHub's Markdown parser, but not by Galène:
We need a fairly complex regex, as we don't want to include trailing dots, <> characters... If I find a better URL regex, I will post it here.