Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Properly escape inline link and image attributes. Fix #473. #474

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

pavelhoral
Copy link
Collaborator

@pavelhoral pavelhoral commented Jul 16, 2024

I have added proper escaping to images that is in line with the previous PR #460.

CommonMark specification defines rules for images +/- in the following way - "do the same as for links". So I have extracted the shared logic for escaping titles and links to dedicated functions.

Image alt attribute is used as link text. This text should have all markdown syntax escaped. For that purpose I have added new function escapeMarkdown, which is functionally the same as turndown.escape. This was necessary to prevent circular dependency and unnecessary refactoring.

A bit about new escapeMarkdown function

The new implementation is pretty different in a sense that it uses single regex with a mixture of non-capturing / capturing groups and a single look-ahead assertion. This means that some ancient runtime environment might have issues with it... Not sure if that is of any concertn, but at least compatibility tables on MDN and caniuse.com for the used features have the same versions as the RegExp itself:

We might want to create a single shared implementation of that function. But I am not sure how to approach this . The function is probably pretty good candidate for its own file (escape-markdown.js) or maybe it can be placed in utils.js.

Also we might want to do some performance testing to pick which function will perform better. The new regular expression is more complex, but the OG function executes 13 separate regular expressions on any given content. So my gut tells me that on any average sized content the new regex will perform better... would be nice to actually verify that.

@pavelhoral
Copy link
Collaborator Author

pavelhoral commented Jul 16, 2024

So my gut tells me that on any average sized content the new regex will perform better... would be nice to actually verify that.

My gut feeling was totally incorrect. Regex with replacement function is noticeably (~1.3x) slower... makes sense as the JS engine has to call JS to perform the actual replacement.

I guess I should refactor the PR to actually use the turndown function - i.e. move it to a place where it can be shared.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant