Obsolete attributes regex doesn't account for valueless attribute use #3
Yes! This must be caught and dealt with appropriately. Will work on an update, thanks!
Maybe something like this?

```javascript
const attributeRegex = new RegExp(`<[^>]*\\b${attribute}\\b(?=\\s*(=|\\s*[/]*>))`, 'i');
```
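A quick way to see how this first pattern behaves is to wrap it in a small helper and run it over a few sample tags. The helper name `firstAttributeRegex` and the sample list are illustrative assumptions, not part of the project; only the pattern string is copied from the suggestion above. In this sketch it flags `<th nowrap>` and `<th nowrap="nowrap">` and skips the quoted `title`/`class` cases, but it also flags `<th class=nowrap>`, where `nowrap` is an unquoted value rather than an attribute.

```javascript
// Hypothetical helper wrapping the first regex suggested in this thread.
function firstAttributeRegex(attribute) {
  // \b on both sides keeps the name from matching inside longer words;
  // the lookahead requires "=" or the end of the tag right after the name.
  return new RegExp(`<[^>]*\\b${attribute}\\b(?=\\s*(=|\\s*[/]*>))`, "i");
}

const firstNowrap = firstAttributeRegex("nowrap");

// Illustrative sample tags (not the project's test suite).
const samples = [
  "<th nowrap>",
  '<th nowrap="nowrap">',
  '<th title="The nowrap attribute is obsolete">',
  '<th class="something nowrap">',
  "<th class=nowrap>", // unquoted value, not the attribute
];

// The regex has no "g" flag, so repeated .test() calls are stateless.
const results = samples.map((html) => ({ html, matches: firstNowrap.test(html) }));

for (const { html, matches } of results) {
  console.log(`${matches ? "match   " : "no match"}  ${html}`);
}
// Expected: match, match, no match, no match, match (the last one is a false positive).
```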
Thanks @FabianBeiner for the proactive feedback! I was just about to jump on this after preparing a (basic) test, but that regex seems to work! Prepared PR #4 for this. Feel free to review and leave feedback: since the whole project is an AI test case, I’m also working with and testing a number of supporting tools.
Wanted to make more time available for review, but accidentally merged the PR. Still open to feedback if you have more thoughts! Thanks for the report, @mattbrundage, and the update, @FabianBeiner!
@FabianBeiner Your solution is an incremental improvement, but it still matches sequences such as
Oh, the infamous unquoted attribute value syntax; of course I forgot about that. 🙈 Which brings us back to your original words:
However, here is an update:

```javascript
const attributeRegex = new RegExp(`<[^>]*\\s${attribute}\\b(\\s*=\\s*(?:"[^"]*"|'[^']*'|[^"'\\s>]+))?\\s*(?=/?>)`, 'i');
```

Attributes will most likely be preceded by whitespace, and I tried to account for people using ' instead of ", or no quotes at all. This should now work on

```html
<th nowrap>
<th nowrap=nowrap>
<th nowrap="nowrap">
<th nowrap='nowrap'>
<th class="nowrap" nowrap>
<th class="nowrap" nowrap=nowrap>
<th class="nowrap" nowrap="nowrap">
<th class="nowrap" nowrap='nowrap'>
```

but not on

```html
<th class="something nowrap">
<th class=nowrap>
<th title=nowrap>
<th title="The nowrap attribute is obsolete">
```

🤞🏻
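To check the updated pattern against the cases listed above, it can be wrapped in a helper and run over both groups. The helper name `updatedAttributeRegex` and the extra tag `<th nowrap class="x">` are illustrative assumptions, not from the project. In this sketch the listed cases behave as described, but the extra tag is not flagged: the lookahead `(?=/?>)` requires the tag to close right after the attribute and its optional value, so an obsolete attribute followed by another attribute slips through.

```javascript
// Hypothetical helper wrapping the updated regex from this thread.
function updatedAttributeRegex(attribute) {
  // \s before the name: the attribute must follow whitespace (not "=" or a quote).
  // Optional group: ="...", ='...', or an unquoted value.
  // Lookahead: only optional whitespace and then "/>" or ">" may follow.
  return new RegExp(
    `<[^>]*\\s${attribute}\\b(\\s*=\\s*(?:"[^"]*"|'[^']*'|[^"'\\s>]+))?\\s*(?=/?>)`,
    "i"
  );
}

const updatedNowrap = updatedAttributeRegex("nowrap");

// Cases listed in the thread that should be flagged...
const shouldMatch = [
  "<th nowrap>",
  "<th nowrap=nowrap>",
  '<th nowrap="nowrap">',
  "<th nowrap='nowrap'>",
  '<th class="nowrap" nowrap>',
  '<th class="nowrap" nowrap=nowrap>',
  '<th class="nowrap" nowrap="nowrap">',
  "<th class=\"nowrap\" nowrap='nowrap'>",
];
// ...and cases that should not be.
const shouldNotMatch = [
  '<th class="something nowrap">',
  "<th class=nowrap>",
  "<th title=nowrap>",
  '<th title="The nowrap attribute is obsolete">',
];
// Extra case (assumption, not from the thread): another attribute follows nowrap.
const attributeNotLast = '<th nowrap class="x">';

const allFlagged = shouldMatch.every((html) => updatedNowrap.test(html));
const noneFlagged = !shouldNotMatch.some((html) => updatedNowrap.test(html));
const notLastFlagged = updatedNowrap.test(attributeNotLast);

console.log({ allFlagged, noneFlagged, notLastFlagged });
// Expected: allFlagged and noneFlagged are true; notLastFlagged is false.
```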
@FabianBeiner LGTM
(Reopening, will review! Thanks for the updates…!)
Prepared #9 for this, including a better test case (the tests may still be weak, but this one should finally catch what you described, @mattbrundage). The new regex seems to work well here, @FabianBeiner! (Will let this sit for a moment and not merge right away.)
Just pushed 1.6.2, which contains improvements. I can imagine this needs more refining, but for now I’m relying on issue reports to surface remaining cases. I plan to review and extend the tests to cover more of them.
Valueless attribute use is common with boolean attributes. Among your list of obsolete attributes, "noshade" and "nowrap" need special handling to account for cases such as `<th nowrap>`, while also avoiding false positives such as `<th title="The nowrap attribute is obsolete">`.

Using regex on HTML is a minefield.
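One way to sidestep part of that minefield is to stop searching the raw markup for the attribute name and instead tokenize each start tag's attributes, then compare names. The sketch below swaps in that technique; `findObsoleteAttributes` and its default list `["noshade", "nowrap"]` are hypothetical, and it is still regex-based rather than a real HTML parser: it does not understand comments, `<script>`/`<style>` content, or malformed tags. Because each value is consumed together with its attribute name, text inside a quoted or unquoted value is never read as an attribute, and an obsolete attribute is found regardless of its position in the tag.

```javascript
// Hypothetical sketch: tokenize each start tag's attributes, then compare names.
// Not a full HTML parser (no handling of comments, <script>/<style>, or malformed tags).

// An attribute name, then an optional value: "...", '...', or unquoted.
const NAME = String.raw`[^\s"'<>\/=]+`;
const VALUE = String.raw`(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'=<>\x60]+))?`;

// A start tag: "<", a tag name, whitespace-separated attributes, optional "/", ">".
// Group 1 captures the whole attribute list.
const START_TAG = new RegExp(String.raw`<[a-zA-Z][^\s\/>]*((?:\s+${NAME}${VALUE})*)\s*\/?>`, "g");
// One attribute inside that list; group 1 captures its name.
const ATTRIBUTE = new RegExp(String.raw`\s+(${NAME})${VALUE}`, "g");

function findObsoleteAttributes(html, obsolete = ["noshade", "nowrap"]) {
  const wanted = new Set(obsolete.map((name) => name.toLowerCase()));
  const found = [];
  for (const tag of html.matchAll(START_TAG)) {
    // Values are consumed with their names, so text such as
    // title="... nowrap ..." is never treated as an attribute name.
    for (const attribute of tag[1].matchAll(ATTRIBUTE)) {
      const name = attribute[1].toLowerCase();
      if (wanted.has(name)) found.push(name);
    }
  }
  return found;
}

console.log(findObsoleteAttributes('<th nowrap class="x">')); // → [ 'nowrap' ]
console.log(findObsoleteAttributes('<th title="The nowrap attribute is obsolete">')); // → []
console.log(findObsoleteAttributes("<th class=nowrap><hr NOSHADE/>")); // → [ 'noshade' ]
```

Attribute names are lowercased before comparison because HTML attribute names are case-insensitive.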