improve Spelling rule by using word boundaries #900
This is a "heavy-handed" approach where word boundary delimiters are used for every filter regular expression, for consistency, even those filters that might be unambiguous and unlikely to be contained within other words. |
I would like to share my thoughts on this. Some rules come with https://vale.sh/docs/topics/styles/ I don't remember whether
So if you use It is also useful when you want to catch multiple words So adding \b somewhere can be useless, if you are using nonword:true AND there are some rules where you don't use \b on both sides. Otherwise, you can use the default nonword:false and avoid repeating the \b everywhere. Let's go back to this PR: I don't see a need to add \b on both sides. Also, I don't think it works as expected, especially because I don't see a Finally, the So I don't see what you are trying to fix, and things are getting very complicated... |
I did not know about the Try running the following with
With both
Neither adding If I change the
as would be expected. Granted, "mistache" is an unlikely typo but I originally stumbled upon this issue when reviewing some writing where someone had a simple transposition mistake, "subcommnad", and because of the RH spelling style rule filter ( As I said in a comment above, I took a heavy-handed approach in this PR by delimiting all the filters with word boundaries. I am open to a more selective approach and that might be something the RH team considers. This was just the easiest approach to be certain not to miss any potential word partials. |
Thanks for your reply and tests. Let me ask for help. @jdkato what would you recommend here? What do you think about what I wrote in my previous post? |
I think this PR solves the problem outlined in #894 sufficiently. However, this isn't how I personally would go about handling spelling exceptions. My general practices are:
|
Thank you @jdkato for the thoughts about this PR. @ccoVeille any opinions or directions from the Red Hat team about which direction to take this PR? Leave as is or go down the mentioned general practices road, which would require (I think) doing things differently in other places (dictionary, vocabulary)? |
Hi @emteelb If you had asked me the same question shortly after @jdkato replied, I would have said it depends on your expectations in terms of timing, and that the long-term direction should be the one @jdkato mentioned. But the PR has been open for 3 months, so I guess the issue you are trying to fix is not urgent. That means you have time to fix it the right way. The fact that you are working on a large codebase with many contributors could be an issue. So it's a matter of pros and cons that you will have to work out with this project's maintainers |
Regarding the problem at hand, I would be inclined to fix the rule for just those terms that are known or likely to give false positives, for example "\b[cC]he\b", or other small words that might be found in larger terms.
|
By using regex word boundary (\b) delimiters, a spelling filter applies only to whole words rather than to any word that happens to contain the filter's regex. For example, `\b[cC]he\b` matches only "che" and "Che", whereas the same filter without word boundary delimiters, `[cC]he`, also matches misspelled words that contain the regex, such as "aache" or "chemitsry". This commit also combines multiple related filters that share a common word base, for example, a single filter `[bB]reakpoint(s)?` rather than `[bB]reakpoint` and `[bB]reakpoints`.
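To make the difference concrete, here is a minimal sketch in Python. It assumes that a spelling filter suppresses a word whenever the filter's regex matches anywhere inside that word. The helper `is_filtered` is hypothetical and only imitates that behavior for illustration; it is not Vale's implementation, and Vale's regex engine may differ from Python's in other respects.

```python
import re

# The two filter variants discussed in the commit message.
UNBOUNDED = re.compile(r"[cC]he")
BOUNDED = re.compile(r"\b[cC]he\b")


def is_filtered(pattern, word):
    """Hypothetical helper: True if the filter regex matches anywhere in the word."""
    return pattern.search(word) is not None


# Compare how each variant treats real terms and misspellings.
for word in ["che", "Che", "aache", "chemitsry"]:
    print(f"{word:10} unbounded={is_filtered(UNBOUNDED, word)} "
          f"bounded={is_filtered(BOUNDED, word)}")
```

Under that assumption, the unbounded filter lets "aache" and "chemitsry" through unflagged, while the bounded filter ignores only the standalone words "che" and "Che".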
@aireilly Had another go at this, trying to keep in mind your guidance. I also tried to economize and combined some of the filters, for example:
rather than
In the existing RH Spelling style rule, Also, just something to be aware of and perhaps pointing to @jdkato 's advice for a different long-term solution: Without word boundary delimiters around filters, the spelling rule will not catch slippery finger mistakes such as "defragmentationn". |
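The trailing-character case can be sketched the same way. This is an illustration only: the helper `ignored_by` is hypothetical, the exact spelling of the `defragmentation` filter is an assumption, and Python's `re` module stands in for whatever regex engine the linter uses.

```python
import re


def ignored_by(filter_regex, word):
    """Hypothetical helper: True if the filter regex matches anywhere in the word."""
    return re.search(filter_regex, word) is not None


# One combined filter covering singular and plural, delimited by word boundaries.
combined = r"\b[bB]reakpoint(s)?\b"
print(ignored_by(combined, "breakpoint"))    # correct singular is ignored
print(ignored_by(combined, "Breakpoints"))   # correct plural is ignored
print(ignored_by(combined, "breakpointss"))  # doubled final letter is not ignored

# Assumed filter spelling, with and without word boundary delimiters.
print(ignored_by(r"defragmentation", "defragmentationn"))
print(ignored_by(r"\bdefragmentation\b", "defragmentationn"))
```

Without the delimiters, the typo "defragmentationn" still contains the filter text, so it would be ignored rather than reported; with them, only the exact word is ignored.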
Great work @emteelb ❇️ Can you PTAL at the test fixtures and update accordingly:
|
@aireilly Thanks for the feedback. Took a look at the shell script that runs to validate RH style rules against corresponding valid/invalid files w/i the For example:
in the pull request would become:
An alternative approach would be to create a separate pull request that would change the shell script to use the regex found in the filters or tokens in a given style rule file and keep a running counter against matches in corresponding fixture files for the style rule. Then compare the running counter number against the That's a vague idea. Hopefully the meaning is clear. Do you prefer one or the other approach? Something different? |
@emteelb if you're happy to review/modify the shell script please go ahead in this PR :) Do whatever you think is best, happy to review. Go for the path of least resistance :) |
By using regex word boundary (\b) delimiters, a spelling filter applies only to whole words rather than to any word that happens to contain the filter's regex. For example, "\b[cC]he\b" matches only "che" and "Che", whereas the same filter without word boundary delimiters, "[cC]he", also matches misspelled words that contain the regex, such as "aache" or "chemitsry".
Closes #894