Request: checking within snake_case by default #2730

jamesbraza · 2023-02-07T06:39:10Z

bad_spellling = "bad"  # Not detected in codespell==2.2.2

Can codespell's default regex(es) split on snake_case underscores and check each part for misspellings?

This comment talks about using --regex to detect misspellings within snake case, and it led to this draft PR.
Request for CamelCase support: CamelCase support? #196


DimitriPapadopoulos · 2023-07-28T12:08:10Z

The underscore (_) is part of \w. From https://docs.python.org/3/library/re.html#regular-expression-syntax:

\w

For Unicode (str) patterns:
Matches Unicode word characters; this includes alphanumeric characters (as defined by str.isalnum()) as well as the underscore (_). If the ASCII flag is used, only [a-zA-Z0-9_] is matched.

For 8-bit (bytes) patterns:
Matches characters considered alphanumeric in the ASCII character set; this is equivalent to [a-zA-Z0-9_]. If the LOCALE flag is used, matches characters considered alphanumeric in the current locale and the underscore.

Is there an easy way to get \w except _ in the non-ASCII case? It would help with checking snake_case.

codespell/codespell_lib/_codespell.py

Line 31 in ec0a5b9

word_regex_def = "[\\w\\-'’`]+"
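For illustration, one workaround that needs no set operations is the double-negative class [^\W_], which reads as "not a non-word character and not an underscore". The sketch below compares it with the default regex quoted above. The name snake_aware_regex is hypothetical and is not something codespell defines.

```python
import re

# Default word regex quoted above (codespell_lib/_codespell.py at ec0a5b9).
word_regex_def = "[\\w\\-'’`]+"

# Hypothetical alternative, not part of codespell: [^\W_] matches \w minus "_",
# including non-ASCII letters, while the second class keeps the extra
# characters (-, ', ’, `) that the default regex also accepts.
snake_aware_regex = "(?:[^\\W_]|[\\-'’`])+"

line = 'bad_spellling = "bad"'

print(re.findall(word_regex_def, line))     # ['bad_spellling', 'bad']
print(re.findall(snake_aware_regex, line))  # ['bad', 'spellling', 'bad']

# Non-ASCII letters still count as word characters; only "_" splits.
print(re.findall(snake_aware_regex, "café_naïve"))
```

With the default regex the whole identifier is one token, so a misspelled part is only found if the full snake_case name is in the dictionary.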

Unicode regexes with set operations might help, but they are not available in Python yet. From https://docs.python.org/3/library/re.html#regular-expression-syntax:

Support of nested sets and set operations as in Unicode Technical Standard #18 might be added in the future. This would change the syntax, so to facilitate this change a FutureWarning will be raised in ambiguous cases for the time being. That includes sets starting with a literal '[' or containing literal character sequences '--', '&&', '~~', and '||'. To avoid a warning escape them with a backslash.

This is what I have found so far, but I haven't been able to apply it to this use case yet:

DimitriPapadopoulos · 2023-07-28T13:59:34Z

A drawback of such a change is that we wouldn't be able to fix some (but not all) of the misspellings that contain an underscore, at least not by default:

clock_getttime->clock_gettime
phy_interace->phy_interface
unint8_t->uint8_t
__attribyte__->__attribute__
__cpluspus->__cplusplus
__cpusplus->__cplusplus

Unless, of course, you add new misspellings such as cpluspus.
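A minimal sketch of this drawback, assuming a toy dictionary built from two of the entries listed above. The names misspellings, find_fixes, whole_word and sub_word are made up for the example and do not reflect codespell's internals.

```python
import re

# Toy dictionary with two of the entries listed above (illustrative only).
misspellings = {
    "clock_getttime": "clock_gettime",
    "unint8_t": "uint8_t",
}

# Current default: "_" is part of a word.
whole_word = re.compile(r"[\w\-'’`]+")
# Hypothetical snake_case-aware variant: "_" separates words.
sub_word = re.compile(r"[^\W_]+")


def find_fixes(text, word_regex):
    """Return (word, correction) pairs for tokens found in the dictionary."""
    return [
        (word, misspellings[word.lower()])
        for word in word_regex.findall(text)
        if word.lower() in misspellings
    ]


source = "clock_getttime(CLOCK_REALTIME, &ts);"
print(find_fixes(source, whole_word))  # [('clock_getttime', 'clock_gettime')]
print(find_fixes(source, sub_word))    # []
```

Once the identifier is split into clock and getttime, the dictionary key clock_getttime can no longer match, so the entry only keeps working if a sub-word entry such as getttime is added.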

Gabrielcarvfer · 2023-07-30T03:27:53Z

I've been using the following for camel case, hyphen case and snake case.

(?<![a-z])[a-z'`]+|[A-Z][a-z'`]*|[a-z]+'[a-z]*|[a-z]+(?=[_-])|[a-z]+(?=[A-Z])|\d+

It indeed misses the cases where full words should be considered/checked, but sub-word typos seem to be the common case.
Adding a second pass that checks only full words would be nice for catching typos in documentation.
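As a rough illustration, the regex above can be tried with Python's re module to see which sub-words it extracts. The pattern is copied unchanged from the comment. The names SUBWORD_REGEX and split_identifiers are hypothetical helpers for this sketch.

```python
import re

# Regex quoted from the comment above, split over two lines for readability.
SUBWORD_REGEX = re.compile(
    r"(?<![a-z])[a-z'`]+|[A-Z][a-z'`]*|[a-z]+'[a-z]*"
    r"|[a-z]+(?=[_-])|[a-z]+(?=[A-Z])|\d+"
)


def split_identifiers(text):
    """Return the sub-words the regex finds in text, in order."""
    return SUBWORD_REGEX.findall(text)


print(split_identifiers("bad_spellling camelCaseWord kebab-case"))
# ['bad', 'spellling', 'camel', 'Case', 'Word', 'kebab', 'case']
```

Note that the character classes are ASCII-only, so identifiers containing non-ASCII letters would be split at those letters.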

yarikoptic · 2024-07-26T17:16:51Z

FWIW, I found this issue while searching, after seeing the typos spell checker flag typos in snake_case words in

add typos spell checker to pre-commit nebari-dev/nebari#2568

Disabling CamelCase and ACRONYM checks by default might also be wise, but that would likely need to be configurable.

DimitriPapadopoulos added the enhancement label Feb 21, 2023

DimitriPapadopoulos mentioned this issue Jul 28, 2023

Default word regex and snake_case checking #2979

Closed

yarikoptic mentioned this issue Jul 26, 2024

add typos spell checker to pre-commit nebari-dev/nebari#2568

Merged
