
Commit ce280c9

Speed up codespell:ignore check by skipping the regex in most cases
The changes to provide a public API had a performance-related cost of about 1% of runtime. There is no trivial way to offset this any further without undermining the API we are building. However, we can pull performance-related shenanigans to compensate for the cost introduced.

The codespell codebase unsurprisingly spends the vast majority of its runtime in regex-related code such as `search` and `finditer`. The best way to reduce the runtime spent in regexes is to not run a regex in the first place, since the regex engine has a rather steep overhead compared to regular string primitives (that is the cost of flexibility).

If a regex rarely matches and there is an easy static substring that every match must contain, then you can speed up the code by using `substring in string` as a conditional to skip the regex. This assumes the regex is used often enough for the performance to matter.

An obvious choice here is the `codespell:ignore` regex, because it contains the very distinctive literal substring `codespell:ignore`, whose absence rules out almost all lines that will not match.

With this little trick, runtime goes from ~5.6s to ~4.9s on the corpus mentioned in #3419.
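The idea can be sketched in isolation as follows. This is a minimal illustration, not the code in `codespell_lib/spellchecker.py`: the names `TAG`, `INLINE_IGNORE_RE`, `parse_with_regex_only` and `parse_with_guard` are hypothetical, and the return convention (`None` when no directive is found) is an assumption made for the sketch. Only the regex pattern is taken from the diff in this commit.

```python
import re
from typing import FrozenSet, Optional

# Hypothetical names; the pattern itself is the one shown in this commit.
TAG = "codespell:ignore"
INLINE_IGNORE_RE = re.compile(
    rf"[^\w\s]\s?{TAG}\b(\s+(?P<words>[\w,]*))?"
)


def parse_with_regex_only(line: str) -> Optional[FrozenSet[str]]:
    """Always run the regex. Returns None when the line has no directive
    (an assumption of this sketch, not codespell's real convention)."""
    match = INLINE_IGNORE_RE.search(line)
    if match is None:
        return None
    words = match.group("words") or ""
    return frozenset(word for word in words.split(",") if word)


def parse_with_guard(line: str) -> Optional[FrozenSet[str]]:
    """Skip the regex when the literal tag is absent from the line."""
    # Every regex match must contain TAG verbatim, so a line without TAG
    # can never match and the cheap substring test is a safe pre-filter.
    if TAG not in line:
        return None
    return parse_with_regex_only(line)


SAMPLE_LINES = [
    "x = 1  # codespell:ignore foo,bar",  # directive with a word list
    "y = 2  # codespell:ignore",          # directive without words
    "z = 3  # an ordinary comment",       # no tag: regex is skipped
    "codespell:ignore at line start",     # tag present, regex still rejects
]

if __name__ == "__main__":
    for sample in SAMPLE_LINES:
        assert parse_with_guard(sample) == parse_with_regex_only(sample)
    print("guarded and unguarded results agree on all sample lines")
```

The last sample line shows why the substring test is only a pre-filter: the tag is present, but the regex still rejects the line because no non-word, non-space character precedes it. The guard never decides that a line matches; it only rules lines out.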
1 parent 3c08c9b commit ce280c9


1 file changed

+6
-1
lines changed

codespell_lib/spellchecker.py

Lines changed: 6 additions & 1 deletion
@@ -109,7 +109,10 @@
 
 _builtin_default_as_tuple = tuple(_builtin_default.split(","))
 
-_inline_ignore_regex = re.compile(r"[^\w\s]\s?codespell:ignore\b(\s+(?P<words>[\w,]*))?")
+_codespell_ignore_tag = "codespell:ignore"
+_inline_ignore_regex = re.compile(
+    rf"[^\w\s]\s?{_codespell_ignore_tag}\b(\s+(?P<words>[\w,]*))?"
+)
 
 
 class UnknownBuiltinDictionaryError(ValueError):
@@ -177,6 +180,8 @@ def __init__(self) -> None:
         self.ignore_words_cased: Container[str] = frozenset()
 
     def _parse_inline_ignore(self, line: str) -> Optional[FrozenSet[str]]:
+        if _codespell_ignore_tag not in line:
+            return frozenset()
         inline_ignore_match = _inline_ignore_regex.search(line)
         if inline_ignore_match:
             words = frozenset(
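
To get a rough feel for the effect of a guard like the one added above, a micro-benchmark along the following lines can be used. It is only a sketch under stated assumptions: the `LINES` corpus is synthetic and made up for illustration (it is not the corpus from #3419), the helper names are hypothetical, and absolute timings will vary by machine and Python version, so no expected numbers are given.

```python
import re
import timeit

TAG = "codespell:ignore"
INLINE_IGNORE_RE = re.compile(rf"[^\w\s]\s?{TAG}\b(\s+(?P<words>[\w,]*))?")

# Synthetic input (an assumption): many ordinary lines, one with the directive.
LINES = [f"value_{i} = compute(arg_{i})  # ordinary comment" for i in range(1000)]
LINES.append("teh = 1  # codespell:ignore teh")


def count_regex_only(lines):
    """Count directive lines by running the regex on every line."""
    return sum(1 for line in lines if INLINE_IGNORE_RE.search(line))


def count_with_guard(lines):
    """Count directive lines, running the regex only if the tag is present."""
    return sum(
        1 for line in lines if TAG in line and INLINE_IGNORE_RE.search(line)
    )


def measure(repeat=20):
    """Return (regex_only_seconds, guarded_seconds) for `repeat` passes each."""
    regex_only = timeit.timeit(lambda: count_regex_only(LINES), number=repeat)
    guarded = timeit.timeit(lambda: count_with_guard(LINES), number=repeat)
    return regex_only, guarded


if __name__ == "__main__":
    regex_only, guarded = measure()
    print(f"regex only: {regex_only:.4f}s  with guard: {guarded:.4f}s")
```

Both counting functions must agree on the result; only the time taken to reach it should differ.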
