Cache the construction of "BadType" in case when pytype sees a Bad Ma…

…tch of types. When a bad match of type happens, it doesn't necessarily mean that there would be a type error. It means that out of many possible signature matches (or type matches) between different types including generics, there is something that doesn't match and we cannot use that result. This result will be used later when producing dignostics to indicate which type mismatches, in the case where there is no single type match out of those combinations. The problem is that when the combinations (e.g. multiple signatures, generics, recursive types) that needs to be verified need to be type checked, it tries out so many different combinations of types, and generate tons of the same "BadMatch" which is just redundant. Making it even worse, it's not only the computation but also the memory pressure caused by this because it seems to construct something which is memory intensive, thus GC frequently showing up in profile in these particular cases. There is still some risk that this might increase the peak memory consumption because once put in the cache, the object will stay until the lifetime of the AbstractMatcher which I think the lifetime is same as the type checker, but the drastic performance improvement in these particular cases seem worth it. A better solution might be actually making the type checker not call into generating these seemingly redundant type checks on these crazy combinations, but at a glance this seemed like a fundamental design issue that lives in the complex nature of the type graph, and without being able to understand / changing it this seems to be the shortest path without hurting code health too much. PiperOrigin-RevId: 701062879
google · Nov 29, 2024 · 85515d1 · 85515d1
1 parent e05ad68
commit 85515d1
Showing 1 changed file with 16 additions and 1 deletion.
diff --git a/pytype/matcher.py b/pytype/matcher.py
@@ -202,6 +202,16 @@ def __init__(self, node, ctx):
     self._error_subst = None
     self._type_params = _TypeParams()
     self._paramspecs = {}
+    # Constructing Individual bad match is costly, due to calling multiple of
+    # chained constructors and generating complex data structure. And in some
+    # corner cases which includes recursive generic types with overloads, it
+    # causes massive call to construction of bad match.
+    # A better solution might be not to make those seemingly redundant request
+    # from the type checker, but for now this is a comprimise to gain
+    # performance in those weird corner cases.
+    self._bad_match_cache: dict[
+        tuple[cfg.CFGNode, types.BaseValue], error_types.BadType
+    ] = {}
 
     compat_items = pep484.get_compat_items(
         none_matches_bool=not ctx.options.none_is_not_bool
@@ -245,13 +255,18 @@ def _error_details(self):
   def _get_bad_type(
       self, name: str | None, expected: types.BaseValue
   ) -> error_types.BadType:
-    return error_types.BadType(
+    res = self._bad_match_cache.get((self._node, expected), None)
+    if res:
+      return res
+    res = error_types.BadType(
         name=name,
         typ=self.ctx.annotation_utils.sub_one_annotation(
             self._node, expected, [self._error_subst or {}]
         ),
         error_details=self._error_details(),
     )
+    self._bad_match_cache[(self._node, expected)] = res
+    return res
 
   # TODO(b/63407497): We were previously enforcing --strict_parameter_checks
   # in compute_one_match, which didn't play nicely with overloads. Instead,