4 changes: 4 additions & 0 deletions .jules/bolt.md
@@ -5,3 +5,7 @@
## 2024-06-20 - Optimizing SQLite JSON Deserialization Cache Size
**Learning:** In the backend RAG implementation (`app/rag/simple_index.py`), embeddings are stored as JSON strings in an SQLite database. I initially attempted to optimize repeated deserialization by adding an `lru_cache` bounded to `maxsize=1024` for parsing these strings. However, because similarity searches involve a linear scan over all database chunks, if the database has more than 1,024 chunks, the cache is completely evicted during a single scan, resulting in a 0% cache hit rate on subsequent searches (cache thrashing).
**Action:** When caching objects that are deserialized sequentially during full-table scans, size the cache bound large enough (e.g., `maxsize=65536`) to hold the entire dataset, or use an unbounded cache if memory allows; otherwise the caching mechanism only adds overhead.
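
The thrashing described above is easy to reproduce. The sketch below is a standalone illustration (the `parse_embedding` helper and row counts are hypothetical, not the project's actual `simple_index.py` code): with a 128-entry cache and 1,000 rows, two full sequential scans never hit the cache, because every entry is evicted before the scan comes back around to it.

```python
import json
from functools import lru_cache

# Bounded LRU cache deliberately smaller than the dataset.
@lru_cache(maxsize=128)
def parse_embedding(raw: str) -> tuple:
    # Return a tuple so the cached value is immutable/hashable.
    return tuple(json.loads(raw))

# Simulate 1,000 stored chunks (dataset larger than the cache bound).
rows = [json.dumps([i, i + 1]) for i in range(1000)]

# Two full "similarity scans" over the table.
for _ in range(2):
    for raw in rows:
        parse_embedding(raw)

info = parse_embedding.cache_info()
print(info.hits, info.misses)  # -> 0 2000
```

With `maxsize` raised above the row count (or `maxsize=None`), the second scan would instead be 100% hits.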

## 2025-03-08 - Optimizing Heuristic Regex Searches
**Learning:** In hot loops that evaluate many regex patterns (e.g., categorizing user personas in `pick_persona`), iterating over raw string patterns and calling `re.search(p, text)` incurs significant overhead: every call pays a lookup in `re`'s internal pattern cache, and once the number of distinct patterns used process-wide exceeds that cache's limit, patterns are evicted and must be re-parsed and re-compiled on subsequent calls.
**Action:** When a heuristic function evaluates a large dictionary or list of constant string patterns on every invocation, pre-compile the patterns into a module-level dictionary (`_COMPILED_PERSONA_RX`) and iterate over the pre-compiled objects (`p.search(text)`). This reduces overhead significantly, avoiding repeated parsing and cache evictions. To maintain backwards compatibility when returning matched evidence, append `p.pattern` to the evidence list instead of the compiled object.
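
As a rough illustration of the cost, the micro-benchmark below (illustrative `PATTERNS`, not the real persona table) uses more distinct patterns than CPython's internal regex cache holds by default (512 entries), so the raw-string variant keeps recompiling on every pass while the pre-compiled variant pays the compile cost once at import time:

```python
import re
import timeit

# More distinct patterns than re's default internal cache (512 entries),
# so raw re.search() calls thrash the cache and recompile constantly.
PATTERNS = [rf"\bword{i}\b" for i in range(600)]
COMPILED = [re.compile(p) for p in PATTERNS]
TEXT = "word42 appears somewhere in this text"

def raw_scan() -> int:
    # Compiles (or re-fetches from a thrashing cache) on every call.
    return sum(1 for p in PATTERNS if re.search(p, TEXT))

def compiled_scan() -> int:
    # Reuses Pattern objects compiled once at module import.
    return sum(1 for p in COMPILED if p.search(TEXT))

print("raw:     ", timeit.timeit(raw_scan, number=100))
print("compiled:", timeit.timeit(compiled_scan, number=100))
```

Both functions return the same match count; only the compiled variant's timing stays flat as the pattern table grows.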
11 changes: 8 additions & 3 deletions app/heuristics/__init__.py
@@ -195,15 +195,20 @@
 }
 
 
+_COMPILED_PERSONA_RX = {
+    persona: [re.compile(p) for p in pats] for persona, pats in PERSONA_KEYWORDS.items()
+}
+
+
 def pick_persona(text: str) -> tuple[str, dict]:
     lower = text.lower()
     scores = {k: 0 for k in PERSONA_KEYWORDS}
     evidence: dict[str, list[str]] = {k: [] for k in PERSONA_KEYWORDS}
-    for persona, pats in PERSONA_KEYWORDS.items():
+    for persona, pats in _COMPILED_PERSONA_RX.items():
         for p in pats:
-            if re.search(p, lower):
+            if p.search(lower):
                 scores[persona] += 1
-                evidence[persona].append(p)
+                evidence[persona].append(p.pattern)
     # choose highest score, tie -> deterministic alphabetical order of persona key
     ranked = sorted(scores.items(), key=lambda x: (-x[1], x[0]))
     if ranked and ranked[0][1] > 0: