4 changes: 4 additions & 0 deletions .jules/bolt.md
@@ -5,3 +5,7 @@
## 2024-06-20 - Optimizing SQLite JSON Deserialization Cache Size
**Learning:** In the backend RAG implementation (`app/rag/simple_index.py`), embeddings are stored as JSON strings in an SQLite database. I initially attempted to optimize repeated deserialization by adding an `lru_cache` bounded to `maxsize=1024` for parsing these strings. However, because similarity searches involve a linear scan over all database chunks, if the database has more than 1,024 chunks, the cache is completely evicted during a single scan, resulting in a 0% cache hit rate on subsequent searches (cache thrashing).
**Action:** When caching objects that are deserialized sequentially during full-table scans, size the cache bound large enough (e.g., `maxsize=65536`) to hold the entire dataset, or use an unbounded cache if memory allows; otherwise the caching mechanism only adds overhead.
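
The thrashing described above is easy to reproduce. The sketch below is a standalone illustration (the `parse_embedding` helper and row counts are hypothetical, not the project's actual `simple_index.py` code): with a 128-entry cache and 1,000 rows, two full sequential scans never hit the cache, because every entry is evicted before the scan comes back around to it.

```python
import json
from functools import lru_cache

# Bounded LRU cache deliberately smaller than the dataset.
@lru_cache(maxsize=128)
def parse_embedding(raw: str) -> tuple:
    # Return a tuple so the cached value is immutable/hashable.
    return tuple(json.loads(raw))

# Simulate 1,000 stored chunks (dataset larger than the cache bound).
rows = [json.dumps([i, i + 1]) for i in range(1000)]

# Two full "similarity scans" over the table.
for _ in range(2):
    for raw in rows:
        parse_embedding(raw)

info = parse_embedding.cache_info()
print(info.hits, info.misses)  # -> 0 2000
```

With `maxsize` raised above the row count (or `maxsize=None`), the second scan would instead be 100% hits.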

## 2025-03-08 - Optimizing Heuristic Regex Searches
**Learning:** In hot loops that evaluate many regex patterns (e.g., categorizing user personas in `pick_persona`), iterating over raw string patterns and calling `re.search(p, text)` incurs significant overhead: every call pays a lookup in `re`'s internal pattern cache, and once the number of distinct patterns used process-wide exceeds that cache's limit, patterns are evicted and must be re-parsed and re-compiled on subsequent calls.
**Action:** When a heuristic function evaluates a large dictionary or list of constant string patterns on every invocation, pre-compile the patterns into a module-level dictionary (`_COMPILED_PERSONA_RX`) and iterate over the pre-compiled objects (`p.search(text)`). This reduces overhead significantly, avoiding repeated parsing and cache evictions. To maintain backwards compatibility when returning matched evidence, append `p.pattern` to the evidence list instead of the compiled object.
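
As a rough illustration of the cost, the micro-benchmark below (illustrative `PATTERNS`, not the real persona table) uses more distinct patterns than CPython's internal regex cache holds by default (512 entries), so the raw-string variant keeps recompiling on every pass while the pre-compiled variant pays the compile cost once at import time:

```python
import re
import timeit

# More distinct patterns than re's default internal cache (512 entries),
# so raw re.search() calls thrash the cache and recompile constantly.
PATTERNS = [rf"\bword{i}\b" for i in range(600)]
COMPILED = [re.compile(p) for p in PATTERNS]
TEXT = "word42 appears somewhere in this text"

def raw_scan() -> int:
    # Compiles (or re-fetches from a thrashing cache) on every call.
    return sum(1 for p in PATTERNS if re.search(p, TEXT))

def compiled_scan() -> int:
    # Reuses Pattern objects compiled once at module import.
    return sum(1 for p in COMPILED if p.search(TEXT))

print("raw:     ", timeit.timeit(raw_scan, number=100))
print("compiled:", timeit.timeit(compiled_scan, number=100))
```

Both functions return the same match count; only the compiled variant's timing stays flat as the pattern table grows.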
11 changes: 8 additions & 3 deletions app/heuristics/__init__.py
@@ -195,15 +195,20 @@
 }
 
 
+_COMPILED_PERSONA_RX = {
+    persona: [re.compile(p) for p in pats] for persona, pats in PERSONA_KEYWORDS.items()
+}
+
+
 def pick_persona(text: str) -> tuple[str, dict]:
     lower = text.lower()
     scores = {k: 0 for k in PERSONA_KEYWORDS}
     evidence: dict[str, list[str]] = {k: [] for k in PERSONA_KEYWORDS}
-    for persona, pats in PERSONA_KEYWORDS.items():
+    for persona, pats in _COMPILED_PERSONA_RX.items():
         for p in pats:
-            if re.search(p, lower):
+            if p.search(lower):
                 scores[persona] += 1
-                evidence[persona].append(p)
+                evidence[persona].append(p.pattern)
     # choose highest score, tie -> deterministic alphabetical order of persona key
     ranked = sorted(scores.items(), key=lambda x: (-x[1], x[0]))
     if ranked and ranked[0][1] > 0: