madara88645 · Mar 18, 2026
diff --git a/.jules/bolt.md b/.jules/bolt.md
@@ -11,5 +11,5 @@
 **Action:** When calculating similarity scores or dot products on vectors represented as Python lists, always prefer `map(operator.mul, a, b)` wrapped in `sum()` over list comprehensions or generator expressions.
 
 ## 2024-08-14 - Optimizing Sparse Dictionary Intersections in Math Hot Loops
-**Learning:** When computing cosine similarity or dot products for sparse dictionaries (like TF-IDF score mappings) in Python, creating sets for key intersection (`set(v1.keys()) & set(v2.keys())`) adds significant overhead due to set allocation and hashing. Iterating directly over the items of the smaller dictionary and checking for key existence in the larger dictionary (`sum(v * v2[k] for k, v in v1.items() if k in v2)`) operates in O(min(N, M)) time with fewer memory allocations and is roughly 30-40% faster in execution time.
-**Action:** Replace `set()` intersection calls with smaller-dictionary iteration logic (`if len(v1) > len(v2): v1, v2 = v2, v1`) when calculating intersections of sparse dicts in tight performance paths.
+**Learning:** When computing cosine similarity or dot products for sparse dictionaries (like TF-IDF score mappings) in Python, creating sets for key intersection (`set(v1.keys()) & set(v2.keys())`) adds significant overhead due to set allocation and hashing. Iterating directly over the items of the smaller dictionary with a single lookup into the larger one (`val = v2.get(k, sentinel)`) avoids the second dictionary lookup that `if k in v2: ... v2[k]` performs, keeps O(min(N, M)) complexity, and is roughly 30-40% faster than the set-intersection approach. The sentinel also distinguishes a key stored with value `0.0` from a missing key, which a truthiness check such as `if v2.get(k):` would not.
+**Action:** In tight performance paths, replace `set()` intersection calls with smaller-dictionary iteration (`if len(v1) > len(v2): v1, v2 = v2, v1`) and use a sentinel-backed `dict.get` so each key costs one lookup: `sentinel = object(); dot = sum(v * val for k, v in v1.items() if (val := v2.get(k, sentinel)) is not sentinel)` (the `:=` operator requires Python 3.8+).
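A minimal sketch of the pattern described in the learning, assuming plain `Dict[str, float]` sparse vectors. `sparse_dot` and `sparse_dot_sets` are hypothetical helper names used for illustration, not functions from this repository; the set-based version is included only as the baseline being replaced.

```python
from typing import Dict


# Hypothetical helpers illustrating the bolt.md learning; not code from app/rag.
def sparse_dot_sets(v1: Dict[str, float], v2: Dict[str, float]) -> float:
    """Baseline being replaced: allocate two key sets and intersect them."""
    common = set(v1.keys()) & set(v2.keys())
    return sum(v1[k] * v2[k] for k in common)


def sparse_dot(v1: Dict[str, float], v2: Dict[str, float]) -> float:
    """Iterate the smaller dict; one sentinel-backed lookup per key."""
    if len(v1) > len(v2):
        v1, v2 = v2, v1  # loop over the smaller mapping: O(min(N, M))
    sentinel = object()  # a fresh object that no stored value can be identical to
    return sum(
        v * val
        for k, v in v1.items()
        # The walrus operator (Python 3.8+) keeps the looked-up value, so there
        # is no second v2[k] lookup; `is not sentinel` keeps keys stored as 0.0.
        if (val := v2.get(k, sentinel)) is not sentinel
    )


a = {"rag": 0.5, "index": 0.0, "vector": 2.0}
b = {"index": 3.0, "vector": 1.5}
print(sparse_dot(a, b))       # 3.0
print(sparse_dot_sets(a, b))  # 3.0
```

Here the `0.0` stored under `"index"` is still treated as a shared key. A truthiness test (`if v2.get(k):`) would skip it; that happens not to change this particular sum, but it would undercount shared keys in any statistic that relies on them.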
diff --git a/app/rag/simple_index.py b/app/rag/simple_index.py
@@ -278,10 +278,12 @@ def cosine_similarity(v1: Dict[str, float], v2: Dict[str, float]) -> float:
         if len(v1) > len(v2):
             v1, v2 = v2, v1
 
+        missing = object()
         dot = 0.0
         for k, v in v1.items():
-            if k in v2:
-                dot += v * v2[k]
+            v2_val = v2.get(k, missing)
+            if v2_val is not missing:
+                dot += v * v2_val
 
         # Bolt Optimization: math.hypot is ~5x faster than math.sqrt(sum(v*v))
         norm1 = math.hypot(*v1.values())
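The hunk above only changes the inner loop, so one way to sanity-check the speedup claims on representative data is a `timeit` comparison of the old loop, the new loop, and the set-based baseline. This is a rough sketch: the vocabulary size, vector sizes, and the names `dot_two_lookups`, `dot_one_lookup`, `dot_set_intersection`, `random_sparse_vector`, and `run_benchmark` are hypothetical choices for illustration, and timings will vary with Python version, key types, and how much the vectors overlap.

```python
import functools
import math
import random
import timeit
from typing import Callable, Dict, List

Vector = Dict[str, float]


def dot_set_intersection(v1: Vector, v2: Vector) -> float:
    """Set-based baseline: allocates and hashes two key sets."""
    dot = 0.0
    for k in set(v1.keys()) & set(v2.keys()):
        dot += v1[k] * v2[k]
    return dot


def dot_two_lookups(v1: Vector, v2: Vector) -> float:
    """Loop removed by the hunk: `in` test, then a second lookup via v2[k]."""
    dot = 0.0
    for k, v in v1.items():
        if k in v2:
            dot += v * v2[k]
    return dot


def dot_one_lookup(v1: Vector, v2: Vector) -> float:
    """Loop added by the hunk: one dict.get with a sentinel default."""
    missing = object()
    dot = 0.0
    for k, v in v1.items():
        v2_val = v2.get(k, missing)
        if v2_val is not missing:
            dot += v * v2_val
    return dot


def random_sparse_vector(vocab: List[str], size: int, rng: random.Random) -> Vector:
    """Hypothetical TF-IDF-like vector: `size` random terms with random weights."""
    return {term: rng.random() for term in rng.sample(vocab, size)}


def run_benchmark(number: int = 2_000, repeat: int = 5, seed: int = 0) -> Dict[str, float]:
    """Return the best-of-`repeat` time in seconds for `number` calls per variant."""
    rng = random.Random(seed)
    vocab = [f"term{i}" for i in range(5_000)]
    small = random_sparse_vector(vocab, 50, rng)   # passed first, mirroring the
    large = random_sparse_vector(vocab, 500, rng)  # smaller-dict swap in the hunk
    variants: List[Callable[[Vector, Vector], float]] = [
        dot_set_intersection,
        dot_two_lookups,
        dot_one_lookup,
    ]
    reference = dot_two_lookups(small, large)
    timings: Dict[str, float] = {}
    for fn in variants:
        # Only compare speed once every variant produces the same result.
        if not math.isclose(fn(small, large), reference):
            raise ValueError(f"{fn.__name__} disagrees with the reference loop")
        stmt = functools.partial(fn, small, large)
        timings[fn.__name__] = min(timeit.repeat(stmt, number=number, repeat=repeat))
    return timings


if __name__ == "__main__":
    for name, seconds in run_benchmark().items():
        print(f"{name:>22}: {seconds:.4f} s")
```

The harness passes the smaller vector first because the function in the hunk swaps its arguments before looping, so the measurement isolates the lookup change rather than the swap.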