Add semantic caching, latency logging, and paraphrased benchmarks to TokenSmith #51

Open
sukriti112 wants to merge 1 commit into georgia-tech-db:main from sukriti112:sukriti-semantic-cache

Conversation


@sukriti112 sukriti112 commented Nov 19, 2025

Summary

This PR adds a semantic answer cache and fine-grained latency instrumentation to the TokenSmith RAG pipeline, plus a paraphrased benchmark suite to evaluate cache behavior. The goal is to reduce end-to-end latency for repeated/paraphrased questions while preserving answer traceability via citations and logs.

Key Changes

  1. Semantic Answer Cache

    • Introduces an in-memory _SEMANTIC_CACHE keyed by a configuration signature (model path, retrieval parameters, index prefix, etc.), so entries are never reused across incompatible pipeline configurations.
    • Each cache entry stores:
      • Normalized question text
      • Unit-normalized question embedding (using the FAISS embedder)
      • Final answer string
      • Chunk indices and chunk metadata
      • HyDE text used for retrieval (when enabled)
    • On each query:
      • Embed the incoming question and compute cosine similarity against cached embeddings.
      • If max similarity ≥ 0.85, treat as a semantic cache hit, reuse the answer, and log semantic_cache_hit_seconds.
      • Otherwise, run the normal HyDE → retrieval → ranking → generation pipeline and insert a new cache entry.
    • Cache size is capped (default 50 entries per config) to avoid unbounded growth.
  2. Per-Stage Latency Instrumentation

    • Extends stage_timings with:
      • hyde_seconds
      • retrieval_seconds
      • ranking_seconds
      • generation_seconds
      • semantic_cache_hit_seconds
    • Each query log entry now shows these timings, enabling:
      • Cold vs. cached comparison per query.
      • Identification of the main latency bottlenecks.
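One lightweight way to populate a `stage_timings` dict like the one described above is a timing context manager. This is an illustrative sketch under the assumption that timings are collected per query into a plain dict; the `timed` helper is hypothetical, not the PR's code.

```python
import time
from contextlib import contextmanager

# Per-query timings, e.g. {"retrieval_seconds": 0.12, "generation_seconds": 1.8}.
stage_timings: dict = {}

@contextmanager
def timed(stage: str):
    """Record wall-clock seconds for one pipeline stage under the given key."""
    start = time.perf_counter()
    try:
        yield
    finally:
        stage_timings[stage] = time.perf_counter() - start
```

Each stage then wraps its work, e.g. `with timed("retrieval_seconds"): chunks = retrieve(query)`, and the populated dict is written to the query log entry.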
  3. Paraphrased Benchmark Suite

    • Adds tests/benchmarks_semantic.yaml with 12 paraphrased versions of the existing benchmark questions.
    • These benchmarks are used only to test semantic-cache behavior and do not change base accuracy evaluation.
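For readers unfamiliar with the benchmark format, a paraphrased entry might look like the fragment below. The field names and question text here are illustrative assumptions, not the actual schema of tests/benchmarks_semantic.yaml.

```yaml
# Hypothetical entry shape; field names are illustrative only.
- id: q1_paraphrase
  question: "In what ways does the retriever rank candidate chunks?"
  source_benchmark: q1      # paraphrase of an existing benchmark question
  expect_cache_hit: true    # should resolve via the semantic cache on a warm run
```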

How to Run

python scripts/run_benchmarks.py --benchmarks tests/benchmarks.yaml tests/benchmarks_semantic.yaml --runs 1

@shahmeer99 shahmeer99 added the wontfix This will not be worked on label Mar 13, 2026