@ckrough ckrough commented Jan 10, 2026

Summary

  • Enable Anthropic prompt caching (Claude Sonnet/Haiku) via OpenRouter for cost and latency optimization
  • Add llm_enable_prompt_caching config option (default: true)
  • Log cache hit/write metrics for observability

Changes

  • src/config.py: Add llm_enable_prompt_caching setting
  • src/infrastructure/llm/openrouter.py: Add cache_control to system messages and log cache metrics (see the sketch after this list)
  • src/web/routes.py: Pass caching config to provider
  • .env.example: Document new config option
  • tests/test_llm_provider.py: Add comprehensive tests for caching behavior
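
The core of the change is attaching an Anthropic cache_control breakpoint to the system message. Below is a minimal sketch of how that can look with OpenRouter's OpenAI-compatible message format; `build_messages` and its parameters are illustrative, not the actual OpenRouterProvider code:

```python
from typing import Any


def build_messages(
    system_prompt: str,
    user_prompt: str,
    enable_prompt_caching: bool = True,
) -> list[dict[str, Any]]:
    """Build chat messages, optionally marking the system prompt as cacheable.

    OpenRouter forwards Anthropic cache_control breakpoints when the message
    content is supplied as a list of content parts rather than a plain string.
    """
    if enable_prompt_caching:
        system_content: Any = [
            {
                "type": "text",
                "text": system_prompt,
                # Everything up to and including this block becomes a cacheable
                # prefix (Anthropic's ephemeral cache, short TTL).
                "cache_control": {"type": "ephemeral"},
            }
        ]
    else:
        system_content = system_prompt

    return [
        {"role": "system", "content": system_content},
        {"role": "user", "content": user_prompt},
    ]
```

The llm_enable_prompt_caching setting from src/config.py would feed the enable_prompt_caching flag, so turning the option off falls back to a plain string system message.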

Benefits

  • Up to 90% reduction in input token costs for cached prompt prefixes (Anthropic bills cache reads at roughly 10% of the base input-token rate; cache writes cost slightly more than uncached input)
  • Up to 85% latency reduction for repeated system prompts
  • Observability via structured cache hit/write metrics in the logs (see the sketch after this list)
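
As a rough illustration of the metrics logging, here is a sketch of reading cache counters from a completion's usage payload. The field names are assumptions: OpenAI-compatible responses may report cached tokens under prompt_tokens_details.cached_tokens, while Anthropic-style payloads use cache_read_input_tokens / cache_creation_input_tokens; the provider code should match whatever OpenRouter actually returns for the chosen model.

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)


def log_cache_metrics(usage: dict[str, Any]) -> None:
    """Log prompt-cache hit/write counters from a chat completion's usage block.

    The field names below are defensive guesses and may need adjusting to the
    exact shape OpenRouter returns for Anthropic models.
    """
    details = usage.get("prompt_tokens_details") or {}
    cache_read = details.get("cached_tokens") or usage.get("cache_read_input_tokens", 0)
    cache_write = usage.get("cache_creation_input_tokens", 0)

    logger.info(
        "llm_cache_metrics",
        extra={
            "prompt_tokens": usage.get("prompt_tokens", 0),
            "cache_read_tokens": cache_read,
            "cache_write_tokens": cache_write,
        },
    )
```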

Test plan

  • All existing tests pass (330 passed)
  • Coverage maintained at 80.61%
  • mypy strict mode passes
  • ruff format/check passes
  • bandit security scan passes
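
For flavor, this is the kind of assertion the new caching tests in tests/test_llm_provider.py might make, written here against the build_messages helper sketched above; the real tests exercise OpenRouterProvider directly and will differ:

```python
def test_cache_control_added_when_caching_enabled() -> None:
    messages = build_messages("You are a helpful retriever.", "Fetch the docs.", enable_prompt_caching=True)
    system = messages[0]
    assert system["role"] == "system"
    # With caching on, the system content is a list of parts carrying the breakpoint.
    assert system["content"][0]["cache_control"] == {"type": "ephemeral"}


def test_plain_system_message_when_caching_disabled() -> None:
    messages = build_messages("You are a helpful retriever.", "Fetch the docs.", enable_prompt_caching=False)
    # With caching off, the system content stays a plain string.
    assert isinstance(messages[0]["content"], str)
```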

Closes: retriever-ghs
