🎯 Goal
Provide the LLM with official documentation context via RAG to improve accuracy and reduce false positives.
📊 Complexity
Long Term (3-5 days)
🔍 Problem
The LLM doesn't have access to:
- GitHub Actions best practices documentation
- Python security guidelines (OWASP)
- Language-specific conventions
- Framework-specific patterns
This leads to incorrect assumptions and false positives.
✅ Solution
Architecture
```
┌─────────────────┐
│ Review Request  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐      ┌──────────────────┐
│ Query Analyzer  │─────▶│ RAG Document DB  │
│ (extract topic) │      │    (LanceDB)     │
└────────┬────────┘      └─────────┬────────┘
         │                         │
         │               ┌─────────▼────────┐
         │               │ Relevant Docs:   │
         │               │ - GH Actions     │
         │               │ - OWASP Python   │
         │               │ - Error Handling │
         │               └─────────┬────────┘
         │                         │
         └────────────┐  ┌─────────▼────────┐
                      └─▶│   LLM Review     │
                         │   + Context      │
                         └──────────────────┘
```
Implementation
1. Document Indexing
```python
# iara/memory/docs_indexer.py
OFFICIAL_DOCS = {
    "github_actions": "https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions",
    "python_security": "https://owasp.org/www-project-top-ten/",
    "python_error_handling": "https://docs.python.org/3/tutorial/errors.html",
    # ... more sources
}

def index_official_docs():
    """Download and index official documentation."""
    for name, url in OFFICIAL_DOCS.items():
        content = fetch_and_parse(url)
        chunks = chunk_document(content)
        store_in_lancedb(name, chunks)
```
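`chunk_document` is not defined above; as a minimal stdlib sketch, character-based chunking with overlap could look like this (the chunk size, overlap, and helper name are illustrative assumptions, not a fixed design):

```python
def chunk_document(text, chunk_size=800, overlap=100):
    """Split text into overlapping character chunks for embedding.

    Overlap keeps sentences that straddle a chunk boundary retrievable
    from either neighboring chunk.
    """
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back so chunks share `overlap` chars
    return chunks
```

Token-based chunking (e.g., by the embedding model's tokenizer) would be more precise, but character counts are a reasonable first pass.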
2. Context Retrieval
```python
# iara/reviewer.py
def review_code(diff, api_key, config):
    # Extract topics from diff
    topics = extract_topics(diff)  # e.g., ["github_actions", "secrets"]

    # Retrieve relevant docs
    context_docs = retrieve_docs(topics, top_k=3)

    # Add to system prompt
    system_prompt = generate_system_prompt(config, context_docs)

    # Review with enhanced context
    # (model and provider are resolved from config; resolution omitted here)
    return review_code_with_model(diff, api_key, model, system_prompt, provider)
```
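In production, `retrieve_docs` would go through LanceDB's vector search. As an illustration of the ranking step only, here is a brute-force cosine-similarity sketch over already-embedded chunks (the `(text, embedding)` data layout is an assumption; LanceDB's indexed search would replace the `sorted()` pass):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_docs(query_vec, indexed_chunks, top_k=3):
    """Return the top_k chunk texts most similar to the query embedding.

    indexed_chunks is a list of (chunk_text, embedding) pairs.
    """
    ranked = sorted(indexed_chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]
```

The query embedding would come from the same embedding model used at indexing time; mixing models breaks the similarity scores.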
3. Topic Extraction
```python
def extract_topics(diff):
    """Extract topics from diff for targeted doc retrieval."""
    topics = set()
    if ".github/workflows" in diff:
        topics.add("github_actions")
    if "os.chmod" in diff or "secrets" in diff:
        topics.add("security")
    if "try:" in diff and "except" in diff:
        topics.add("error_handling")
    return list(topics)
```
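`generate_system_prompt` (used in step 2) could fold the retrieved chunks into the prompt along these lines; the signature and wording below are illustrative assumptions, not the actual implementation:

```python
def generate_system_prompt(base_prompt, context_docs):
    """Append retrieved documentation excerpts to the reviewer system prompt."""
    if not context_docs:
        return base_prompt  # no relevant docs found; fall back to plain prompt
    refs = "\n\n".join(f"[Doc {i + 1}]\n{doc}" for i, doc in enumerate(context_docs))
    return (
        f"{base_prompt}\n\n"
        "Use the following official documentation excerpts when judging the diff, "
        "and cite the relevant excerpt when flagging an issue.\n\n"
        f"{refs}"
    )
```

Numbering the excerpts (`[Doc 1]`, `[Doc 2]`, ...) gives the LLM a stable handle to cite, which supports the "can cite specific documentation" goal below.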
📝 Implementation Steps
- Create `iara/memory/docs_indexer.py`
- Define curated list of official documentation sources
- Implement document fetching and chunking
- Index docs into LanceDB (reuse existing RAG infrastructure)
- Add topic extraction from diffs
- Implement context retrieval in `reviewer.py`
- Update system prompt to include doc context
- Add caching to avoid re-indexing
- Test with known false positive cases
- Measure improvement in accuracy
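The caching step above can be sketched with a content-hash manifest, so unchanged docs are skipped on re-runs (the manifest file name and helper name are hypothetical):

```python
import hashlib
import json
from pathlib import Path

def needs_reindex(name, content, manifest_path="docs_manifest.json"):
    """Return True if a doc's content changed since it was last indexed.

    Stores a sha256 digest per doc name in a small JSON manifest; an
    unchanged digest means the existing LanceDB entries can be kept.
    """
    digest = hashlib.sha256(content.encode()).hexdigest()
    path = Path(manifest_path)
    manifest = json.loads(path.read_text()) if path.exists() else {}
    if manifest.get(name) == digest:
        return False
    manifest[name] = digest
    path.write_text(json.dumps(manifest))
    return True
```

HTTP `ETag`/`Last-Modified` headers could avoid even downloading unchanged docs, at the cost of trusting the upstream server's caching metadata.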
🎁 Expected Impact
- Fewer false positives (rough target: 60-80% reduction, to be validated against the known false-positive test cases)
- LLM decisions backed by official sources
- More authoritative and trustworthy reviews
- Can cite specific documentation
⚠️ Challenges
- Keeping docs up to date
- Balancing context size vs. relevance
- Initial indexing time
- Storage requirements
🔗 Related
Long term due to infrastructure requirements and doc curation effort.