Use RAG to provide best practices context from official docs #73

## 🎯 Goal
Provide the LLM with context from official documentation via RAG (retrieval-augmented generation) to improve review accuracy and reduce false positives.

## 📊 Complexity
**Long Term** (3-5 days)

## 🔍 Problem
The LLM doesn't have access to:
- GitHub Actions best practices documentation
- Python security guidelines (OWASP)
- Language-specific conventions
- Framework-specific patterns

Without this context, the LLM falls back on incorrect assumptions, which produces false positives (correct code flagged as problematic).

## ✅ Solution

### Architecture
```
┌─────────────────┐
│  Review Request │
└────────┬────────┘
         │
         ▼
┌─────────────────┐     ┌──────────────────┐
│ Query Analyzer  │────▶│ RAG Document DB  │
│ (extract topic) │     │ (LanceDB)        │
└────────┬────────┘     └──────────────────┘
         │                       │
         │              ┌────────▼─────────┐
         │              │ Relevant Docs:   │
         │              │ - GH Actions     │
         │              │ - OWASP Python   │
         └──────────────▶ - Error Handling │
                        └────────┬─────────┘
                                 │
                        ┌────────▼─────────┐
                        │  LLM Review      │
                        │  + Context       │
                        └──────────────────┘
```

### Implementation

#### 1. Document Indexing
```python
# iara/memory/docs_indexer.py

# Curated sources: topic key -> official documentation URL
OFFICIAL_DOCS = {
    "github_actions": "https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions",
    "python_security": "https://owasp.org/www-project-top-ten/",
    "python_error_handling": "https://docs.python.org/3/tutorial/errors.html",
    # ... more sources
}


def index_official_docs():
    """Download, chunk, and index each curated documentation source."""
    for name, url in OFFICIAL_DOCS.items():
        content = fetch_and_parse(url)    # fetch the page, extract plain text (helper to be written)
        chunks = chunk_document(content)  # split into retrieval-sized pieces (helper to be written)
        store_in_lancedb(name, chunks)    # embed and write to the existing LanceDB store (helper to be written)
```
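The helpers called by `index_official_docs` are left undefined. A minimal stdlib-only sketch of the first two follows, assuming the sources are static HTML pages (no JavaScript rendering) and that word-based windows are an acceptable stand-in for token-based chunking. `html_to_text` and the `max_words`/`overlap` defaults are hypothetical, and the LanceDB write is not shown.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _TextExtractor(HTMLParser):
    """Collect visible text nodes, skipping <script> and <style> contents."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Convert an HTML page to plain text, one text node per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


def fetch_and_parse(url: str, timeout: float = 30.0) -> str:
    """Download a page and return its visible text (static HTML only)."""
    with urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return html_to_text(resp.read().decode(charset, errors="replace"))


def chunk_document(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of at most max_words words, overlapping by `overlap` words."""
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must be in [0, max_words)")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

Overlapping windows keep a sentence that straddles a chunk boundary retrievable from either neighbouring chunk.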

#### 2. Context Retrieval
```python
# iara/reviewer.py

def review_code(diff, api_key, config):
    # Extract topics from the diff, e.g. ["github_actions", "security"]
    topics = extract_topics(diff)

    # Retrieve the most relevant documentation chunks for those topics
    context_docs = retrieve_docs(topics, top_k=3)

    # Inject the retrieved docs into the system prompt
    system_prompt = generate_system_prompt(config, context_docs)

    # Assumption: model and provider come from config (key names illustrative)
    model = config["model"]
    provider = config["provider"]

    # Review with the enhanced context
    return review_code_with_model(diff, api_key, model, system_prompt, provider)
```
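`retrieve_docs` is not defined in this issue. The sketch below is a lexical stand-in for the planned LanceDB vector search, so it runs without an embedding model: it scores chunks by bag-of-words cosine similarity between the topic names and each chunk's source key plus text. `DocChunk` and the `index` parameter are hypothetical; with LanceDB, the query would instead be embedded and searched against the indexed table.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class DocChunk:
    source: str  # OFFICIAL_DOCS key, e.g. "github_actions"
    text: str


def _tokens(text: str) -> Counter:
    """Lowercase alphanumeric term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_docs(topics: Sequence[str], top_k: int = 3,
                  index: Sequence[DocChunk] = ()) -> list[DocChunk]:
    """Return up to top_k chunks most similar to the topics; zero-score chunks are dropped."""
    query = _tokens(" ".join(t.replace("_", " ") for t in topics))
    scored = [(_cosine(query, _tokens(c.source.replace("_", " ") + " " + c.text)), c) for c in index]
    scored = [(score, c) for score, c in scored if score > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:top_k]]
```

Scoring the source key together with the text lets a topic such as `security` match chunks indexed under `python_security`, even though the topic names and the `OFFICIAL_DOCS` keys differ.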

#### 3. Topic Extraction
```python
def extract_topics(diff: str) -> list[str]:
    """Extract topics from a diff for targeted doc retrieval (simple keyword heuristics)."""
    topics = set()

    if ".github/workflows" in diff:
        topics.add("github_actions")
    if "os.chmod" in diff or "secrets" in diff:
        topics.add("security")
    if "try:" in diff and "except" in diff:
        topics.add("error_handling")

    # Sort so the topic order (and thus the retrieval query) is deterministic
    return sorted(topics)
```
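Keyword checks like these grow quickly as topics are added. One option, sketched here as an assumption rather than a planned design, is to drive `extract_topics` from a rule table so that a new topic needs only a new entry. `TOPIC_RULES` is a hypothetical name, and the rules mirror the three heuristics above.

```python
from typing import Callable

# Hypothetical rule table: topic -> predicate over the raw diff text
TOPIC_RULES: dict[str, Callable[[str], bool]] = {
    "github_actions": lambda d: ".github/workflows" in d,
    "security": lambda d: any(k in d for k in ("os.chmod", "secrets")),
    "error_handling": lambda d: "try:" in d and "except" in d,
}


def extract_topics(diff: str, rules: dict[str, Callable[[str], bool]] = TOPIC_RULES) -> list[str]:
    """Return the sorted topics whose rule matches the diff."""
    return sorted(topic for topic, matches in rules.items() if matches(diff))
```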

## 📝 Implementation Steps
1. Create `iara/memory/docs_indexer.py`
2. Define curated list of official documentation sources
3. Implement document fetching and chunking
4. Index docs into LanceDB (reuse existing RAG infrastructure)
5. Add topic extraction from diffs
6. Implement context retrieval in `reviewer.py`
7. Update system prompt to include doc context
8. Add caching to avoid re-indexing
9. Test with known false positive cases
10. Measure improvement in accuracy
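For step 8, one possible approach is to fingerprint each fetched document and re-chunk and re-embed only the sources whose content changed. This still downloads every page but skips the expensive indexing work. The manifest format and all function names below are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's fetched text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def load_manifest(path: Path) -> dict[str, str]:
    """Load {doc name: content hash} from the previous indexing run, if any."""
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}


def save_manifest(path: Path, manifest: dict[str, str]) -> None:
    """Persist the manifest for the next run."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True), encoding="utf-8")


def docs_to_reindex(docs: dict[str, str], manifest: dict[str, str]) -> list[str]:
    """Names of docs that are new or whose content hash changed."""
    return sorted(name for name, text in docs.items() if manifest.get(name) != content_hash(text))


def mark_indexed(docs: dict[str, str], names: list[str], manifest: dict[str, str]) -> None:
    """Record current hashes for docs that were indexed successfully."""
    for name in names:
        manifest[name] = content_hash(docs[name])
```

Calling `mark_indexed` only after a source has been written to LanceDB means a failed run is retried on the next invocation.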

## 🎁 Expected Impact
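- Target of 60-80% fewer false positives (an estimate, to be validated by the measurements in step 10)
- LLM decisions backed by official sources
- More authoritative and trustworthy reviews
- Reviews can cite the specific documentation that supports each finding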
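The false-positive figure can be checked by running the reviewer with and without retrieved context over the same hand-labelled set of diffs. A minimal sketch of the arithmetic follows; `ReviewRun` and both functions are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewRun:
    """Counts from one reviewer run over a fixed, hand-labelled benchmark of diffs."""
    findings: int         # issues the reviewer reported
    false_positives: int  # reported issues a human labelled as incorrect


def false_positive_rate(run: ReviewRun) -> float:
    """Share of reported findings that were false positives."""
    return run.false_positives / run.findings if run.findings else 0.0


def relative_fp_reduction(baseline: ReviewRun, with_rag: ReviewRun) -> float:
    """Fractional drop in false positives versus the baseline (0.7 means 70% fewer)."""
    if baseline.false_positives == 0:
        raise ValueError("baseline has no false positives to reduce")
    return 1 - with_rag.false_positives / baseline.false_positives
```

Reporting the false-positive rate alongside the raw reduction guards against an apparent improvement that comes only from the reviewer reporting fewer findings overall.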

## ⚠️ Challenges
- Keeping docs up to date
- Balancing context size vs. relevance
- Initial indexing time
- Storage requirements
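To balance context size against relevance, retrieved chunks could be packed greedily by score into a fixed budget and numbered so the LLM can cite them. The sketch assumes each retrieved chunk carries its source key, URL, and a similarity score; `RetrievedChunk`, `build_context`, and the 4000-character default are hypothetical, and a real budget would more likely be counted in model tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievedChunk:
    source: str   # OFFICIAL_DOCS key, e.g. "github_actions"
    url: str
    text: str
    score: float  # retrieval similarity; higher is more relevant


def build_context(chunks: list[RetrievedChunk], max_chars: int = 4000) -> str:
    """Pack the highest-scoring chunks into a numbered, citable block of at most max_chars."""
    parts: list[str] = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        entry = f"[{len(parts) + 1}] {chunk.source} ({chunk.url})\n{chunk.text}\n"
        sep = 1 if parts else 0  # newline added by the final join
        if used + sep + len(entry) > max_chars:
            continue  # too big; a shorter, lower-ranked chunk may still fit
        parts.append(entry)
        used += sep + len(entry)
    return "\n".join(parts)
```

Skipping, rather than stopping at, a chunk that does not fit lets a shorter, lower-ranked chunk use the remaining space.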

## 🔗 Related
- Existing RAG infrastructure (Issue #45)
- Issue #70 (System Prompt)
- Issue #72 (Confidence Score)

---
Rated long term because of the infrastructure requirements and the effort needed to curate the documentation.
