RAG-based code documentation assistant with grounded answers and citations.
Nexus ingests a software repository and answers developer questions about the codebase. Every answer is grounded in retrieved source evidence with verifiable citations ([file:line-range]).
Key Features:
- Ingest any local code repository into a searchable index
- Ask natural language questions about the codebase
- Get answers with precise file and line citations woven inline
- Post-hoc citation validation rejects hallucinated answers
- Structured logging across the full pipeline
- Refuses to answer when evidence is insufficient
- Docker and Docker Compose
- (Optional) NVIDIA GPU for faster local inference
# Clone the repository
git clone <repo-url>
cd nexus
# Copy environment config
cp .env.example .env
# Start services (Ollama + ChromaDB + Nexus)
docker compose up -d
# Pull required models (first time only)
docker compose exec ollama ollama pull gpt-oss:20b
docker compose exec ollama ollama pull nomic-embed-text
# Index a repository
docker compose exec nexus nexus ingest /path/to/repo
# Ask a question
docker compose exec nexus nexus ask "Where is authentication implemented?"

# Set OpenAI API key (via CI/CD or secrets manager)
export OPENAI_API_KEY=sk-...
# Start production services
docker compose -f docker-compose.prod.yml up -d

nexus ingest /path/to/repository --collection my-project

nexus ask "Where are API endpoints defined?"
nexus ask "How does the authentication flow work?"
nexus ask "What database is used and how is it configured?"Searching collection: my-project...
Found 6 relevant chunks
Generating answer...
The authentication flow is handled across two modules. The login function
validates credentials and issues JWT tokens [src/auth/login.py:45-92],
while session management handles token refresh and expiration
[src/auth/session.py:12-48]. Password hashing uses bcrypt
[src/auth/crypto.py:8-25].
Citations appear inline as [path/to/file.ext:start_line-end_line], directly next to the claims they support.
┌─────────────────────────────────────────────────────────────┐
│ Docker Compose │
├─────────────┬─────────────┬─────────────┬──────────────────┤
│ Ollama │ ChromaDB │ Nexus │ (Prod only) │
│ LLM+Embed │ Vector Store│ CLI │ OpenAI API │
└─────────────┴─────────────┴─────────────┴──────────────────┘
- Walker (`ingest/walker.py`) — Traverse the repository, apply `.nexusignore` rules
- Chunker (`ingest/chunker.py`) — Split files into chunks with line metadata
- Embedder (`ingest/embedder.py`) — Generate embeddings via Nomic Embed Text V2
- Index (`ingest/index.py`) — Store in ChromaDB with metadata
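For illustration, the walker step can be sketched as a generator that honours `.nexusignore` glob patterns. The function name and the ignore-file semantics shown here are assumptions, not the actual `ingest/walker.py` API.

```python
from pathlib import Path
from fnmatch import fnmatch

def iter_source_files(repo_root: str, ignore_file: str = ".nexusignore"):
    """Yield repo-relative file paths, skipping glob patterns from .nexusignore.

    Illustrative sketch only: the real walker may support richer ignore rules.
    """
    root = Path(repo_root)
    ignore_path = root / ignore_file
    patterns = []
    if ignore_path.exists():
        patterns = [
            line.strip()
            for line in ignore_path.read_text().splitlines()
            if line.strip() and not line.startswith("#")
        ]
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        rel = path.relative_to(root).as_posix()
        if any(fnmatch(rel, pat) or fnmatch(path.name, pat) for pat in patterns):
            continue  # matched an ignore pattern
        yield rel
```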
- Search (`retrieval/search.py`) — Embed question, retrieve top-k chunks from ChromaDB
- Context (`retrieval/context_builder.py`) — Build evidence block with citation headings
- Answer (`llm/answer.py`) — Generate grounded response via LLM
- Validate (`llm/citation_validator.py`) — Reject answer if any citation is hallucinated
Detailed rationale is documented in docs/DECISIONS.md (ADR-001 through ADR-007).
- Code files: 300 lines with 50-line overlap (line-based, preserves structure)
- Text/Markdown: 1000 characters with 100-character overlap
- All chunks store `start_line` and `end_line` for precise citations
- Trade-off: does not respect semantic boundaries (functions, classes). AST-aware chunking deferred — see Future Improvements
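To make the strategy concrete, here is a minimal sketch of line-based chunking with overlap. It is illustrative only: the `Chunk` dataclass and function name are assumptions rather than the actual `ingest/chunker.py` interface, though the 300-line window and 50-line overlap match the defaults above.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    start_line: int  # 1-based, inclusive
    end_line: int    # 1-based, inclusive

def chunk_code(source: str, chunk_lines: int = 300, overlap: int = 50) -> list[Chunk]:
    """Split source into fixed-size line windows with overlap, keeping line metadata."""
    lines = source.splitlines()
    chunks, step = [], chunk_lines - overlap
    for start in range(0, max(len(lines), 1), step):
        window = lines[start : start + chunk_lines]
        if not window:
            break
        chunks.append(Chunk("\n".join(window), start + 1, start + len(window)))
        if start + chunk_lines >= len(lines):
            break  # last window already reached end of file
    return chunks
```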
- Nomic Embed Text V2 via Ollama — MoE architecture (475M params, 305M active), 768 dimensions
- Requires instruction prefixes: `search_document:` for indexing, `search_query:` for retrieval
- Omitting these prefixes significantly degrades retrieval quality
- Local deployment, Apache 2.0 license, no external API dependency
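A minimal sketch of prefix-aware embedding against Ollama's `/api/embeddings` endpoint is shown below. The base URL, model tag (`nomic-embed-text`, as pulled in the quick start), and function name are assumptions, not the actual `ingest/embedder.py` code; retry and error handling are omitted.

```python
import requests

OLLAMA_URL = "http://localhost:11434"  # assumed default; Nexus reads this from config

def embed(text: str, *, for_query: bool) -> list[float]:
    """Embed one string with the required Nomic instruction prefix (sketch only)."""
    prefix = "search_query: " if for_query else "search_document: "
    resp = requests.post(
        f"{OLLAMA_URL}/api/embeddings",
        json={"model": "nomic-embed-text", "prompt": prefix + text},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]
```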
- ChromaDB in Docker — simple Python API, built-in metadata filtering
- Persistent storage via Docker volumes
- Trade-off: single-node limits scaling, but appropriate for current scale (thousands of chunks)
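The indexing and query calls reduce to a few lines of the ChromaDB Python client. The sketch below assumes a default HTTP host/port and a flat metadata layout; the actual `ingest/index.py` and `retrieval/search.py` may differ.

```python
import chromadb

# Assumed connection details for the Dockerized ChromaDB service.
client = chromadb.HttpClient(host="localhost", port=8000)
collection = client.get_or_create_collection("my-project")

def index_chunk(chunk_id: str, text: str, embedding: list[float],
                path: str, start_line: int, end_line: int) -> None:
    """Store a chunk with the metadata needed for [path:start-end] citations."""
    collection.add(
        ids=[chunk_id],
        embeddings=[embedding],
        documents=[text],
        metadatas=[{"path": path, "start_line": start_line, "end_line": end_line}],
    )

def search(query_embedding: list[float], k: int = 8) -> dict:
    """Retrieve the k nearest chunks with documents, metadata, and distances."""
    return collection.query(
        query_embeddings=[query_embedding],
        n_results=k,
        include=["documents", "metadatas", "distances"],
    )
```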
- Dev: gpt-oss-20b via Ollama (local, fits in 16GB, Apache 2.0)
- Prod: GPT-5.2 via OpenAI API (best-in-class grounding)
- Single codebase — switches via the `OPENAI_BASE_URL` environment variable
- Temperature fixed at 0.1 for focused, deterministic answers
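The base-URL switch amounts to constructing one OpenAI-compatible client from environment variables. The sketch below assumes Ollama's `/v1` endpoint for dev and a hypothetical `NEXUS_MODEL` variable; it is not the actual `llm/answer.py` implementation.

```python
import os
from openai import OpenAI

# Same client for dev (Ollama's OpenAI-compatible endpoint) and prod (OpenAI API).
client = OpenAI(
    base_url=os.getenv("OPENAI_BASE_URL", "http://localhost:11434/v1"),
    api_key=os.getenv("OPENAI_API_KEY", "ollama"),  # Ollama ignores the key
)

def answer(question: str, evidence: str) -> str:
    """Generate a grounded answer from the evidence block (illustrative prompt)."""
    response = client.chat.completions.create(
        model=os.getenv("NEXUS_MODEL", "gpt-oss:20b"),  # assumed env var name
        temperature=0.1,
        messages=[
            {"role": "system",
             "content": "Answer only from the evidence and cite as [path:start-end]."},
            {"role": "user",
             "content": f"Evidence:\n{evidence}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```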
- Vector-only similarity search (hybrid BM25+vectors deferred — see ADR-006)
- Top-k=8 with max 2 chunks per file (diversity constraint)
- L2 distance threshold of 1.5 filters irrelevant results
- Over-fetches at 2x top_k to ensure enough results survive filtering
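Put together, the post-retrieval filtering is a small loop over the over-fetched results (queried with `n_results = 2 * top_k`). The sketch below assumes the single-query result shape returned by ChromaDB's `collection.query()`; it is illustrative, not the actual `retrieval/search.py` code.

```python
from collections import defaultdict

def filter_results(results: dict, top_k: int = 8,
                   max_per_file: int = 2, max_distance: float = 1.5) -> list[dict]:
    """Apply the L2 distance threshold and per-file diversity cap (sketch)."""
    per_file: dict[str, int] = defaultdict(int)
    kept: list[dict] = []
    hits = zip(results["documents"][0], results["metadatas"][0], results["distances"][0])
    for doc, meta, dist in sorted(hits, key=lambda h: h[2]):
        if dist > max_distance:
            continue  # too dissimilar to the query
        if per_file[meta["path"]] >= max_per_file:
            continue  # diversity constraint: at most 2 chunks per file
        per_file[meta["path"]] += 1
        kept.append({"text": doc, "meta": meta, "distance": dist})
        if len(kept) == top_k:
            break
    return kept
```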
- LLM instructed to use bracketed inline citations: `[path/to/file.ext:start-end]`
- Post-hoc regex parsing validates every cited file path against retrieved context
- Full rejection policy: if any citation references a file not in context, the entire answer is rejected — because a hallucinated citation implies the associated claim is also hallucinated (ADR-007)
- File-path-only validation (no line range checking) avoids false rejections
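The guardrail itself is a short regex pass. The sketch below illustrates the rejection policy under the assumption that the retrieved chunk paths are available as a set; the names and exact pattern are illustrative, not the actual `llm/citation_validator.py` code.

```python
import re

CITATION_RE = re.compile(r"\[([^\[\]:]+):(\d+)-(\d+)\]")

def validate_citations(answer: str, retrieved_paths: set[str]) -> tuple[bool, list[str]]:
    """Return (is_valid, invalid_paths); any unknown path invalidates the answer.

    Only file paths are checked, not line ranges, per the design note above.
    """
    cited = {match.group(1) for match in CITATION_RE.finditer(answer)}
    invalid = sorted(cited - retrieved_paths)
    return (len(invalid) == 0, invalid)
```

If `is_valid` is false, the whole answer is discarded rather than stripping the offending citation, matching the full rejection policy above.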
- structlog with JSON and console output modes
- Three logging points: search (query, results, latency), LLM call (model, latency, tokens), citation validation (pass/fail, invalid paths)
- Configured via `LOG_LEVEL` and `LOG_FORMAT` environment variables
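A configuration along these lines (illustrative, not the actual Nexus logging module) would wire `LOG_LEVEL` and `LOG_FORMAT` into structlog:

```python
import logging
import os
import structlog

def configure_logging() -> None:
    """Select JSON or console rendering from LOG_FORMAT, min level from LOG_LEVEL."""
    renderer = (
        structlog.processors.JSONRenderer()
        if os.getenv("LOG_FORMAT", "console") == "json"
        else structlog.dev.ConsoleRenderer()
    )
    structlog.configure(
        processors=[
            structlog.processors.add_log_level,
            structlog.processors.TimeStamper(fmt="iso"),
            renderer,
        ],
        wrapper_class=structlog.make_filtering_bound_logger(
            getattr(logging, os.getenv("LOG_LEVEL", "INFO").upper())
        ),
    )

log = structlog.get_logger()
# e.g. log.info("search_completed", query=question, results=len(hits), latency_ms=latency)
```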
| Decision | Chosen | Alternative | Why |
|---|---|---|---|
| Chunking | Line-based (300 lines) | AST-aware | Simpler, language-agnostic. May split functions awkwardly. |
| Retrieval | Vector-only | Hybrid (BM25 + vectors) | Simpler to implement. Keyword queries (exact function names) may underperform. |
| Citation handling | Full rejection | Strip invalid, keep answer | Safer — hallucinated citation implies hallucinated claim. May reject otherwise useful answers. |
| Embedding | Local (Ollama) | Cloud API | No external dependency, consistent dev/prod. Slightly lower quality than top cloud models. |
| Vector store | ChromaDB | FAISS / Milvus | Simpler API, Docker-native. Fewer tuning options, single-node only. |
| LLM abstraction | OpenAI-compatible API | Framework (LangChain, etc.) | Minimal dependency, direct control. No built-in chains or agents. |
To deploy Nexus on a cloud platform:
- Compute: Container orchestration (ECS/Fargate, Cloud Run, or Kubernetes) for the Nexus CLI/API
- Vector store: Managed ChromaDB (Chroma Cloud) or migrate to a managed alternative (Pinecone, Weaviate) for durability and scaling
- LLM: OpenAI API via API gateway with rate limiting and key rotation
- Embeddings: Continue using Nomic Embed Text V2, either self-hosted or via cloud endpoint
- CI/CD: GitHub Actions pipeline — lint, test, build Docker image, push to registry, deploy
- Monitoring: Structured JSON logs piped to a log aggregator (Datadog, CloudWatch). Alert on `citation_validation_failed` events, high LLM latency, or elevated error rates
- Scaling: Nexus is stateless (state lives in ChromaDB) — horizontal scaling is straightforward. ChromaDB is the bottleneck; a managed vector store solves this
- Secrets: API keys injected via secrets manager (AWS Secrets Manager, GCP Secret Manager), never committed
- Health checks: `/status` endpoint (or extend the existing `nexus status` command) for liveness/readiness probes
- Retry logic: Already built into the embedder (exponential backoff). Add similar retry logic for LLM calls in production
- Cost control: Token usage logging (already in place) enables cost attribution and budget alerting
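For the health-check item above, a minimal probe endpoint could look like the following sketch. It assumes a FastAPI wrapper and the ChromaDB client's `heartbeat()` call; Nexus currently ships `nexus status` as a CLI command, so the HTTP endpoint, host, and port here are assumptions.

```python
from fastapi import FastAPI
import chromadb

app = FastAPI()

@app.get("/status")
def status() -> dict:
    """Liveness/readiness probe: report whether the vector store is reachable."""
    try:
        chromadb.HttpClient(host="chromadb", port=8000).heartbeat()
        return {"status": "ok", "vector_store": "reachable"}
    except Exception:
        return {"status": "degraded", "vector_store": "unreachable"}
```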
An evaluation harness measures retrieval quality against the Nexus codebase:
# Run eval (requires Ollama + ChromaDB running with indexed collection)
python -m eval.run_eval --collection ai-assessment

This runs 20 questions through the full pipeline and reports evidence hit-rate: the percentage of questions where the answer cited at least one expected source file.
| Model | Hit-rate | Notes |
|---|---|---|
| GPT-5.2 (OpenAI API) | 90% (18/20) | Production target. Strong citation compliance. |
| llama3.1:8b (Ollama) | 45% (9/20) | Inconsistent citation formatting; needs a smaller retrieved context (top_k=4). |
Failure analysis (GPT-5.2):
- 1 retrieval miss — `app/cli.py` ranks low for "CLI entrypoint" because documentation files that describe the CLI are more semantically similar than the code itself. This is the known hybrid retrieval gap (vector-only search misses keyword matches).
- 1 citation validation rejection — the LLM cited an example file path (`src/auth.py`) found in a docstring within the retrieved context. The guardrail correctly rejected it.
See eval/questions.json for the question set and expected evidence files.
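For reference, the hit-rate metric reduces to a per-question set intersection. The sketch below assumes a hypothetical result schema (`cited_files` / `expected_files` per question) rather than the actual output format of `eval/run_eval.py`.

```python
def evidence_hit_rate(results: list[dict]) -> float:
    """Fraction of questions whose answer cited at least one expected source file.

    `results` is an assumed list of {"cited_files": [...], "expected_files": [...]}
    records; the real eval harness may use a different structure.
    """
    hits = sum(
        1 for r in results
        if set(r["cited_files"]) & set(r["expected_files"])
    )
    return hits / len(results) if results else 0.0
```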
This project was built with the assistance of Claude Code (Anthropic's CLI coding agent).
- Design: Collaborative brainstorming sessions to explore approaches and trade-offs before implementation. Design documents written iteratively with human review at each section.
- Implementation: Subagent-driven development — each task dispatched to a fresh agent with full context, followed by two-stage review (spec compliance, then code quality).
- Testing: TDD throughout — tests written before implementation, with the agent verifying failures before writing production code.
- Code review: Automated spec compliance and code quality review after each task.
- All code reviewed by a human before committing
- Every function has full type hints and docstrings
- 167 automated tests covering all modules
- ruff linting and formatting enforced
- Manual verification against live services at each milestone
With more time, the following would improve Nexus:
- AST-aware chunking — Respect semantic boundaries (functions, classes) for popular languages. Would reduce split-function artifacts and improve retrieval precision.
- Hybrid retrieval (BM25 + vectors) — Combine keyword search with semantic search. Would improve recall for exact-match queries like "where is `authenticate` called?".
- Streaming answers — Stream LLM responses to the terminal as they're generated, rather than waiting for the complete response. Better UX for longer answers.
- Web UI — FastAPI backend + simple frontend for browser-based Q&A. The pipeline is already structured for this (search → context → answer → validate).
- Caching — Cache embeddings and frequent query results to reduce latency and API costs.
- Multi-repo support — Index multiple repositories into separate collections with a unified query interface.
- Confidence scoring — Surface the retrieval distance scores to help users gauge answer reliability.
See AGENTS.md for development standards.
# Run tests
pytest
# Run linting
ruff check .
# Run evaluation
python -m eval.run_eval --collection ai-assessment
# Start services
docker compose up -d

MIT