
Nexus

RAG-based code documentation assistant with grounded answers and citations.

Overview

Nexus ingests a software repository and answers developer questions about the codebase. Every answer is grounded in retrieved source evidence with verifiable citations ([file:line-range]).

Key Features:

  • Ingest any local code repository into a searchable index
  • Ask natural language questions about the codebase
  • Get answers with precise file and line citations woven inline
  • Post-hoc citation validation rejects hallucinated answers
  • Structured logging across the full pipeline
  • Refuses to answer when evidence is insufficient

Quickstart

Prerequisites

  • Docker and Docker Compose
  • (Optional) NVIDIA GPU for faster local inference

Local Development

# Clone the repository
git clone <repo-url>
cd nexus

# Copy environment config
cp .env.example .env

# Start services (Ollama + ChromaDB + Nexus)
docker compose up -d

# Pull required models (first time only)
docker compose exec ollama ollama pull gpt-oss:20b
docker compose exec ollama ollama pull nomic-embed-text

# Index a repository
docker compose exec nexus nexus ingest /path/to/repo

# Ask a question
docker compose exec nexus nexus ask "Where is authentication implemented?"

Production

# Set OpenAI API key (via CI/CD or secrets manager)
export OPENAI_API_KEY=sk-...

# Start production services
docker compose -f docker-compose.prod.yml up -d

Usage

Ingest a Repository

nexus ingest /path/to/repository --collection my-project

Ask Questions

nexus ask "Where are API endpoints defined?"
nexus ask "How does the authentication flow work?"
nexus ask "What database is used and how is it configured?"

Example Output

Searching collection: my-project...
Found 6 relevant chunks
Generating answer...

The authentication flow is handled across two modules. The login function
validates credentials and issues JWT tokens [src/auth/login.py:45-92],
while session management handles token refresh and expiration
[src/auth/session.py:12-48]. Password hashing uses bcrypt
[src/auth/crypto.py:8-25].

Citations appear inline as [path/to/file.ext:start_line-end_line], directly next to the claims they support.

Architecture

┌─────────────────────────────────────────────────────────────┐
│                      Docker Compose                         │
├─────────────┬─────────────┬─────────────┬──────────────────┤
│   Ollama    │  ChromaDB   │    Nexus    │   (Prod only)    │
│  LLM+Embed  │ Vector Store│     CLI     │   OpenAI API     │
└─────────────┴─────────────┴─────────────┴──────────────────┘

Ingestion Pipeline

  1. Walker (ingest/walker.py) — Traverse repository, apply .nexusignore rules
  2. Chunker (ingest/chunker.py) — Split files into chunks with line metadata
  3. Embedder (ingest/embedder.py) — Generate embeddings via Nomic Embed Text V2
  4. Index (ingest/index.py) — Store in ChromaDB with metadata
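
The four stages compose into a single pass from repository files to indexed chunks. A minimal sketch of that wiring; the function names imported here are illustrative, not the modules' actual APIs:

# Hypothetical wiring of the four ingestion stages; real signatures may differ.
from ingest.walker import walk_repository      # yields files, honouring .nexusignore
from ingest.chunker import chunk_file          # splits a file into chunks with line metadata
from ingest.embedder import embed_documents    # embeds chunk text via Nomic Embed Text V2
from ingest.index import store_chunks          # writes chunks and metadata to ChromaDB

def ingest(repo_path: str, collection: str) -> None:
    for file_path in walk_repository(repo_path):
        chunks = chunk_file(file_path)                      # each chunk keeps start_line/end_line
        embeddings = embed_documents([c.text for c in chunks])
        store_chunks(collection, chunks, embeddings)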

Query Pipeline

  1. Search (retrieval/search.py) — Embed question, retrieve top-k chunks from ChromaDB
  2. Context (retrieval/context_builder.py) — Build evidence block with citation headings
  3. Answer (llm/answer.py) — Generate grounded response via LLM
  4. Validate (llm/citation_validator.py) — Reject answer if any citation is hallucinated
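
The query side mirrors this as four calls in sequence. Again a minimal sketch with illustrative function names:

# Hypothetical wiring of the query stages; real signatures may differ.
from retrieval.search import search
from retrieval.context_builder import build_context
from llm.answer import generate_answer
from llm.citation_validator import validate_citations

def ask(question: str, collection: str) -> str:
    chunks = search(question, collection, top_k=8)       # embed question, query ChromaDB
    context = build_context(chunks)                       # evidence block with citation headings
    answer = generate_answer(question, context)
    if not validate_citations(answer, chunks):            # reject if any citation is hallucinated
        return "Insufficient evidence to answer with verified citations."
    return answer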

Technical Decisions

Detailed rationale is documented in docs/DECISIONS.md (ADR-001 through ADR-007).

Chunking Strategy

  • Code files: 300 lines with 50-line overlap (line-based, preserves structure)
  • Text/Markdown: 1000 characters with 100-character overlap
  • All chunks store start_line and end_line for precise citations
  • Trade-off: does not respect semantic boundaries (functions, classes). AST-aware chunking deferred — see Future Improvements
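
For code files, the split is a plain sliding window over lines. A minimal sketch of the approach; parameter and field names are illustrative:

def chunk_code(lines: list[str], size: int = 300, overlap: int = 50) -> list[dict]:
    """Split a file into overlapping line-based chunks that carry citation metadata."""
    chunks = []
    step = size - overlap                          # 250-line stride produces a 50-line overlap
    for start in range(0, len(lines), step):
        end = min(start + size, len(lines))
        chunks.append({
            "text": "".join(lines[start:end]),
            "start_line": start + 1,               # 1-based line numbers for [file:start-end] citations
            "end_line": end,
        })
        if end == len(lines):
            break
    return chunks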

Embedding Model

  • Nomic Embed Text V2 via Ollama — MoE architecture (475M params, 305M active), 768 dimensions
  • Requires instruction prefixes: search_document: for indexing, search_query: for retrieval
  • Omitting these prefixes significantly degrades retrieval quality
  • Local deployment, Apache 2.0 license, no external API dependency
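
A minimal sketch of how such prefixes can be applied when calling Ollama's embeddings endpoint; the endpoint is Ollama's standard API, while the wrapper itself is illustrative:

import requests

OLLAMA_URL = "http://localhost:11434/api/embeddings"        # assumed local Ollama address

def embed(text: str, *, for_query: bool = False) -> list[float]:
    # Nomic Embed Text V2 needs a task prefix; omitting it degrades retrieval quality.
    prefix = "search_query: " if for_query else "search_document: "
    response = requests.post(
        OLLAMA_URL,
        json={"model": "nomic-embed-text", "prompt": prefix + text},
        timeout=60,
    )
    response.raise_for_status()
    return response.json()["embedding"]                      # 768-dimensional vector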

Vector Store

  • ChromaDB in Docker — simple Python API, built-in metadata filtering
  • Persistent storage via Docker volumes
  • Trade-off: single-node limits scaling, but appropriate for current scale (thousands of chunks)
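
A minimal sketch of storing and querying chunks with the ChromaDB Python client; the host name and metadata fields are assumptions, not the project's exact schema:

import chromadb

client = chromadb.HttpClient(host="chromadb", port=8000)     # assumed service name from compose
collection = client.get_or_create_collection("my-project")

chunk_text = "def login(credentials): ..."                   # placeholder chunk text
chunk_embedding = [0.0] * 768                                 # placeholder embedding vector

# Store each chunk with the metadata needed to reconstruct citations.
collection.add(
    ids=["src/auth/login.py:45-92"],
    embeddings=[chunk_embedding],
    documents=[chunk_text],
    metadatas=[{"path": "src/auth/login.py", "start_line": 45, "end_line": 92}],
)

# Retrieve nearest chunks for an embedded question (over-fetched; see Retrieval below).
results = collection.query(query_embeddings=[chunk_embedding], n_results=16)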

LLM

  • Dev: gpt-oss:20b via Ollama (local, fits in 16GB, Apache 2.0)
  • Prod: GPT-5.2 via OpenAI API (best-in-class grounding)
  • Single codebase — switches via OPENAI_BASE_URL environment variable
  • Temperature fixed at 0.1 for focused, deterministic answers
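
A minimal sketch of that switch, assuming the standard OpenAI Python client and Ollama's OpenAI-compatible /v1 endpoint; the NEXUS_MODEL variable is illustrative:

import os
from openai import OpenAI

# Dev: OPENAI_BASE_URL=http://ollama:11434/v1 points the client at local Ollama.
# Prod: leave OPENAI_BASE_URL unset so the client targets the OpenAI API.
client = OpenAI(
    base_url=os.environ.get("OPENAI_BASE_URL"),
    api_key=os.environ.get("OPENAI_API_KEY", "ollama"),      # Ollama ignores the key
)

response = client.chat.completions.create(
    model=os.environ.get("NEXUS_MODEL", "gpt-oss:20b"),
    temperature=0.1,                                          # fixed low temperature for focused answers
    messages=[{"role": "user", "content": "Where is authentication implemented?"}],
)
print(response.choices[0].message.content)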

Retrieval

  • Vector-only similarity search (hybrid BM25+vectors deferred — see ADR-006)
  • Top-k=8 with max 2 chunks per file (diversity constraint)
  • L2 distance threshold of 1.5 filters irrelevant results
  • Over-fetches at 2x top_k to ensure enough results survive filtering
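
Applied to the over-fetched hits, the filtering step could look like this sketch; field names are illustrative:

def filter_results(hits: list[dict], top_k: int = 8,
                   max_per_file: int = 2, max_distance: float = 1.5) -> list[dict]:
    """Apply the distance threshold and per-file diversity cap, keeping at most top_k hits."""
    kept: list[dict] = []
    per_file: dict[str, int] = {}
    for hit in hits:                                   # hits sorted by ascending L2 distance
        if hit["distance"] > max_distance:
            continue                                   # drop irrelevant results
        path = hit["metadata"]["path"]
        if per_file.get(path, 0) >= max_per_file:
            continue                                   # at most 2 chunks per file
        per_file[path] = per_file.get(path, 0) + 1
        kept.append(hit)
        if len(kept) == top_k:
            break
    return kept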

Citation Validation (Guardrails)

  • LLM instructed to use bracketed inline citations: [path/to/file.ext:start-end]
  • Post-hoc regex parsing validates every cited file path against retrieved context
  • Full rejection policy: if any citation references a file not in context, the entire answer is rejected — because a hallucinated citation implies the associated claim is also hallucinated (ADR-007)
  • File-path-only validation (no line range checking) avoids false rejections
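
A minimal sketch of the post-hoc check; the regex and function names illustrate the approach rather than reproduce the implementation:

import re

CITATION_RE = re.compile(r"\[([^\[\]:]+):(\d+)-(\d+)\]")     # matches [path/to/file.ext:start-end]

def validate_citations(answer: str, retrieved_paths: set[str]) -> bool:
    """Return True only if every cited file path appears in the retrieved context."""
    cited_paths = {match.group(1) for match in CITATION_RE.finditer(answer)}
    # Full rejection: a single hallucinated path invalidates the whole answer (ADR-007).
    return cited_paths <= retrieved_paths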

Structured Logging

  • structlog with JSON and console output modes
  • Three logging points: search (query, results, latency), LLM call (model, latency, tokens), citation validation (pass/fail, invalid paths)
  • Configured via LOG_LEVEL and LOG_FORMAT environment variables
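
A minimal sketch of that configuration; the exact processor chain and event names may differ from the real setup:

import logging
import os
import structlog

log_format = os.environ.get("LOG_FORMAT", "console")
renderer = (structlog.processors.JSONRenderer() if log_format == "json"
            else structlog.dev.ConsoleRenderer())

structlog.configure(
    processors=[
        structlog.processors.add_log_level,
        structlog.processors.TimeStamper(fmt="iso"),
        renderer,
    ],
    wrapper_class=structlog.make_filtering_bound_logger(
        getattr(logging, os.environ.get("LOG_LEVEL", "INFO"))
    ),
)

log = structlog.get_logger()
log.info("search_completed", query="auth flow", results=6, latency_ms=142)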

Trade-offs

Decision          | Chosen                 | Alternative                 | Why
------------------|------------------------|-----------------------------|--------------------------------------------------------------------------------
Chunking          | Line-based (300 lines) | AST-aware                   | Simpler, language-agnostic. May split functions awkwardly.
Retrieval         | Vector-only            | Hybrid (BM25 + vectors)     | Simpler to implement. Keyword queries (exact function names) may underperform.
Citation handling | Full rejection         | Strip invalid, keep answer  | Safer — hallucinated citation implies hallucinated claim. May reject otherwise useful answers.
Embedding         | Local (Ollama)         | Cloud API                   | No external dependency, consistent dev/prod. Slightly lower quality than top cloud models.
Vector store      | ChromaDB               | FAISS / Milvus              | Simpler API, Docker-native. Fewer tuning options, single-node only.
LLM abstraction   | OpenAI-compatible API  | Framework (LangChain, etc.) | Minimal dependency, direct control. No built-in chains or agents.

Productionisation

To deploy Nexus on a cloud platform:

Infrastructure

  • Compute: Container orchestration (ECS/Fargate, Cloud Run, or Kubernetes) for the Nexus CLI/API
  • Vector store: Managed ChromaDB (Chroma Cloud) or migrate to a managed alternative (Pinecone, Weaviate) for durability and scaling
  • LLM: OpenAI API via API gateway with rate limiting and key rotation
  • Embeddings: Continue using Nomic Embed Text V2, either self-hosted or via cloud endpoint

Operations

  • CI/CD: GitHub Actions pipeline — lint, test, build Docker image, push to registry, deploy
  • Monitoring: Structured JSON logs piped to a log aggregator (Datadog, CloudWatch). Alert on citation_validation_failed events, high LLM latency, or elevated error rates
  • Scaling: Nexus is stateless (state lives in ChromaDB) — horizontal scaling is straightforward. ChromaDB is the bottleneck; a managed vector store solves this
  • Secrets: API keys injected via secrets manager (AWS Secrets Manager, GCP Secret Manager), never committed

Reliability

  • Health checks: /status endpoint (or extend existing nexus status command) for liveness/readiness probes
  • Retry logic: Already built into the embedder (exponential backoff). Add similar retry logic for LLM calls in production
  • Cost control: Token usage logging (already in place) enables cost attribution and budget alerting
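
The LLM retry suggested above could be as simple as this sketch; it is illustrative, not the existing embedder code:

import time

def call_with_retry(fn, *, attempts: int = 3, base_delay: float = 1.0):
    """Retry a callable with exponential backoff, re-raising after the final attempt."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:                              # in practice, catch only transient client errors
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))    # 1s, 2s, 4s, ...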

Evaluation

An evaluation harness measures retrieval quality against the Nexus codebase:

# Run eval (requires Ollama + ChromaDB running with indexed collection)
python -m eval.run_eval --collection ai-assessment

This runs 20 questions through the full pipeline and reports evidence hit-rate: the percentage of questions where the answer cited at least one expected source file.
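
The metric itself is straightforward to compute. A sketch, assuming each eval result carries the generated answer and the list of expected evidence files (the schema here is an assumption):

import re

CITATION_RE = re.compile(r"\[([^\[\]:]+):\d+-\d+\]")

def evidence_hit_rate(results: list[dict]) -> float:
    """Fraction of questions whose answer cites at least one expected source file."""
    hits = 0
    for result in results:                             # each result: {"answer": ..., "expected_files": [...]}
        cited = {m.group(1) for m in CITATION_RE.finditer(result["answer"])}
        if cited & set(result["expected_files"]):
            hits += 1
    return hits / len(results)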

Results

Model                | Hit-rate    | Notes
---------------------|-------------|------------------------------------------------------------------------------
GPT-5.2 (OpenAI API) | 90% (18/20) | Production target. Strong citation compliance.
llama3.1:8b (Ollama) | 45% (9/20)  | Inconsistent citation formatting; needs a smaller context window (top_k=4).

Failure analysis (GPT-5.2):

  • 1 retrieval miss — app/cli.py ranks low for "CLI entrypoint" because documentation files that describe the CLI are semantically closer to the question than the code itself. This is the known hybrid retrieval gap (vector-only search misses keyword matches).
  • 1 citation validation rejection — the LLM cited an example file path (src/auth.py) found in a docstring within the retrieved context. The guardrail correctly rejected it.

See eval/questions.json for the question set and expected evidence files.

AI Tooling

This project was built with the assistance of Claude Code (Anthropic's CLI coding agent).

How AI tools were used

  • Design: Collaborative brainstorming sessions to explore approaches and trade-offs before implementation. Design documents written iteratively with human review at each section.
  • Implementation: Subagent-driven development — each task dispatched to a fresh agent with full context, followed by two-stage review (spec compliance, then code quality).
  • Testing: TDD throughout — tests written before implementation, with the agent verifying failures before writing production code.
  • Code review: Automated spec compliance and code quality review after each task.

Quality controls

  • All code reviewed by a human before committing
  • Every function has full type hints and docstrings
  • 167 automated tests covering all modules
  • ruff linting and formatting enforced
  • Manual verification against live services at each milestone

Future Improvements

With more time, the following would improve Nexus:

  • AST-aware chunking — Respect semantic boundaries (functions, classes) for popular languages. Would reduce split-function artifacts and improve retrieval precision.
  • Hybrid retrieval (BM25 + vectors) — Combine keyword search with semantic search. Would improve recall for exact-match queries like "where is authenticate called?".
  • Streaming answers — Stream LLM responses to the terminal as they're generated, rather than waiting for the complete response. Better UX for longer answers.
  • Web UI — FastAPI backend + simple frontend for browser-based Q&A. The pipeline is already structured for this (search → context → answer → validate).
  • Caching — Cache embeddings and frequent query results to reduce latency and API costs.
  • Multi-repo support — Index multiple repositories into separate collections with a unified query interface.
  • Confidence scoring — Surface the retrieval distance scores to help users gauge answer reliability.

Development

See AGENTS.md for development standards.

# Run tests
pytest

# Run linting
ruff check .

# Run evaluation
python -m eval.run_eval --collection ai-assessment

# Start services
docker compose up -d

License

MIT
