autodocs

Automated documentation drift detection using LLM-powered analysis + deterministic verification.

When your team merges PRs that change code described in your documentation, autodocs detects which doc sections are stale, generates suggested updates, and maintains a changelog of why things changed.

Supports GitHub, GitLab, Bitbucket, and Azure DevOps. Runs via Claude Code CLI or Anthropic API.

How it works

Your repo (git)         GitHub / ADO / GitLab / Bitbucket    LLM (Claude)
     |                       |                                    |
     |   git diff-tree       |   merged PRs (+ descriptions,      |   drift detection
     |   (change types,      |      review threads, diffs)        |   + suggestions
     |    A/M/D/R + diffs)   |                                    |
     v                       v                                    v
  +---------------------------------------------------------+
  |                    autodocs pipeline                    |
  |                                                         |
  |  Step 1: Sync (deterministic Python)                    |
  |  - Discover relevant PRs via git log + path filter      |
  |  - Fetch PR details from platform API (relevant only)   |
  |  - Get changed files via git diff-tree (local repo)     |
  |  - Fetch PR review threads for relevant PRs             |
  |  - Write: daily-report.md, activity-log.md              |
  |                                                         |
  |  Step 2: Drift Detection (LLM)                          |
  |  - Read sync output + your doc's file index             |
  |  - Map changed packages to doc sections                 |
  |  - Flag new error patterns not in known list            |
  |  - Detect unmapped files (new code not in docs)         |
  |  - Write: drift-report.md, drift-status.md, drift-log   |
  |                                                         |
  |  Step 3: Suggest + Changelog (LLM, if drift found)      |
  |  - Read flagged sections from your actual docs          |
  |  - Generate FIND/REPLACE edit suggestions (verified)    |
  |  - Write changelog entries capturing WHY things changed |
  |  - Write: drift-suggestions.md, changelog-<doc>.md      |
  |                                                         |
  |  Step 3v: Verify (optional, multi-model verification)   |
  |  - Re-run suggest with variant reasoning path           |
  |  - Write: drift-suggestions-verify.md                   |
  |                                                         |
  |  Step 4: Apply as PR (deterministic Python)             |
  |  - Apply FIND/REPLACE to doc files (verified only)      |
  |  - Create branch, commit changes + changelog            |
  |  - Open pull request via platform CLI                   |
  |                                                         |
  |  Weekly: Structural Scan (Saturday)                     |
  |  - Verify every file referenced in docs still exists    |
  |  - Find undocumented files in feature paths             |
  |  - Write: structural-report.md                          |
  +---------------------------------------------------------+

The LLM is only used for steps 2 and 3 (drift detection and suggestion generation) — tasks that genuinely need natural language understanding. Everything else is deterministic Python: PR fetching, classification, verification, edit application, git operations, and PR creation.

Key innovations

git diff-tree for file-level precision. The ADO MCP doesn't expose changed files for a PR. autodocs extracts the merge commit SHA from the PR response, then runs git diff-tree against the local repo. This enables precise, package-to-section drift detection.

FIND/REPLACE with self-verification. Suggestions use a structured FIND/REPLACE format where each FIND block is verified against the actual doc (with line numbers). Optionally, autodocs can apply verified suggestions automatically by opening a PR in ADO — the human reviews and merges through the standard code review workflow.

Changelog with "why." Most documentation tells you WHAT the system does today. autodocs also captures WHY things changed — from PR descriptions and review threads. Six months later, when nobody remembers why handleError became classifyError, the changelog does.

What it produces

Daily output

daily-report.md — Structured daily summary with YAML frontmatter:

Team PRs with descriptions, file lists, and review thread summaries
Owner's activity (reviews, authored PRs)
Telemetry metrics (reliability, error breakdown, anomalies)

drift-report.md — Documentation drift alerts:

Which doc sections may be stale, triggered by which PRs
Confidence levels (CRITICAL / HIGH / LOW)
New error patterns not in your known patterns list

drift-suggestions.md — FIND/REPLACE edit suggestions:

Structured FIND/REPLACE or FIND/INSERT AFTER format
Each FIND block self-verified against the doc (with line numbers)
Rated CONFIDENT (clear factual change) or REVIEW (needs human judgment)
Frontmatter includes verified: X/Y count

changelog-<doc>.md — Per-doc change history, organized by section:

What changed, why it changed (from PR descriptions)
Reviewer context (from PR review threads)
Permanent record — never trimmed
Committed to the repo alongside doc edits (via auto-PR)

drift-status.md — Active alert tracker (Obsidian-compatible checkboxes):

Check off alerts you've resolved
LOW alerts auto-expire after 7 days
Deduplicates across days

Weekly output

structural-report.md — Documentation structural audit:

Files referenced in docs that no longer exist in the repo
Files in feature paths that aren't documented
Catches drift that predates autodocs

Quick start

Prerequisites

Git repo checked out locally
Python 3.9+
LLM backend (one of):

Backend	Tool	Setup
Claude Code CLI (default)	Claude Code	Install and run `claude` to authenticate
Anthropic API	`pip install anthropic`	Set `ANTHROPIC_API_KEY` environment variable

Platform CLI:

Platform	Tool	Setup
GitHub	`gh`	Install and run `gh auth login`
GitLab	`glab`	Install and run `glab auth login`
Bitbucket	`curl`	Set `BITBUCKET_TOKEN` environment variable
Azure DevOps	`az`	Install and run `az login`

(Optional) Kusto MCP for telemetry monitoring (requires Claude Code CLI backend)

Setup

Starting from scratch (no existing docs):

git clone https://github.com/msiric/autodocs.git
cd autodocs
./setup.sh generate /path/to/your-repo /path/to/output-dir --relevant-dirs src/

This generates an architecture doc from your codebase and a matching config.yaml. Review the generated doc, commit it to your repo, then proceed to the wizard:

Interactive wizard (or if you already have docs):

./setup.sh                  # interactive setup
./setup.sh --quick          # auto-detect everything

The setup wizard will:

Detect your platform, repo details, and team members
Select or generate a doc to track
Generate a config with package_map
Render all prompts, wrapper scripts, and schedules

Configure

Edit the generated config.yaml in your output directory:

# Add your team members
team_members:
  - name: "Alice Engineer"
    ado_id: "abc123-..."

# Add paths that indicate relevance to your feature
relevant_paths:
  - packages/components/your-feature/
  - packages/hooks/your-feature-hooks/

# Add package-to-section mappings for your docs
docs:
  - name: "your-feature-guide.md"
    package_map:
      your-feature: "Architecture"
      your-feature-hooks: "Hooks Reference"

# LLM backend (optional — defaults to Claude Code CLI)
# llm:
#   backend: "api"   # Use Anthropic API instead of Claude Code CLI

Run

# Manual run (full daily chain: sync → drift → suggest → apply PR)
autodocs-now

# Catchup: process historical PRs for brownfield projects
# Walks through weekly chunks, builds changelog, then creates one fix PR
autodocs-sync.sh --since 2025-09-01

# Dry-run catchup (shows chunk count and estimated time, no LLM calls)
autodocs-sync.sh --since 2025-09-01 --dry-run

# Structural scan (usually weekly, can run manually)
autodocs-structural-scan.sh

Schedule

# Daily sync + drift + suggest + apply (macOS)
launchctl load ~/Library/LaunchAgents/com.autodocs.sync.plist

# Weekly structural scan (macOS)
launchctl load ~/Library/LaunchAgents/com.autodocs.structural-scan.plist

Webhook (real-time mode)

Instead of scheduled runs, autodocs can process PRs immediately on merge via webhooks:

# Start the webhook server
pip install fastapi uvicorn
AUTODOCS_WEBHOOK_SECRET=your-secret OUTPUT_DIR=.autodocs REPO_DIR=. \
  uvicorn scripts.webhook_server:app --port 8080

Then configure your platform to send PR merge webhooks to http://your-host:8080/webhook/github (or /gitlab, /bitbucket).

Configuration reference

See docs/configuration.md for the complete config file reference.

Architecture

See docs/architecture.md for the data flow, security model, and design decisions.

Demo

See autodocs-demo for a live example — a Node.js API repo with an architecture doc tracked by autodocs. The latest autodocs PR shows real pipeline output: auto-applied suggestions, verification data, and changelog entries.

How it works in detail

Drift detection

The daily sync writes daily-report.md with PR data including changed file paths
The drift prompt reads the PR file lists and your doc's structure
For each changed file, it extracts the package and looks up your package_map
If the package maps to a doc section → HIGH confidence alert
If the package is in your feature's paths but NOT in the mapping → CRITICAL alert (docs may be missing coverage)
If file paths aren't available → LOW alert (manual review needed)
New telemetry error patterns not in your known list → HIGH alert

Alerts are grouped by section (not per-file), deduplicated across days, and auto-expire if unresolved.

Suggested updates + verification + auto-PR

When drift is detected, autodocs reads the flagged doc section, the PR diffs, and the current source files (ground truth), then generates suggestions that go through a multi-layer verification pipeline:

LLM generates FIND/REPLACE — with self-verification (FIND text confirmed verbatim in doc)
Deterministic FIND verification (Python) — mechanically confirms every FIND block exists in the target doc file
Deterministic REPLACE verification (Python) — extracts code references from REPLACE text (backtick identifiers, quoted literals, file paths, error codes) and verifies them against source code:
- EVIDENCED — value found in source → eligible for auto-apply
- MISMATCH — value contradicts source → blocked (prevents wrong edits)
- UNVERIFIED — value can't be checked (behavioral claim) → flagged for human review
Auto-PR — only CONFIDENT + FIND-verified + REPLACE-verified suggestions are applied. Everything else goes in the PR description for manual review.

Each suggestion also includes:

Confidence rating — CONFIDENT for clear factual changes, REVIEW for ambiguous ones
Changelog entry — what changed, why (from PR description), and reviewer context
Source file context — the LLM reads actual source files, not just diffs, preventing stale changelog entries from poisoning suggestions

If auto_pr is enabled in config, autodocs automatically:

Applies verified suggestions to doc files in the repo
Includes the changelog alongside the edits
Creates a branch and opens a PR with the autodocs label
The human reviews and merges — standard code review workflow

Structural scan

Once a week, autodocs audits your docs against the actual repo:

Every file path mentioned in a doc → verified via git ls-files
Every file under feature-relevant paths → checked if documented
Missing files (deleted/renamed) and undocumented files are reported

Design principles

Suggest, never force. autodocs generates verified FIND/REPLACE suggestions and optionally opens PRs — but a human always reviews before changes reach the main branch.
Deterministic classification. PR relevance is determined by file path matching, not LLM inference.
Graceful degradation. If ADO fails, telemetry still runs (and vice versa). If file paths aren't available, LOW alerts are generated instead of silence.
Independent call isolation. Sync, drift, suggest, verify, and apply run as independent Claude Code calls. Each can fail without corrupting the others.
Config-driven. Team members, paths, queries, and mappings live in config. Prompts are generic templates.

Security

Minimal ADO access. Calls 1-3 use 5 read-only ADO tools. Call 4 adds repo_create_pull_request and repo_create_branch — the only write operations, and only to create PRs on feature branches (never direct writes to the target branch).
No PII in output. Prompts explicitly prohibit including internal URLs, stack traces, or user identifiers.
Kusto queries are predefined. The LLM copies queries verbatim from config — it never generates KQL.
Write sandbox. Each prompt can only write to its specific output files. The apply prompt can additionally write to doc files in the repo (gated by auto_pr config).
Git operations scoped. Read operations: git diff-tree, git ls-files, git fetch. Write operations (Call 4 only): git checkout -b, git add, git commit, git push — always to a feature branch, never to the target branch.
Three-layer verification. (1) FIND text confirmed verbatim in doc by Python. (2) REPLACE text values verified against source code by Python — mismatched values are blocked. (3) Source files included in LLM context as ground truth. Unverified or mismatched suggestions are flagged for human review, never auto-applied.
Prompt injection mitigation. PR descriptions and review comments are marked as untrusted user data in all prompts.
Human reviews all changes. Auto-PRs go through standard code review workflow. Branch protection rules apply.

Best practices for optimal results

Keep PRs focused. autodocs produces the most precise suggestions for PRs with <30 changed files. Large feature PRs (50+ files) still trigger drift detection for all affected sections, but suggestions may be less precise for files beyond the diff budget (150 lines per PR).
Use meaningful PR titles and descriptions. autodocs uses PR titles for title_hints matching and descriptions for the "why" in changelogs. Vague titles like "fix stuff" degrade suggestion quality.
Structure your docs with section headers. ## Section Name headers enable per-section drift detection. Flat prose without headers gets treated as one block — suggestions are less targeted.
Keep your package_map current. When you add new packages to the repo, add them to the config. The structural scan (weekly) flags undocumented files, but the mapping determines suggestion quality.

Multi-repo / Microservices

For teams with multiple repositories, run a separate autodocs instance for each repo. Each instance monitors its own docs and PRs independently.

Cross-repo drift (API change in repo A making docs in repo B stale) is not detected automatically. For critical cross-repo dependencies, consider:

Shared documentation in a dedicated docs repo with its own autodocs instance
Manual review triggers when API contracts change
Using the weekly structural scan to verify cross-referenced file paths

Known limitations

Large PRs (50+ files): Drift detection covers all files, but the diff budget (150 lines per PR) means only the most relevant files get detailed diffs. Suggestions for undiffed files are REVIEW confidence (not CONFIDENT). The daily report notes "Diff truncated" when this happens.
Cross-repo drift: Changes in one repo that affect docs in another repo are not detected. See "Multi-repo" section above.
Non-markdown docs: Only .md files are supported. RST, HTML, AsciiDoc, and wiki-based docs are not handled.
Flat prose without headers: Docs without ## section headers are treated as a single "Main" section. Suggestions are less targeted. Consider adding headers for better drift detection.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 54 Commits
.github/workflows		.github/workflows
docs		docs
scripts		scripts
templates		templates
tests		tests
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md
ROADMAP.md		ROADMAP.md
config.example.yaml		config.example.yaml
pyproject.toml		pyproject.toml
setup.sh		setup.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

autodocs

How it works

Key innovations

What it produces

Daily output

Weekly output

Quick start

Prerequisites

Setup

Configure

Run

Schedule

Webhook (real-time mode)

Configuration reference

Architecture

Demo

How it works in detail

Drift detection

Suggested updates + verification + auto-PR

Structural scan

Design principles

Security

Best practices for optimal results

Multi-repo / Microservices

Known limitations

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

autodocs

How it works

Key innovations

What it produces

Daily output

Weekly output

Quick start

Prerequisites

Setup

Configure

Run

Schedule

Webhook (real-time mode)

Configuration reference

Architecture

Demo

How it works in detail

Drift detection

Suggested updates + verification + auto-PR

Structural scan

Design principles

Security

Best practices for optimal results

Multi-repo / Microservices

Known limitations

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages