Generate deterministic, token-efficient maps and review bundles for Python repositories.
anatomize has two complementary workflows:
- Skeletons: structure-only “code maps” for navigation and architecture understanding.
- Packs: single-file bundles (repomix-style) for external review, with filtering and slicing.
If you want the full guide (modes, slicing, config, determinism guarantees), see docs/GUIDE.md.
pip install anatomize
# Scaffold config for the common workflow “src detailed, tests minimal”
anatomize init --preset standard
# Generate all configured outputs from .anatomize.yaml (writes into .anatomy/*)
anatomize generate
# Or run ad-hoc generation for a specific source path
anatomize generate ./src
# Choose resolution level
anatomize generate ./src --level hierarchy --output .anatomy
anatomize generate ./src --level modules --output .anatomy
anatomize generate ./src --level signatures --output .anatomy
# Write multiple formats
anatomize generate ./src --format yaml --format json --format markdown --output .anatomy
anatomize estimate ./src --level modules
# Validate all configured outputs (from .anatomize.yaml)
anatomize validate
# Rewrite configured outputs to match regenerated content (strict, atomic-ish replacement)
anatomize validate --fix
# Or validate a specific directory against explicit sources
anatomize validate .anatomy/src --source ./src
# If --format is omitted, it is inferred from --output when the extension is known
anatomize pack . --output codebase.jsonl
anatomize pack . --output codebase.md
# Full bundle
anatomize pack . --format markdown --output codebase.md
# Minimal prefix (lower token overhead)
anatomize pack . --prefix minimal --output codebase.md
# Explain selection (why files were included/excluded)
# (writes `codebase.md.selection.json` by default)
anatomize pack . --explain-selection --output codebase.md
# Filter by globs
anatomize pack . --include "src/**" --ignore "**/__pycache__/**" --output src-only.md
# Forward dependency closure (entrypoint + everything it imports)
anatomize pack . --entry src/anatomize/cli.py --deps --output slice.md
# Reverse dependency closure (module + everything that imports it)
anatomize pack . --target src/anatomize/cli.py --reverse-deps --output importers.md
# Reverse + forward (importers plus what they import)
anatomize pack . --target src/anatomize/cli.py --reverse-deps --deps --output importers-and-deps.md
# Token-efficient Python compression (signatures/imports/constants)
anatomize pack . --compress --output compressed.md
# Make markdown robust to embedded ``` fences (default)
anatomize pack . --content-encoding fence-safe --output safe.md
# Maximum robustness (content is base64-encoded UTF-8)
anatomize pack . --content-encoding base64 --output safe.base64.md
# Split output into multiple files (markdown/plain only)
anatomize pack . --split-output 500kb --output codebase.md
# Hard cap output (bytes or tokens)
anatomize pack . --max-output 20_000t --output codebase.md
# Print a per-file content token tree to stdout
anatomize pack . --token-count-tree --output codebase.md
# JSONL (stream-friendly)
anatomize pack . --format jsonl --output codebase.jsonl
# Hybrid mode (summaries + selective fill; token-efficient)
# - defaults to markdown when --format and the output extension are not specified
# - Python files default to summary; non-Python defaults to metadata-only
anatomize pack . --mode hybrid --output hybrid.md
# Hybrid: include full content for a slice and fit within a hard token budget (JSONL only)
anatomize pack . --mode hybrid --format jsonl --max-output 50_000t --fit-to-max-output \
  --content "src/pkg/**" --output hybrid.slice.jsonl
Reference-based usage slicing (requires Pyright language server):
anatomize pack . --target src/anatomize/cli.py --uses --slice-backend pyright --output uses.md
from anatomize import SkeletonGenerator
from anatomize.formats import OutputFormat, write_skeleton
gen = SkeletonGenerator(sources=["./src"])
skeleton = gen.generate(level="modules")
print("Modules:", skeleton.metadata.total_modules)
print("Classes:", skeleton.metadata.total_classes)
print("Functions:", skeleton.metadata.total_functions)
print("Estimated tokens:", skeleton.metadata.token_estimate)
write_skeleton(skeleton, ".anatomy", formats=[OutputFormat.YAML, OutputFormat.JSON])
- anatomize.SkeletonGenerator: orchestrates discovery + extraction.
- anatomize.formats.write_skeleton: writes YAML/JSON/Markdown plus schemas and manifest.json.
- anatomize.validation.validate_skeleton_dir: strict validator with optional fix.
The CLI can auto-discover .anatomize.yaml. Generation commands use config from the current working directory (or explicit --config). pack discovers config relative to the chosen ROOT when --config is not provided.
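The lookup rule above can be sketched roughly as follows. This is an illustration only, not anatomize's actual implementation: the name `resolve_config` is hypothetical, and the sketch checks only the base directory itself, since this README does not say whether parent directories are also searched.

```python
from __future__ import annotations

from pathlib import Path

CONFIG_NAME = ".anatomize.yaml"


def resolve_config(base_dir: Path, explicit: Path | None = None) -> Path | None:
    """Return the config file to use, or None when none is found.

    Hypothetical helper: an explicit --config path always wins; otherwise
    look for .anatomize.yaml directly inside base_dir.
    """
    if explicit is not None:
        return explicit
    candidate = base_dir / CONFIG_NAME
    return candidate if candidate.is_file() else None
```

Under this sketch, generation commands would pass `Path.cwd()` as `base_dir`, while `pack` would pass the chosen ROOT.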
Minimal config:
output: .anatomy
sources:
  - path: src
    output: src
    level: modules
  - path: tests
    output: tests
    level: hierarchy
# Defaults applied to sources that omit fields
level: modules
formats: [yaml, json, markdown]
exclude:
  - __pycache__/
  - "*.pyc"
symlinks: forbid # forbid|files|dirs|all
workers: 0 # 0 = auto
pack:
  format: markdown # markdown|plain|json|xml|jsonl (hybrid supports markdown|plain|jsonl)
  mode: bundle # bundle|hybrid (hybrid is token-efficient summaries + selective fill)
  prefix: standard # standard|minimal
  output: anatomize-pack.md # if the extension is known, it must match `format`
  include: []
  ignore: []
  ignore_files: []
  respect_standard_ignores: true
  symlinks: forbid # forbid|files|dirs|all
  max_file_bytes: 1000000
  workers: 0 # 0 = auto
  token_encoding: cl100k_base
  compress: false
  content_encoding: fence-safe # verbatim|fence-safe|base64 (markdown disallows verbatim)
  line_numbers: false
  no_structure: false
  no_files: false
  max_output: null # e.g. "500kb" or "20_000t"
  split_output: null # e.g. "500kb" or "20_000t"
  fit_to_max_output: false
  # Hybrid representation rules (repeatable patterns). Precedence: meta < summary < content.
  meta: []
  summary: []
  content: []
  summary_config:
    max_depth: 3
    max_keys: 200
    max_items: 200
    max_headings: 200
  python_roots: [] # defaults to ["src"] if present, else ["."]
  slice_backend: imports # imports|pyright
  uses_include_private: false
  pyright_langserver_cmd: "pyright-langserver --stdio"
Exclude patterns use gitignore-like semantics and are applied relative to each configured root.
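To give a feel for what "gitignore-like" means for the two example patterns above, here is a deliberately simplified matcher. It is an approximation written for illustration, not anatomize's matcher and not full gitignore semantics (no negation, no anchored patterns, no `**`); the function name `is_excluded` is hypothetical.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath


def is_excluded(rel_path: str, patterns: list[str]) -> bool:
    """Check a root-relative file path against simplified gitignore-style patterns."""
    parts = PurePosixPath(rel_path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # Trailing slash: match directory components only, never the file name.
            name = pattern.rstrip("/")
            if any(fnmatch(part, name) for part in parts[:-1]):
                return True
        elif any(fnmatch(part, pattern) for part in parts):
            # No slash: the pattern may match a component at any depth.
            return True
    return False
```

Paths are assumed to be relative to the configured root and to use forward slashes, which mirrors the statement that patterns are applied relative to each root.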
Tip: anatomize init --preset standard scaffolds .anatomize.yaml with the common pattern “src detailed, tests minimal”.
write_skeleton(...) and anatomize generate ... --output DIR create:
- hierarchy.yaml|json|md / modules.* / signatures.* depending on selected formats and level
- schemas/*.json embedded with the package
- manifest.json (SHA-256 per output file and metadata for validation)
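The per-file SHA-256 idea behind manifest.json can be illustrated with the standard library. This is a sketch under assumptions: the real manifest layout is not shown in this README, so the flat `{relative path: digest}` mapping and the name `file_digests` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digests(output_dir: Path) -> dict[str, str]:
    """Map each output file (relative POSIX path) to its SHA-256 hex digest."""
    digests: dict[str, str] = {}
    # Sorted traversal keeps the mapping order stable across runs.
    for path in sorted(output_dir.rglob("*")):
        if path.is_file() and path.name != "manifest.json":
            rel = path.relative_to(output_dir).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests
```

A validator built on this idea would recompute the digests and compare them with the stored ones, treating any difference as a mismatch.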
When anatomize generate runs from .anatomize.yaml, it writes one skeleton directory per configured source:
- .anatomy/src/...
- .anatomy/tests/...
anatomize pack writes one or more files depending on splitting:
- anatomize-pack.md (or .txt|.json|.xml)
- if split: anatomize-pack.1.md, anatomize-pack.2.md, …
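The split naming pattern shown above can be reproduced with `pathlib`. This is a sketch, not the tool's code: `split_names` is a hypothetical helper, and it assumes that a single-part pack keeps the unnumbered name.

```python
from pathlib import Path


def split_names(output: str, parts: int) -> list[str]:
    """Return the file names for a pack written in `parts` pieces."""
    if parts <= 1:
        # Assumption: no numbering when the output is not split.
        return [output]
    path = Path(output)
    # Insert a 1-based part index between the stem and the extension.
    return [
        str(path.with_name(f"{path.stem}.{index}{path.suffix}"))
        for index in range(1, parts + 1)
    ]
```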
Each pack artifact starts with a lightweight, deterministic overview (and, if enabled, a structure tree) before file blocks/records.
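For markdown packs, file blocks have to survive file contents that themselves contain backtick fences. One common way to achieve that, shown here as an illustration, is to pick a fence longer than any backtick run in the content. This README does not describe how the fence-safe encoding is actually implemented, so treat the technique and the names `fence_for`, `wrap_block`, and `to_base64` as assumptions.

```python
import base64
import re


def fence_for(content: str) -> str:
    """Return a backtick fence longer than any backtick run in content."""
    longest = max((len(run) for run in re.findall(r"`+", content)), default=0)
    return "`" * max(3, longest + 1)


def wrap_block(path: str, content: str) -> str:
    """Render one file as a path line followed by a fenced block."""
    fence = fence_for(content)
    body = content if content.endswith("\n") else content + "\n"
    return f"{path}\n{fence}\n{body}{fence}\n"


def to_base64(content: str) -> str:
    """Encode content as base64 over its UTF-8 bytes."""
    return base64.b64encode(content.encode("utf-8")).decode("ascii")
```

The base64 variant trades readability for robustness: nothing in the payload can be mistaken for markdown, at the cost of roughly a third more characters.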
Token reporting:
- Artifact tokens: exact tokens of the written output file(s) (returned by the Python API).
- Content tokens: tokens of file contents only (returned by the Python API; useful for budgeting).
Pack artifacts intentionally do not embed token counts (agents don’t need them; they waste tokens).
- Deterministic ordering (paths and symbols sorted).
- No timestamps in outputs.
- Parse failures are hard failures (no partial output).
- Validation is strict; --fix replaces output with regenerated content.
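The first two guarantees are easy to picture in isolation. The sketch below is not taken from anatomize: `render` is a hypothetical function, and the path-to-symbols mapping is an invented input shape. It shows why sorted ordering and the absence of timestamps make output byte-identical across runs.

```python
import json


def render(symbols_by_path: dict[str, list[str]]) -> str:
    """Serialize a path -> symbols mapping in a reproducible way."""
    # Sort both paths and symbols so input order never leaks into the output.
    ordered = {
        path: sorted(symbols)
        for path, symbols in sorted(symbols_by_path.items())
    }
    # No timestamps or other run-specific values are written.
    return json.dumps(ordered, indent=2, sort_keys=True) + "\n"
```

Because identical inputs always produce identical bytes, a regenerated file can be compared against the stored one by hash alone.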
python -m venv .venv
. .venv/bin/activate
python -m pip install -U pip
python -m pip install -e ".[dev]"
python -m ruff check .
python -m mypy -p anatomize
python -m pytest
Optional local benchmark:
.venv/bin/python scripts/bench_pack.py . --compress --workers 0
Tests are indexed via pytest markers in pyproject.toml and documented in tests/README.md:
- unit: fast, isolated tests
- integration: filesystem-level tests
- e2e: CLI-level tests
MIT