@cbullinger
Collaborator

Summary

This PR adds a new analyze orphaned-files command that identifies files with no incoming references from other files in the documentation.

What's New

The analyze orphaned-files command scans all RST and YAML files in a source directory to build a complete reference map, then identifies files that have zero incoming references.
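
The orphan check itself can be pictured as a set difference between all scanned files and all reference targets. The following is a minimal sketch, not the code in analyzer.go: the function name `findOrphans` and the shape of the reference map (source file to referenced files) are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// findOrphans is a hypothetical helper: it returns every file in allFiles
// that never appears as a reference target of another file.
// refs maps a source file to the files it references.
func findOrphans(allFiles []string, refs map[string][]string) []string {
	referenced := make(map[string]bool)
	for source, targets := range refs {
		for _, target := range targets {
			// Only references from other files count as incoming.
			if target != source {
				referenced[target] = true
			}
		}
	}

	var orphans []string
	for _, f := range allFiles {
		if !referenced[f] {
			orphans = append(orphans, f)
		}
	}
	sort.Strings(orphans) // stable output regardless of input order
	return orphans
}

func main() {
	files := []string{"index.rst", "includes/used.rst", "includes/unused.rst"}
	refs := map[string][]string{
		"index.rst": {"includes/used.rst"},
	}
	for _, f := range findOrphans(files, refs) {
		fmt.Println(f)
	}
}
```

In this sketch both includes/unused.rst and index.rst are reported, since nothing inside the scanned tree points at either of them.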

Key Features

  • Comprehensive Scanning: Detects references from include, literalinclude, and io-code-block directives, plus toctree entries when --include-toctree is set
  • Flexible Options:
    • --include-toctree: Also count toctree (navigation) links as references when determining orphaned status
    • --exclude <pattern>: Exclude paths matching glob patterns (e.g., */archive/*)
    • --verbose: Show detailed scanning progress
    • --format json: Machine-readable output for automation
  • Multiple Output Formats:
    • Text: Human-readable with helpful suggestions
    • JSON: Structured data for scripting
    • Count-only: Just the number of orphaned files
    • Paths-only: File paths for piping to other commands
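
As a rough illustration of the scanning step, the sketch below pulls include and literalinclude targets out of RST text with a regular expression. It is an assumption about how such detection could work, not the PR's implementation: the names `directiveRe`, `reference`, and `extractReferences` are hypothetical, and io-code-block and toctree handling is left out because their targets are not on the directive line itself.

```go
package main

import (
	"fmt"
	"regexp"
)

// directiveRe matches lines such as ".. include:: /includes/intro.rst",
// optionally indented. Only include and literalinclude are covered here.
var directiveRe = regexp.MustCompile(`(?m)^[ \t]*\.\. (include|literalinclude):: +(\S+)`)

// reference is a hypothetical record of one directive and its target path.
type reference struct {
	Directive string
	Target    string
}

// extractReferences returns the directive targets found in content,
// in the order they appear.
func extractReferences(content string) []reference {
	var refs []reference
	for _, m := range directiveRe.FindAllStringSubmatch(content, -1) {
		refs = append(refs, reference{Directive: m[1], Target: m[2]})
	}
	return refs
}

func main() {
	content := ".. include:: /includes/intro.rst\n\n" +
		"Some text.\n\n" +
		"   .. literalinclude:: /code/example.py\n" +
		"      :language: python\n"
	for _, r := range extractReferences(content) {
		fmt.Printf("%s -> %s\n", r.Directive, r.Target)
	}
}
```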

Use Cases

This command helps documentation writers:

  • Find unused include files that can be removed
  • Identify documentation pages not linked in the navigation
  • Discover legacy content that needs cleanup
  • Maintain documentation hygiene by removing dead files
  • Identify entry points (like index.rst) that have no incoming references within the source tree because they are referenced externally

Examples

# Find orphaned files (content inclusion only)
./audit-cli analyze orphaned-files ~/docs/source

# Include navigation links
./audit-cli analyze orphaned-files ~/docs/source --include-toctree

# Get JSON output
./audit-cli analyze orphaned-files ~/docs/source --format json

# Just show the count
./audit-cli analyze orphaned-files ~/docs/source --count-only

# Exclude archived files
./audit-cli analyze orphaned-files ~/docs/source --exclude "*/archive/*"
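
To make the --exclude example concrete, here is one way a glob such as */archive/* could be applied to paths relative to the source directory. This is a sketch that assumes standard library glob semantics via `path.Match`; the helper name `isExcluded` is hypothetical, and the command's actual matching rules may differ.

```go
package main

import (
	"fmt"
	"path"
)

// isExcluded reports whether relPath (slash-separated, relative to the
// source directory) matches any of the glob patterns.
func isExcluded(relPath string, patterns []string) bool {
	for _, p := range patterns {
		// path.Match returns an error only for a malformed pattern;
		// this sketch treats such a pattern as a non-match.
		if ok, err := path.Match(p, relPath); err == nil && ok {
			return true
		}
	}
	return false
}

func main() {
	patterns := []string{"*/archive/*"}
	for _, p := range []string{"v1/archive/old.rst", "v1/current/new.rst"} {
		fmt.Printf("%s excluded=%v\n", p, isExcluded(p, patterns))
	}
}
```

Under these semantics `*` does not cross a `/`, so a deeper path such as a/b/archive/old.rst would not match */archive/*. A tool that wants to exclude archive directories at any depth would need a different matching strategy.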

Testing

  • ✅ All existing tests pass
  • ✅ New comprehensive tests added for orphaned files detection
  • ✅ Tested on actual testdata directory
  • ✅ Verified all output formats work correctly
  • ✅ Confirmed --include-toctree flag changes results appropriately

Files Changed

New Files

  • audit-cli/commands/analyze/orphaned-files/orphaned_files.go - Main command implementation
  • audit-cli/commands/analyze/orphaned-files/analyzer.go - Core analysis logic
  • audit-cli/commands/analyze/orphaned-files/output.go - Output formatting
  • audit-cli/commands/analyze/orphaned-files/types.go - Data structures
  • audit-cli/commands/analyze/orphaned-files/orphaned_files_test.go - Tests

Modified Files

  • audit-cli/commands/analyze/analyze.go - Registered new subcommand
  • audit-cli/README.md - Added comprehensive documentation

Pull Request opened by Augment Code with guidance from the PR author

- Implements new command to find files with no incoming references
- Scans all RST and YAML files to build complete reference map
- Identifies files not referenced by include, literalinclude, io-code-block, or toctree
- Supports --include-toctree flag to consider navigation links
- Supports --exclude pattern to skip certain paths
- Provides text, JSON, count-only, and paths-only output formats
- Includes comprehensive tests and documentation
- Useful for finding unused includes, unlinked pages, and legacy content

@cbullinger closed this Nov 14, 2025