# hoardwick

Storage analysis and cleanup system for cross-platform filesystems.
Scan WSL, Windows, and network drives to identify duplicates, stale files, and cleanup opportunities. Built for large filesystems with resumable scans, aggregate patterns, and LLM-friendly output.
```bash
# Install dependencies
pip3 install -r requirements.txt

# Scan your storage
python3 scripts/scan.py /home/chris --label wsl-home
python3 scripts/scan.py /mnt/c/Users/Chris --label windows-user

# Analyse for quick wins
python3 scripts/analyse.py --report duplicates
python3 scripts/analyse.py --report stale --stale-days 365

# Execute cleanup
python3 scripts/cleanup.py docker-prune
python3 scripts/cleanup.py clear-cache --type npm
```

- Resumable scanning - Checkpoint-based scans handle timeouts on large filesystems
- Aggregate patterns - Treats `node_modules`, `.git`, and caches as single units for performance
- Multi-root support - Scan WSL, Windows, and network drives separately with labels
- Duplicate detection - Find identical files across all scanned locations
- Stale file identification - Track last access times, find archival candidates
- Built-in cleanup - Docker pruning, cache clearing, build artifact removal
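The duplicate detection lives in `scripts/analyse.py` and is not reproduced here; a minimal sketch of the standard approach (group files by size first, then hash only the same-size candidates) could look like this. The function name and return shape are illustrative, not the tool's actual API:

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicates(root: str) -> dict[str, list[Path]]:
    """Group files under root by content hash.

    Files are bucketed by size first, so hashing only happens
    when two or more files could possibly be identical.
    """
    by_size: defaultdict[int, list[Path]] = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file() and not path.is_symlink():
            by_size[path.stat().st_size].append(path)

    duplicates: dict[str, list[Path]] = {}
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # unique size: cannot have a duplicate
        by_hash: defaultdict[str, list[Path]] = defaultdict(list)
        for path in paths:
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash[digest].append(path)
        duplicates.update({h: ps for h, ps in by_hash.items() if len(ps) > 1})
    return duplicates
```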
- Usage Guide - Commands, workflows, and examples
- Cleanup Guide - Comprehensive cleanup recipes and safety protocols
- Architecture - Specification relationships and system design
- Development Guide - For contributors and Claude Code development
Emergency space recovery:
```bash
python3 scripts/cleanup.py uv-cache            # Clear UV cache (often 50-200GB)
python3 scripts/cleanup.py docker-prune --all  # Remove all unused Docker data
```

Scanning large drives:
```bash
# Windows paths are slow from WSL - use native PowerShell when possible
python3 scripts/scan.py /mnt/c/Users/Chris --label windows

# Scans checkpoint every 1000 files - safe to interrupt and resume
python3 scripts/scan.py /large/path --label big-scan
```

Finding duplicates:
```bash
python3 scripts/analyse.py --report duplicates
python3 scripts/analyse.py --report duplicates --output json > duplicates.json
```

Storage: SQLite database (`generated/hoardwick.db`)
- `files` - Indexed files with metadata, hashes, aggregation flags
- `directory_stats` - Pre-computed directory metrics
- `scans` - Multi-root scan tracking with timestamps
- `scan_progress` - Resumable checkpoints
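The real schema ships with the scanner; the following is a hypothetical reconstruction of those four tables. Column names are assumptions for illustration, not read from `generated/hoardwick.db`:

```python
import sqlite3

# Hypothetical schema matching the table descriptions above;
# the actual columns in generated/hoardwick.db may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS scans (
    id INTEGER PRIMARY KEY,
    root TEXT NOT NULL,             -- e.g. /home/chris
    label TEXT NOT NULL,            -- e.g. wsl-home
    started_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS files (
    id INTEGER PRIMARY KEY,
    scan_id INTEGER REFERENCES scans(id),
    path TEXT NOT NULL,
    size INTEGER,
    mtime REAL,
    sha256 TEXT,                    -- NULL when lazy hashing skipped it
    is_aggregate INTEGER DEFAULT 0  -- 1 when rolled up (node_modules etc.)
);
CREATE INDEX IF NOT EXISTS idx_files_hash ON files(sha256);
CREATE TABLE IF NOT EXISTS directory_stats (
    scan_id INTEGER REFERENCES scans(id),
    path TEXT NOT NULL,
    total_size INTEGER,
    file_count INTEGER
);
CREATE TABLE IF NOT EXISTS scan_progress (
    scan_id INTEGER REFERENCES scans(id),
    last_path TEXT,                 -- resume point after interruption
    files_seen INTEGER
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
```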
Aggregate Patterns (treated as single database entries):
- `node_modules/` - npm packages
- `.git/objects/` - git objects
- `__pycache__/`, `.next/`, `dist/`, `target/` - build outputs
- `venv/`, `.venv/` - Python virtual environments
- `Cache/`, `cache2/` - browser caches
Performance:
- Checkpoints every 1000 files (configurable)
- Symlink loop detection via inode tracking
- Lazy hashing (only files < 100MB)
- Aggregation reduces database size by ~90%
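Checkpointing and inode-based loop detection can share one traversal loop. A simplified sketch of that combination, assuming the documented 1000-file default (the scanner's actual implementation may differ):

```python
import os

CHECKPOINT_EVERY = 1000  # matches the documented default

def walk_with_checkpoints(root, save_checkpoint):
    """Yield file paths under root, skipping symlink loops.

    save_checkpoint(dirpath, count) is invoked every
    CHECKPOINT_EVERY files so an interrupted scan can resume.
    """
    seen_inodes = set()  # (st_dev, st_ino) of visited directories
    count = 0
    for dirpath, dirnames, filenames in os.walk(root, followlinks=True):
        st = os.stat(dirpath)
        key = (st.st_dev, st.st_ino)
        if key in seen_inodes:
            dirnames[:] = []  # already visited: prune to break the loop
            continue
        seen_inodes.add(key)
        for name in filenames:
            yield os.path.join(dirpath, name)
            count += 1
            if count % CHECKPOINT_EVERY == 0:
                save_checkpoint(dirpath, count)  # resume point
```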
Requirements:

- Python 3.10+
- click, rich
```bash
pip3 install -r requirements.txt
```

```
hoardwick/
├── scripts/        # Executable utilities
│   ├── scan.py     # Storage scanner
│   ├── analyse.py  # Duplicate/stale detection
│   └── cleanup.py  # Cleanup operations
├── specs/          # LiveSpec specifications
├── docs/           # Documentation
├── generated/      # SQLite database (gitignored)
└── .livespec/      # LiveSpec framework
```
MIT