A Python utility for managing file backups and detecting duplicates using SHA1 hashes.
hashiby helps you manage backups by maintaining a database of file hashes and comparing directories to find duplicate files. It's particularly useful for managing photo collections, document backups, and other file archives where you need to identify which files are already backed up.
- Automatic database management: Creates and maintains
.hashiby.json
files in each directory - Efficient hash calculation: Only recalculates hashes when file size or modification time changes
- Smart file filtering: Ignores common temporary and system files by default
- Customizable ignore patterns: Add your own file patterns to ignore
- Fast duplicate detection: Uses SHA1 hashes to identify identical files regardless of filename or location
Install the package in development mode:
pip install -e .
Compare two directories to find files that already exist in your backup:
hashiby compare /path/to/backup /path/to/new/files
--ignore
or-i
: Add additional file patterns to ignore (can be used multiple times)--verbose
or-v
: Show detailed output during processing
# Basic comparison
hashiby compare ~/backup ~/new_photos
# Ignore additional patterns
hashiby compare ~/backup ~/new_files --ignore "*.tmp" --ignore "draft_*"
# Verbose output
hashiby compare ~/backup ~/new_files --verbose
-
Database Creation: hashiby automatically creates a
.hashiby.json
file in each directory containing:- Relative file paths
- File sizes
- Modification times
- SHA1 hashes
-
Efficient Updates: When scanning a directory, hashiby only recalculates hashes for files that have changed (different size or modification time)
-
Duplicate Detection: Compares SHA1 hashes AND base filenames between directories to identify duplicate files. Files are only considered duplicates if they have both the same content (hash) and the same base filename.
hashiby automatically ignores common temporary and system files:
- Version control:
.git
,.svn
,.hg
,.bzr
- Python:
__pycache__
,*.pyc
,*.pyo
,build
,dist
, etc. - Node.js:
node_modules
,npm-debug.log*
, etc. - OS files:
.DS_Store
,Thumbs.db
,desktop.ini
- IDE files:
.vscode
,.idea
,*.swp
- Temporary files:
*.tmp
,*.temp
,*.log
- Backup files:
*.bak
,*.backup
- hashiby databases:
.hashiby.json
The .hashiby.json
file contains a JSON object where each key is a relative file path and each value contains:
{
"photos/vacation.jpg": {
"size": 2048576,
"mtime": 1640995200.0,
"hash": "a1b2c3d4e5f6789..."
}
}
- Python 3.7+
- click >= 8.0.0
To run tests:
python -m pytest tests/
MIT License