scratcharchive/hashiby

hashiby

A Python utility for managing file backups and detecting duplicates using SHA1 hashes.

Overview

hashiby helps you manage backups by maintaining a database of file hashes and comparing directories to find duplicate files. It's particularly useful for managing photo collections, document backups, and other file archives where you need to identify which files are already backed up.

Features

  • Automatic database management: Creates and maintains .hashiby.json files in each directory
  • Efficient hash calculation: Only recalculates hashes when file size or modification time changes
  • Smart file filtering: Ignores common temporary and system files by default
  • Customizable ignore patterns: Add your own file patterns to ignore
  • Fast duplicate detection: Uses SHA1 hashes to identify identical files regardless of filename or location
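The hashing these features rely on can be sketched in a few lines. This is an illustrative sketch, not hashiby's actual internals: the function name `sha1_of_file` and the 64 KiB chunk size are assumptions, but reading in chunks is what keeps large files from being loaded into memory at once:

```python
import hashlib

def sha1_of_file(path, chunk_size=65536):
    """Hash a file in fixed-size chunks so large files never sit fully in memory.
    Name and chunk size are illustrative, not hashiby's actual API."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() with a sentinel stops cleanly at end-of-file (works on Python 3.7+)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```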

Installation

Install the package in development mode:

pip install -e .

Usage

Basic Usage

Compare two directories to find files that already exist in your backup:

hashiby compare /path/to/backup /path/to/new/files

Options

  • --ignore or -i: Add additional file patterns to ignore (can be used multiple times)
  • --verbose or -v: Show detailed output during processing

Examples

# Basic comparison
hashiby compare ~/backup ~/new_photos

# Ignore additional patterns
hashiby compare ~/backup ~/new_files --ignore "*.tmp" --ignore "draft_*"

# Verbose output
hashiby compare ~/backup ~/new_files --verbose

How It Works

  1. Database Creation: hashiby automatically creates a .hashiby.json file in each directory containing:

    • Relative file paths
    • File sizes
    • Modification times
    • SHA1 hashes
  2. Efficient Updates: When scanning a directory, hashiby only recalculates hashes for files whose size or modification time has changed since the last scan.

  3. Duplicate Detection: Compares SHA1 hashes AND base filenames between directories to identify duplicate files. Files are only considered duplicates if they have both the same content (hash) and the same base filename.

Default Ignore Patterns

hashiby automatically ignores common temporary and system files:

  • Version control: .git, .svn, .hg, .bzr
  • Python: __pycache__, *.pyc, *.pyo, build, dist, etc.
  • Node.js: node_modules, npm-debug.log*, etc.
  • OS files: .DS_Store, Thumbs.db, desktop.ini
  • IDE files: .vscode, .idea, *.swp
  • Temporary files: *.tmp, *.temp, *.log
  • Backup files: *.bak, *.backup
  • hashiby databases: .hashiby.json
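Pattern matching like the above is typically done with the standard library's `fnmatch`. The sketch below assumes that approach; the function name `is_ignored` and the abbreviated default list are illustrative, not hashiby's actual implementation:

```python
import fnmatch

# Abbreviated subset of the defaults listed above, for illustration only
DEFAULT_IGNORES = [".git", "__pycache__", "*.pyc", ".DS_Store",
                   "*.tmp", "*.bak", ".hashiby.json"]

def is_ignored(name, extra_patterns=()):
    """Match a file or directory name against default plus user-supplied
    glob patterns (the --ignore option adds to extra_patterns)."""
    return any(fnmatch.fnmatch(name, pat)
               for pat in list(DEFAULT_IGNORES) + list(extra_patterns))
```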

Database Format

The .hashiby.json file contains a JSON object where each key is a relative file path and each value contains:

{
  "photos/vacation.jpg": {
    "size": 2048576,
    "mtime": 1640995200.0,
    "hash": "a1b2c3d4e5f6789..."
  }
}
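Because the database is plain JSON keyed by relative path, loading and saving it is a straightforward round-trip. The helper names `load_db` and `save_db` below are assumptions for illustration; only the filename `.hashiby.json` and the entry shape come from the format above:

```python
import json
import os

DB_NAME = ".hashiby.json"

def load_db(directory):
    """Load the directory's database, returning an empty mapping when absent."""
    db_path = os.path.join(directory, DB_NAME)
    if not os.path.exists(db_path):
        return {}
    with open(db_path, "r", encoding="utf-8") as f:
        return json.load(f)

def save_db(directory, db):
    """Write the path -> {size, mtime, hash} mapping back as indented JSON."""
    with open(os.path.join(directory, DB_NAME), "w", encoding="utf-8") as f:
        json.dump(db, f, indent=2)
```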

Requirements

  • Python 3.7+
  • click >= 8.0.0

Development

To run tests:

python -m pytest tests/

License

MIT License
