scratcharchive/hashiby

hashiby

A Python utility for managing file backups and detecting duplicates using SHA1 hashes.

Overview

hashiby helps you manage backups by maintaining a database of file hashes and comparing directories to find duplicate files. It's particularly useful for managing photo collections, document backups, and other file archives where you need to identify which files are already backed up.

Features

  • Automatic database management: Creates and maintains .hashiby.json files in each directory
  • Efficient hash calculation: Only recalculates hashes when file size or modification time changes
  • Smart file filtering: Ignores common temporary and system files by default
  • Customizable ignore patterns: Add your own file patterns to ignore
  • Fast duplicate detection: Uses SHA1 hashes to identify identical files regardless of filename or location
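The hashing these features rely on can be sketched in a few lines. This is an illustrative sketch, not hashiby's actual internals: the function name `sha1_of_file` and the 64 KiB chunk size are assumptions, but reading in chunks is what keeps large files from being loaded into memory at once:

```python
import hashlib

def sha1_of_file(path, chunk_size=65536):
    """Hash a file in fixed-size chunks so large files never sit fully in memory.
    Name and chunk size are illustrative, not hashiby's actual API."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() with a sentinel stops cleanly at end-of-file (works on Python 3.7+)
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```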

Installation

Install the package in development mode:

pip install -e .

Usage

Basic Usage

Compare two directories to find files that already exist in your backup:

hashiby compare /path/to/backup /path/to/new/files

Options

  • --ignore or -i: Add additional file patterns to ignore (can be used multiple times)
  • --verbose or -v: Show detailed output during processing

Examples

# Basic comparison
hashiby compare ~/backup ~/new_photos

# Ignore additional patterns
hashiby compare ~/backup ~/new_files --ignore "*.tmp" --ignore "draft_*"

# Verbose output
hashiby compare ~/backup ~/new_files --verbose

How It Works

  1. Database Creation: hashiby automatically creates a .hashiby.json file in each directory containing:

    • Relative file paths
    • File sizes
    • Modification times
    • SHA1 hashes
  2. Efficient Updates: When scanning a directory, hashiby only recalculates hashes for files whose size or modification time has changed since the last scan.

  3. Duplicate Detection: Compares SHA1 hashes AND base filenames between directories to identify duplicate files. Files are only considered duplicates if they have both the same content (hash) and the same base filename.

Default Ignore Patterns

hashiby automatically ignores common temporary and system files:

  • Version control: .git, .svn, .hg, .bzr
  • Python: __pycache__, *.pyc, *.pyo, build, dist, etc.
  • Node.js: node_modules, npm-debug.log*, etc.
  • OS files: .DS_Store, Thumbs.db, desktop.ini
  • IDE files: .vscode, .idea, *.swp
  • Temporary files: *.tmp, *.temp, *.log
  • Backup files: *.bak, *.backup
  • hashiby databases: .hashiby.json
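Pattern matching like the above is typically done with the standard library's `fnmatch`. The sketch below assumes that approach; the function name `is_ignored` and the abbreviated default list are illustrative, not hashiby's actual implementation:

```python
import fnmatch

# Abbreviated subset of the defaults listed above, for illustration only
DEFAULT_IGNORES = [".git", "__pycache__", "*.pyc", ".DS_Store",
                   "*.tmp", "*.bak", ".hashiby.json"]

def is_ignored(name, extra_patterns=()):
    """Match a file or directory name against default plus user-supplied
    glob patterns (the --ignore option adds to extra_patterns)."""
    return any(fnmatch.fnmatch(name, pat)
               for pat in list(DEFAULT_IGNORES) + list(extra_patterns))
```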

Database Format

The .hashiby.json file contains a JSON object where each key is a relative file path and each value contains:

{
  "photos/vacation.jpg": {
    "size": 2048576,
    "mtime": 1640995200.0,
    "hash": "a1b2c3d4e5f6789..."
  }
}
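Because the database is plain JSON keyed by relative path, loading and saving it is a straightforward round-trip. The helper names `load_db` and `save_db` below are assumptions for illustration; only the filename `.hashiby.json` and the entry shape come from the format above:

```python
import json
import os

DB_NAME = ".hashiby.json"

def load_db(directory):
    """Load the directory's database, returning an empty mapping when absent."""
    db_path = os.path.join(directory, DB_NAME)
    if not os.path.exists(db_path):
        return {}
    with open(db_path, "r", encoding="utf-8") as f:
        return json.load(f)

def save_db(directory, db):
    """Write the path -> {size, mtime, hash} mapping back as indented JSON."""
    with open(os.path.join(directory, DB_NAME), "w", encoding="utf-8") as f:
        json.dump(db, f, indent=2)
```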

Requirements

  • Python 3.7+
  • click >= 8.0.0

Development

To run tests:

python -m pytest tests/

License

MIT License
