Command-line package to generate a CSV with filepath, filename, and checksum for contents of a given directory or a single file.
Python 3.10+
pip install sum-buddy
usage: sum-buddy [-h] [-o OUTPUT_FILE] [-i IGNORE_FILE | -H] [-a ALGORITHM] input_path
Generate CSV with filepath, filename, and checksums for all files in a given directory (or a single file)
positional arguments:
input_path File or directory to traverse for files
options:
-h, --help show this help message and exit
-o OUTPUT_FILE, --output-file OUTPUT_FILE
Filepath for the output CSV file
-i IGNORE_FILE, --ignore-file IGNORE_FILE
Filepath for the ignore patterns file
-H, --include-hidden Include hidden files
-a ALGORITHM, --algorithm ALGORITHM
Hash algorithm to use (default: md5; available: ripemd160, sha3_224, sha512_224, blake2b, sha384, sha256, sm3, sha3_256, shake_256, sha512, sha1, sha224, md5, md5-sha1, sha3_384, sha3_512, sha512_256, shake_128, blake2s)
-l LENGTH, --length LENGTH
Length of the digest for SHAKE (required) or BLAKE (optional) algorithms in bytes
Note: The available algorithms are determined by those available to hashlib and may vary depending on your system and OpenSSL version, so the set shown on your system with sum-buddy -h may be different from above. At a minimum, it should include: {blake2s, blake2b, md5, sha1, sha224, sha256, sha384, sha512, sha3_224, sha3_256, sha3_384, sha3_512, shake_128, shake_256}, which is given by hashlib.algorithms_guaranteed.
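To check which algorithms your own interpreter offers, you can query hashlib directly. The following is a minimal sketch using only the standard library; it does not call sum-buddy, and the helper name `extra_algorithms` is made up for illustration. It also shows why SHAKE digests need an explicit length while BLAKE2 accepts an optional one.

```python
import hashlib


def extra_algorithms():
    """Return algorithms available on this system beyond the guaranteed set."""
    return sorted(hashlib.algorithms_available - hashlib.algorithms_guaranteed)


print("guaranteed:", sorted(hashlib.algorithms_guaranteed))
print("system-specific extras:", extra_algorithms())

# SHAKE algorithms produce variable-length output, so a length in bytes
# must be passed when the digest is requested.
shake_digest = hashlib.shake_128(b"hello").hexdigest(16)

# BLAKE2 accepts an optional digest size in bytes at construction time.
blake_digest = hashlib.blake2b(b"hello", digest_size=16).hexdigest()

# Both digests are 16 bytes, i.e. 32 hexadecimal characters.
print(len(shake_digest), len(blake_digest))
```

The "extras" list is the part that depends on your OpenSSL build, which is why it can differ between machines.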
- Basic Usage:
sum-buddy examples/example_content/
Output:
filepath,filename,md5
examples/example_content/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
examples/example_content/dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
- Output to File:
sum-buddy --output-file examples/checksums.csv examples/example_content/
Output:
Calculating md5 checksums on examples/example_content/: 100%|███████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 1552.01it/s]
md5 checksums for examples/example_content/ written to examples/checksums.csv
cat examples/checksums.csv
Output:
filepath,filename,md5
examples/example_content/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
examples/example_content/dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
- Ignore Contents Based on Patterns:
sum-buddy --output-file examples/checksums.csv --ignore-file examples/.sbignore_except_txt examples/example_content/
Output:
Calculating md5 checksums on examples/example_content/: 100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 1845.48it/s]
md5 checksums for examples/example_content/ written to examples/checksums.csv
cat examples/checksums.csv
Output:
filepath,filename,md5
examples/example_content/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
examples/example_content/.hidden_dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
examples/example_content/dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
examples/example_content/dir/.hidden_dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
- Include Hidden Files:
sum-buddy --output-file examples/checksums.csv --include-hidden examples/example_content/
Output:
Calculating md5 checksums on examples/example_content/: 100%|████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 2101.35it/s]
md5 checksums for examples/example_content/ written to examples/checksums.csv
cat examples/checksums.csv
Output:
filepath,filename,md5
examples/example_content/.hidden_file,.hidden_file,d41d8cd98f00b204e9800998ecf8427e
examples/example_content/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
examples/example_content/.hidden_dir/.hidden_file,.hidden_file,d41d8cd98f00b204e9800998ecf8427e
examples/example_content/.hidden_dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
examples/example_content/dir/.hidden_file,.hidden_file,d41d8cd98f00b204e9800998ecf8427e
examples/example_content/dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
examples/example_content/dir/.hidden_dir/.hidden_file,.hidden_file,d41d8cd98f00b204e9800998ecf8427e
examples/example_content/dir/.hidden_dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
If only a target directory is passed, the default settings are to ignore hidden files and directories (those that begin with a .), use the md5 algorithm, and print output to stdout, which can be piped (|).
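Because the default output is plain CSV on stdout, it can be consumed by any downstream script. As one possible illustration, the sketch below groups file paths by checksum to spot duplicate contents. It assumes the three-column layout shown in the examples above, with the checksum in the last column; the function name `group_by_checksum` is hypothetical and not part of sum-buddy.

```python
import csv
import io
from collections import defaultdict


def group_by_checksum(csv_text):
    """Map each checksum to the list of file paths that share it."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Assumption: the last column holds the checksum and is named after
    # the algorithm (for example "md5"), as in the sample output.
    checksum_column = reader.fieldnames[-1]
    groups = defaultdict(list)
    for row in reader:
        groups[row[checksum_column]].append(row["filepath"])
    return dict(groups)


# Sample text copied from the basic usage example.
sample = (
    "filepath,filename,md5\n"
    "examples/example_content/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df\n"
    "examples/example_content/dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df\n"
)

# Keep only checksums that occur for more than one file.
duplicates = {
    checksum: paths
    for checksum, paths in group_by_checksum(sample).items()
    if len(paths) > 1
}
print(duplicates)
```

In a real pipeline, `sample` could be replaced with `sys.stdin.read()` and the script placed after the pipe, for instance `sum-buddy examples/example_content/ | python find_duplicates.py`, where the script name is only a placeholder.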
To include all files and directories, including hidden ones, use the --include-hidden (or -H) option.
To ignore files based on patterns, use the --ignore-file (or -i) option with the path to a file containing patterns to ignore. The --ignore-file option interprets patterns the same way git interprets a .gitignore file, using the implementation from pathspec.
You may explore the filtering capabilities of the --ignore-file option by using the provided example files under examples/ and pointing at examples/example_content. The expected CSV output files are provided in examples/expected_outputs/.
The bash script examples/run_examples runs all the examples; it was used to generate the files in expected_outputs.
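If you want to check your own run against one of the expected files, comparing the CSVs as sets of rows avoids false mismatches caused by row order. This is a minimal sketch and not part of sum-buddy; the helper names `load_rows` and `csvs_match` and the file names in the demo are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def load_rows(csv_path):
    """Read a CSV file into a set of row tuples (header row included)."""
    with open(csv_path, newline="") as handle:
        return {tuple(row) for row in csv.reader(handle)}


def csvs_match(generated_path, expected_path):
    """Return True when both files contain the same rows, in any order."""
    return load_rows(generated_path) == load_rows(expected_path)


# Demo with two throwaway files whose rows differ only in order.
header = "filepath,filename,md5\n"
row_a = "a/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df\n"
row_b = "a/dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df\n"
with tempfile.TemporaryDirectory() as tmp:
    generated = Path(tmp) / "generated.csv"
    expected = Path(tmp) / "expected.csv"
    generated.write_text(header + row_a + row_b)
    expected.write_text(header + row_b + row_a)
    same = csvs_match(generated, expected)

print(same)
```

To use it on real output, pass the path of your generated CSV and the path of the corresponding file under examples/expected_outputs/.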
We expose three functions to be used in your Python code:
- get_checksums: Works like the CLI.
- gather_file_paths: Returns a list of file paths according to ignore patterns.
- checksum_file: Returns the checksum of a single file.
from sumbuddy import get_checksums, gather_file_paths, checksum_file
input_path = "examples/example_content"
output_file = "examples/checksums.csv"
include_hidden = True # Optional
ignore_file = "examples/.sbignore_except_txt" # Optional
alg = "md5" # Optional, possible inputs include list elements returned by hashlib.algorithms_available
# To generate checksums and save to a CSV file
get_checksums(input_path, output_file, ignore_file=ignore_file, algorithm=alg)
# or get_checksums(input_path, output_file, include_hidden=include_hidden)
# or get_checksums(input_path, output_file)
# outputs status bar followed by
# Checksums written to examples/checksums.csv
# To gather a list of file paths according to ignore/include patterns
file_paths = gather_file_paths(input_path, ignore_file=ignore_file)
# or file_paths = gather_file_paths(input_path, include_hidden=include_hidden)
# or file_paths = gather_file_paths(input_path)
# To calculate the checksum of a single file
checksum = checksum_file("examples/example_content/file.txt", algorithm=alg)  # named to avoid shadowing the built-in sum()
# or checksum = checksum_file("examples/example_content/file.txt")
To develop the package further:
- Clone the repository and create a branch
- Install with dev dependencies:
pip install -e ".[dev]"
- Install pre-commit hook:
pre-commit install
pre-commit autoupdate # optionally update
- Run tests:
pytest