Commit 5294e77

Selective attention to directory contents (#8)
Added features:
* By default, ignore hidden files and directories
* By default, send output to stdout to work like a Unix command
* Add `--include-hidden` option to include hidden files and directories (everything mode)
* Add `--ignore-file` option to specify a file with patterns to ignore, works like .gitignore
* Add `--algorithm` option to specify the hash algorithm to use, default to md5
* For `--output-file` option (instead of stdout), add message to prevent accidental overwrite of existing files
* Add detailed usage examples to README and `examples/` directory
* Expose `get_checksums`, `gather_file_paths`, and `checksum_file` functions for use in Python
1 parent 6a12d11 commit 5294e77

39 files changed: 449 additions and 36 deletions

README.md

Lines changed: 129 additions & 9 deletions
Original file line number | Diff line number | Diff line change
@@ -1,5 +1,5 @@
11
# sum-buddy
2-
Command-line package to generate a CSV with filepath, filename, and MD5 checksum for all contents of given directory.
2+
Command-line package to generate a CSV with filepath, filename, and checksum for the contents of a given directory.
33

44

55
## Requirements
@@ -18,22 +18,142 @@ pip install git+https://github.com/Imageomics/sum-buddy
1818
### Command Line Usage
1919

2020
```
21-
usage: sum-buddy [-h] --input-dir INPUT_DIR --output-file OUTPUT_FILE
21+
usage: sum-buddy [-h] [-o OUTPUT_FILE] [-i IGNORE_FILE | -H] [-a ALGORITHM] input_dir
2222
23-
Generate CSV with filepath, filename, and MD5 checksums for all files in a given directory
23+
Generate CSV with filepath, filename, and checksums for all files in a given directory
24+
25+
positional arguments:
26+
input_dir Directory to traverse for files
2427
2528
options:
26-
-h, --help show this help message and exit
27-
--input-dir INPUT_DIR Directory to traverse for files
28-
--output-file OUTPUT_FILE Filepath for the output CSV file
29+
-h, --help show this help message and exit
30+
-o OUTPUT_FILE, --output-file OUTPUT_FILE
31+
Filepath for the output CSV file
32+
-i IGNORE_FILE, --ignore-file IGNORE_FILE
33+
Filepath for the ignore patterns file
34+
-H, --include-hidden Include hidden files
35+
-a ALGORITHM, --algorithm ALGORITHM
36+
Hash algorithm to use (default: md5; available: ripemd160, sha3_224, sha512_224, blake2b, sha384, sha256, sm3, sha3_256, shake_256, sha512, sha1, sha224, md5, md5-sha1, sha3_384, sha3_512, sha512_256, shake_128, blake2s)
37+
```
38+
39+
#### CLI Examples
40+
41+
- **Basic Usage:**
42+
```bash
43+
sum-buddy examples/example_content/
44+
```
45+
> Output
46+
> ```console
47+
> filepath,filename,md5
48+
> examples/example_content/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
49+
> examples/example_content/dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
50+
> ```
51+
52+
- **Output to File:**
53+
```bash
54+
sum-buddy --output-file examples/checksums.csv examples/example_content/
55+
```
56+
> Output
57+
> ```console
58+
> Calculating md5 checksums on examples/example_content/: 100%|███████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 1552.01it/s]
59+
> md5 checksums for examples/example_content/ written to examples/checksums.csv
60+
> ```
61+
```bash
62+
cat examples/checksums.csv
63+
```
64+
> Output:
65+
> ```console
66+
> filepath,filename,md5
67+
> examples/example_content/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
68+
> examples/example_content/dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
69+
> ```
70+
71+
- **Ignore Contents Based on Patterns:**
72+
```bash
73+
sum-buddy --output-file examples/checksums.csv --ignore-file examples/.sbignore_except_txt examples/example_content/
74+
```
75+
> Output
76+
> ```console
77+
> Calculating md5 checksums on examples/example_content/: 100%|████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:00<00:00, 1845.48it/s]
78+
> md5 checksums for examples/example_content/ written to examples/checksums.csv
79+
> ```
80+
```bash
81+
cat examples/checksums.csv
2982
```
83+
> Output:
84+
> ```console
85+
> filepath,filename,md5
86+
> examples/example_content/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
87+
> examples/example_content/.hidden_dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
88+
> examples/example_content/dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
89+
> examples/example_content/dir/.hidden_dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
90+
> ```
91+
- **Include Hidden Files:**
92+
```bash
93+
sum-buddy --output-file examples/checksums.csv --include-hidden examples/example_content/
94+
```
95+
> Output
96+
> ```console
97+
> Calculating md5 checksums on examples/example_content/: 100%|████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 2101.35it/s]
98+
> md5 checksums for examples/example_content/ written to examples/checksums.csv
99+
> ```
100+
101+
```bash
102+
cat examples/checksums.csv
103+
```
104+
> Output:
105+
> ```console
106+
> filepath,filename,md5
107+
> examples/example_content/.hidden_file,.hidden_file,d41d8cd98f00b204e9800998ecf8427e
108+
> examples/example_content/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
109+
> examples/example_content/.hidden_dir/.hidden_file,.hidden_file,d41d8cd98f00b204e9800998ecf8427e
110+
> examples/example_content/.hidden_dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
111+
> examples/example_content/dir/.hidden_file,.hidden_file,d41d8cd98f00b204e9800998ecf8427e
112+
> examples/example_content/dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
113+
> examples/example_content/dir/.hidden_dir/.hidden_file,.hidden_file,d41d8cd98f00b204e9800998ecf8427e
114+
> examples/example_content/dir/.hidden_dir/file.txt,file.txt,7d52c7437e9af58dac029dd11b1024df
115+
> ```
116+
117+
118+
When only a target directory is passed, the defaults apply: hidden files and directories (those whose names begin with a `.`) are ignored, the `md5` algorithm is used, and the CSV is printed to `stdout`, so it can be piped (`|`) to other commands.
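
As a rough illustration of the stdout default, the Python sketch below writes rows in the `filepath,filename,<algorithm>` layout that the example outputs show. The helper name `write_checksum_rows` and its parameters are hypothetical and are not part of sum-buddy's API; the sketch only assumes the CSV layout seen in the examples.

```python
import csv
import os
import sys


def write_checksum_rows(rows, algorithm="md5", stream=None):
    """Write (filepath, checksum) pairs as CSV, defaulting to stdout.

    Hypothetical helper for illustration; not sum-buddy's own code.
    """
    stream = sys.stdout if stream is None else stream
    writer = csv.writer(stream, lineterminator="\n")
    # The header names the algorithm, as in the example outputs.
    writer.writerow(["filepath", "filename", algorithm])
    for filepath, checksum in rows:
        writer.writerow([filepath, os.path.basename(filepath), checksum])


if __name__ == "__main__":
    write_checksum_rows(
        [("examples/example_content/file.txt", "7d52c7437e9af58dac029dd11b1024df")]
    )
```

Because the rows go to a plain text stream, the same function can feed a pipe, a file handle, or an in-memory buffer.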
119+
120+
To include all files and directories, including hidden ones, use the `--include-hidden` (or `-H`) option.
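
The hidden-file default can be pictured with a short sketch. It assumes that "hidden" means a name starting with `.`, as the text states. The function name `list_files` is made up for this illustration and is not sum-buddy's `gather_file_paths`, whose internals are not shown in this diff.

```python
import os


def list_files(input_dir, include_hidden=False):
    """Return sorted file paths under input_dir.

    Dot-prefixed files and directories are skipped unless include_hidden is True.
    Hypothetical helper for illustration only.
    """
    paths = []
    for root, dirs, files in os.walk(input_dir):
        if not include_hidden:
            # Prune hidden directories in place so os.walk never descends into them.
            dirs[:] = [d for d in dirs if not d.startswith(".")]
            files = [f for f in files if not f.startswith(".")]
        paths.extend(os.path.join(root, f) for f in files)
    return sorted(paths)
```

Pruning `dirs` in place is what keeps files such as `.hidden_dir/file.txt` out of the default result, even though `file.txt` itself is not a hidden name.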
121+
122+
To ignore files based on patterns, use the `--ignore-file` (or `-i`) option with the path to a file containing the patterns. The file is interpreted the same way `git` interprets a `.gitignore` file, using the pattern-matching implementation from [pathspec](https://github.com/cpburnz/python-pathspec). As the usage line indicates (`[-i IGNORE_FILE | -H]`), `--ignore-file` and `--include-hidden` cannot be combined.
123+
124+
You may explore the filtering capabilities of the `--ignore-file` option by using the provided example files under `examples/` and pointing at `examples/example_content`. The expected CSV output files are provided in `examples/expected_outputs/`.
125+
126+
The `bash` script `examples/run_examples` runs all of the examples; it was used to generate the files in `examples/expected_outputs/`.
30127
31128
### Python Package Usage
129+
We expose three functions to be used in your Python code:
130+
- `get_checksums`: Works like the CLI.
131+
- `gather_file_paths`: Returns a list of file paths according to ignore patterns.
132+
- `checksum_file`: Returns the checksum of a single file.
133+
32134
```python
33-
from sumbuddy import get_checksums
135+
from sumbuddy import get_checksums, gather_file_paths, checksum_file
34136
35-
get_checksums("path/to/image/folder", "path/to/checksums.csv")
137+
input_dir = "examples/example_content"
138+
output_file = "examples/checksums.csv"
139+
include_hidden = True # Optional
140+
ignore_file = "examples/.sbignore_except_txt" # Optional
141+
alg = "md5" # Optional; valid values include the names in hashlib.algorithms_available
142+
143+
# To generate checksums and save to a CSV file
144+
get_checksums(input_dir, output_file, ignore_file=ignore_file, algorithm=alg)
145+
# or get_checksums(input_dir, output_file, include_hidden=include_hidden)
146+
# or get_checksums(input_dir, output_file)
36147
37148
# outputs status bar followed by
38-
# Checksums written to path/to/checksums.csv
149+
# Checksums written to examples/checksums.csv
150+
151+
# To gather a list of file paths according to ignore/include patterns
152+
file_paths = gather_file_paths(input_dir, ignore_file=ignore_file)
153+
# or file_paths = gather_file_paths(input_dir, include_hidden=include_hidden)
154+
# or file_paths = gather_file_paths(input_dir)
155+
156+
# To calculate the checksum of a single file
157+
checksum = checksum_file("examples/example_content/file.txt", algorithm=alg)
158+
# or checksum = checksum_file("examples/example_content/file.txt")
39159
```
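
For readers curious about what a single-file checksum involves, here is a minimal sketch that uses only the standard library's `hashlib`. The function name `hash_file` and the `chunk_size` parameter are assumptions made for illustration; this is not sum-buddy's `checksum_file` implementation.

```python
import hashlib


def hash_file(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks to bound memory use.

    Hypothetical helper for illustration only.
    """
    if algorithm not in hashlib.algorithms_available:
        raise ValueError(f"Unsupported algorithm: {algorithm}")
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    # shake_128 and shake_256 require an explicit length for hexdigest(),
    # so this sketch does not cover them.
    return digest.hexdigest()
```

An empty file hashed with `md5` yields `d41d8cd98f00b204e9800998ecf8427e`, which matches the value shown for the `.hidden_file` rows in the example output and suggests those example files are empty.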

examples/.sbignore_all

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
*

examples/.sbignore_all_except_dots

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
1+
*
2+
!.*
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
*
2+
!dir/
3+
.*

examples/.sbignore_except_txt

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
*
2+
!*.txt
3+

examples/.sbignore_hidden_files

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
.*

examples/.sbignore_nothing

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+

examples/.sbignore_specific_file

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
1+
file.txt
2+
Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
1+
dir/file.txt
2+

examples/.sbignore_subdir

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
1+
dir/
2+
