Simple, quick-and-dirty scripts to find duplicate files
There are two scripts:
- putFileSHA1InCSVFile.php
- writes the SHA1 of each file it finds into a CSV file
- findDuplicateSHA1InOneCSV.php
- reads the CSV file produced by putFileSHA1InCSVFile.php and lists the files that share a SHA1
- putFileSHA1InCSVFile.php
a. Output when launched without parameters:
putFileSHA1InCSVFile.php : --path-to-fetch=<pathToFetch>
--destination-csv-file=<CSVFileToPutInformationIn.csv>
[--reference-csv-file=CSVReferenceFile.csv]
[--csv-separator=CSV_Separator:|_MySeparator_|]
[--allow-start-from-where-we-left=(true|false)]
[--help]
--path-to-fetch=<folder>: the folder to fetch
--destination-csv-file=<filename> : filename where will put information collected. (filepath<CSV_Separator>SHA1)
Optional --reference-csv-file=<filename>: a CSV file (generated previously by this script), that makes our script takes SHA1 from it (when possible) rather than calculating it again, useful when the script stopped after hours of calculation ...
Optional --csv-separator=<char|string> : by default '|_MySeparator_|'.
Optional --allow-start-from-where-we-left=<'true'|'false'> : allow to take other rather than starting from the beginning.
Optional --extensions-to-match=<expression> : default : 'avi|mkv|mp4|mpg|mpeg|divx|m2ts|ifo|vob' , useful because you are most likely to not want to find all duplicates.
Optional --help : display this help
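The effect of --reference-csv-file and --extensions-to-match can be pictured with a short Python sketch. It is not the PHP script's code: the names load_reference, wanted and sha1_for are hypothetical. It assumes the reference file uses the filepath<CSV_Separator>SHA1 layout with one header line, and that the extension expression is matched case-insensitively against the end of the file name. The real script may behave differently on both points.

```python
import hashlib
import re

SEPARATOR = "|_MySeparator_|"  # default separator from the help text
EXTENSIONS = "avi|mkv|mp4|mpg|mpeg|divx|m2ts|ifo|vob"  # default from the help text


def load_reference(lines, separator=SEPARATOR):
    """Map each path to its SHA1 from the lines of an earlier CSV (header skipped)."""
    known = {}
    for line in lines[1:]:
        # rpartition splits on the last separator; lines without one are ignored
        path, sep, sha1 = line.rstrip("\n").rpartition(separator)
        if sep:
            known[path] = sha1
    return known


def wanted(path, extensions=EXTENSIONS):
    """Assumption: the expression lists extensions matched at the end of the name."""
    return re.search(r"\.(?:" + extensions + r")$", path, re.IGNORECASE) is not None


def sha1_for(path, known):
    """Reuse the stored hash when the path is known, otherwise hash the file."""
    if path in known:
        return known[path]
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # read in 1 MiB chunks so large video files are not loaded into memory at once
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

With a lookup like this, a run interrupted after hours of hashing only has to hash the files that are missing from the reference file.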
b. Example
./putFileSHA1InCSVFile.php --path-to-fetch=/Users/enola/Downloads/ --destination-csv-file=dstfile.csv
Here 'dstfile.csv' will contain something similar to this:
filename|_MySeparator_|SHA1
/Users/enola/Downloads/image 20171211_/untitled folder/20171210_092648.mp4|_MySeparator_|81ffd574bf22b515d1ccb4f97d7760c941402177
/Users/enola/Downloads/image 20171211_/untitled folder/20171210_085056.mp4|_MySeparator_|f6008997960a099bf989dedaff6e39b83961c69c
/Users/enola/Downloads/image 20171211_/untitled folder/20171210_101624.mp4|_MySeparator_|7eaa47d865a868f9ded6c46dd888ab09cb5e13b7
/Users/enola/Downloads/image 20171211_/untitled folder/20171210_102300.mp4|_MySeparator_|e437f4ee01f2e71c8d43fb1c17a3e503530fda04
/Users/enola/Downloads/image 20171211_/untitled folder/20171210_101807.mp4|_MySeparator_|e1487b726581da4fb9d0db3cd8b26a562fed88b0
/Users/enola/Downloads/image 20171211_/untitled folder/20171210_101718.mp4|_MySeparator_|3b6ee4d74745bb8d68af53a93dda2101f031948
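A file in the format shown above could be produced along the following lines. This is a minimal Python sketch, not the PHP script itself: sha1_of_file and write_sha1_csv are hypothetical names, the header line is copied from the sample output, and no extension filtering is applied. It also assumes the destination file lives outside the folder being walked.

```python
import hashlib
import os

SEPARATOR = "|_MySeparator_|"  # default separator from the help text


def sha1_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_sha1_csv(folder, destination, separator=SEPARATOR):
    """Walk folder recursively, write one path/SHA1 line per file, return the count."""
    count = 0
    # surrogateescape keeps file names that are not valid UTF-8 writable
    with open(destination, "w", encoding="utf-8", errors="surrogateescape") as out:
        out.write("filename" + separator + "SHA1\n")
        for root, _dirs, files in os.walk(folder):
            for name in sorted(files):
                path = os.path.join(root, name)
                out.write(path + separator + sha1_of_file(path) + "\n")
                count += 1
    return count
```

A long separator such as '|_MySeparator_|' is unlikely to occur inside a file path, which keeps each line easy to split back into its two columns.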
- findDuplicateSHA1InOneCSV.php
a. Output when launched without parameters:
findDuplicateSHA1InOneCSV.php : <CSVFileToRead> <CSVFileToPutInformationIn.csv> [CSV_Separator:|_MySeparator_|]
<CSVFileToRead> : the file generated by putFileSHA1InCSVFile.php, containing filenames, and associated SHA1
<CSVFileToPutInformationIn.csv> : the file to put duplicates in
Optionnal <CSV_Separator> : a sepeator for each column
Output will look like :
DuplicatedSHA1|_MySeparator|filename
DuplicatedSHA1|_MySeparator|filenamedup2
DuplicatedSHA1|_MySeparator|filenamedup3
b. Example
./findDuplicateSHA1InOneCSV.php filecontainingsha1.csv filecontainingduplicates.csv
getCSVFileContent
num of line:21
number Of Entry : 21
number Of Duplicated (SHA1), can hide more file: 1
number Of Duplicated files : 1
Here 'filecontainingduplicates.csv' will contain something similar to this:
SHA1|_MySeparator_|Duplicated Filename
8886e5dd984b7821bb0c374b4131f341bf6c84ba|_MySeparator_|/Users/enola/Downloads/The Outer Limits - 6x01 - Judgment Day.avi
8886e5dd984b7821bb0c374b4131f341bf6c84ba|_MySeparator_|/Users/enola/Downloads/prout/The Outer Limits - 6x01 - Judgment Day.avi
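The duplicate search amounts to grouping the lines of the first CSV by SHA1 and keeping the groups with more than one file. The Python sketch below illustrates that idea only. find_duplicates and format_duplicates are hypothetical names, and the sketch assumes the input starts with a header line. The PHP script's actual logic and counters may differ.

```python
from collections import defaultdict

SEPARATOR = "|_MySeparator_|"  # default separator from the help text


def find_duplicates(lines, separator=SEPARATOR):
    """Return {sha1: [paths]} for every SHA1 that appears on more than one line."""
    by_hash = defaultdict(list)
    for line in lines[1:]:  # assumption: the first line is the header
        path, sep, sha1 = line.rstrip("\n").rpartition(separator)
        if sep:  # skip lines that do not contain the separator
            by_hash[sha1].append(path)
    return {sha1: paths for sha1, paths in by_hash.items() if len(paths) > 1}


def format_duplicates(duplicates, separator=SEPARATOR):
    """Build output lines in the SHA1<separator>filename layout shown above."""
    lines = ["SHA1" + separator + "Duplicated Filename"]
    for sha1, paths in duplicates.items():
        for path in paths:
            lines.append(sha1 + separator + path)
    return lines
```

Two files with the same SHA1 are almost certainly identical, but a byte-by-byte comparison before deleting anything is a reasonable extra precaution.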