
Addition of a PDF checker

@albion2000 released this on 29 Aug 19:15.

This release includes six tools:

naming_conventions.py and naming_conventions_do_rename.py enforce strict naming rules on the directory names in a file tree. naming_conventions.py is a preview only: it reports the renames without performing them.
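The preview/apply split can be sketched as a single function with a `dry_run` flag. The release notes do not list the actual naming rules, so `normalize_dir_name` below uses a hypothetical rule set (strip accents, lowercase, spaces to underscores, drop anything outside `[a-z0-9_-]`); both function names are illustrative, not the tools' real code.

```python
import os
import re
import unicodedata


def normalize_dir_name(name: str) -> str:
    """Hypothetical rule set (not the tool's documented rules): strip accents,
    lowercase, spaces -> underscores, drop anything outside [a-z0-9_-]."""
    ascii_name = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    ascii_name = ascii_name.lower().replace(" ", "_")
    return re.sub(r"[^a-z0-9_-]", "", ascii_name)


def rename_dirs(root: str, dry_run: bool = True) -> list:
    """Return the planned (old_path, new_path) pairs; rename only when dry_run is False."""
    planned = []
    # Bottom-up walk: a directory's contents are handled before the directory
    # itself is renamed, so every path computed here still exists when used.
    for dirpath, dirnames, _ in os.walk(root, topdown=False):
        for name in dirnames:
            new_name = normalize_dir_name(name)
            if not new_name or new_name == name:
                continue  # already compliant, or nothing usable would be left
            old_path = os.path.join(dirpath, name)
            new_path = os.path.join(dirpath, new_name)
            planned.append((old_path, new_path))
            if not dry_run:
                # samefile() allows case-only renames on case-insensitive filesystems.
                if os.path.exists(new_path) and not os.path.samefile(old_path, new_path):
                    raise FileExistsError(f"refusing to overwrite {new_path}")
                os.rename(old_path, new_path)
    return planned
```

Calling `rename_dirs(root)` plays the role of the preview script; `rename_dirs(root, dry_run=False)` plays the role of the renaming script. Sharing one code path keeps the preview faithful to what will actually happen.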

check_jpegs performs a fast sanity check of a JPEG file tree; check_jpegs_full performs a deeper but slower check.
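The notes do not describe what each check inspects. One plausible split, sketched below under that assumption, is a quick test of the start (SOI) and end (EOI) markers versus a slower walk of the header segments that verifies each declared length fits in the file. A truly full check would also decode the pixel data (for example with Pillow's `Image.open(path).load()`), which this stdlib-only sketch does not do.

```python
import struct
from typing import Optional

SOI, EOI = b"\xff\xd8", b"\xff\xd9"


def quick_check(data: bytes, tail: int = 64) -> bool:
    """Fast check: SOI at the very start, EOI within the last `tail` bytes
    (a small window tolerates trailing padding after the image data)."""
    return data.startswith(SOI) and EOI in data[-tail:]


def structural_check(data: bytes) -> Optional[str]:
    """Slower check: walk the header segments from SOI to SOS, verifying that
    each declared length fits inside the file. Returns an error string or None."""
    if not data.startswith(SOI):
        return "missing SOI marker"
    i, seen_frame_header = 2, False
    while i + 4 <= len(data):
        if data[i] != 0xFF:
            return f"expected a marker at offset {i}"
        marker = data[i + 1]
        if marker == 0xFF:  # fill byte: the real marker follows
            i += 1
            continue
        (length,) = struct.unpack(">H", data[i + 2:i + 4])
        if length < 2 or i + 2 + length > len(data):
            return f"segment 0xFF{marker:02X} at offset {i} runs past the end of the file"
        # SOF0..SOF15 are frame headers, except DHT (C4), JPG (C8) and DAC (CC).
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            seen_frame_header = True
        if marker == 0xDA:  # SOS: entropy-coded image data follows
            if not seen_frame_header:
                return "scan data starts before any frame header (SOF)"
            return None if EOI in data[-64:] else "missing EOI marker (truncated file?)"
        i += 2 + length
    return "no SOS marker found"
```

Both functions only read bytes, so the quick check can run on the head and tail of each file, which is what makes it cheap on a large tree.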

scandir2pdf performs bulk conversion of JPEGs to PDFs.
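How scandir2pdf builds its PDFs is not stated. A common lossless approach, sketched here as an assumption, is to embed each JPEG file unchanged as a `/DCTDecode` image (no re-encoding), one page per image. The one-PDF-per-directory layout in `convert_directory` is also hypothetical, and the sketch handles only 8-bit grayscale or RGB JPEGs.

```python
import struct
from pathlib import Path

# SOFn markers that carry the frame header (DHT, JPG and DAC excluded).
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7, 0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def jpeg_size(data: bytes) -> tuple:
    """Return (width, height, components) read from the JPEG frame header."""
    if not data.startswith(b"\xff\xd8"):
        raise ValueError("not a JPEG (no SOI marker)")
    i = 2
    while i + 10 <= len(data):
        if data[i] != 0xFF:
            raise ValueError(f"corrupt marker stream at offset {i}")
        marker = data[i + 1]
        if marker == 0xFF:  # fill byte
            i += 1
            continue
        if marker in SOF_MARKERS:
            height, width = struct.unpack(">HH", data[i + 5:i + 9])
            return width, height, data[i + 9]
        (length,) = struct.unpack(">H", data[i + 2:i + 4])
        i += 2 + length
    raise ValueError("no frame header found")


def jpegs_to_pdf(jpegs: list) -> bytes:
    """Build a PDF with one page per JPEG, embedding each file unchanged (DCTDecode)."""
    n = len(jpegs)
    # Object numbers: 1 catalog, 2 page tree, then (page, content, image) per JPEG.
    bodies = {1: b"<< /Type /Catalog /Pages 2 0 R >>"}
    kids = " ".join(f"{3 + 3 * k} 0 R" for k in range(n))
    bodies[2] = f"<< /Type /Pages /Kids [{kids}] /Count {n} >>".encode()
    for k, blob in enumerate(jpegs):
        width, height, comps = jpeg_size(blob)
        if comps not in (1, 3):
            raise ValueError("only 8-bit grayscale or RGB JPEGs are handled here")
        colorspace = "/DeviceGray" if comps == 1 else "/DeviceRGB"
        page, content, image = 3 + 3 * k, 4 + 3 * k, 5 + 3 * k
        # One image pixel = one PDF point, i.e. the page is sized as if at 72 dpi.
        bodies[page] = (f"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 {width} {height}] "
                        f"/Resources << /XObject << /Im0 {image} 0 R >> >> "
                        f"/Contents {content} 0 R >>").encode()
        ops = f"q {width} 0 0 {height} 0 0 cm /Im0 Do Q".encode()
        bodies[content] = b"<< /Length %d >>\nstream\n" % len(ops) + ops + b"\nendstream"
        bodies[image] = (f"<< /Type /XObject /Subtype /Image /Width {width} /Height {height} "
                         f"/ColorSpace {colorspace} /BitsPerComponent 8 /Filter /DCTDecode "
                         f"/Length {len(blob)} >>\nstream\n").encode() + blob + b"\nendstream"
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num in range(1, len(bodies) + 1):
        offsets.append(len(out))  # byte offset recorded in the xref table
        out += b"%d 0 obj\n" % num + bodies[num] + b"\nendobj\n"
    xref_at = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(bodies) + 1)
    for off in offsets:
        out += b"%010d 00000 n \n" % off  # each xref entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (len(bodies) + 1, xref_at)
    return bytes(out)


def convert_directory(directory: Path) -> Path:
    """Hypothetical scandir2pdf-style step: one PDF per directory, pages in
    file-name order, written next to the directory as <dirname>.pdf."""
    jpegs = sorted(p for p in directory.iterdir() if p.suffix.lower() in (".jpg", ".jpeg"))
    if not jpegs:
        raise ValueError(f"no JPEG files in {directory}")
    target = directory.parent / (directory.name + ".pdf")
    target.write_bytes(jpegs_to_pdf([p.read_bytes() for p in jpegs]))
    return target
```

Embedding the compressed JPEG data as-is keeps the conversion fast and avoids a second lossy compression, which matters when converting tens of thousands of scans.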

scandirpdf2txt performs bulk conversion of OCRed PDFs to text files, so that the collection can be searched quickly with dedicated full-text search tools (Google or others).
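A minimal sketch of the tree walk, assuming text is extracted from the OCR layer with Poppler's `pdftotext` command-line tool (the release does not say which extractor scandirpdf2txt uses). The `extract` parameter and the skip-if-up-to-date rule are illustrative assumptions; they make re-runs on a large tree cheap and let the walk be tested without Poppler installed.

```python
import subprocess
from pathlib import Path
from typing import Callable, List


def pdftotext_extract(pdf: Path, txt: Path) -> None:
    """Dump the PDF's text layer with Poppler's pdftotext (must be on PATH)."""
    subprocess.run(["pdftotext", "-enc", "UTF-8", str(pdf), str(txt)], check=True)


def convert_tree(root: Path, extract: Callable[[Path, Path], None] = pdftotext_extract) -> List[Path]:
    """Write <name>.txt next to every <name>.pdf under root. PDFs whose .txt is
    already at least as new are skipped, so re-running only does new work."""
    written = []
    for pdf in sorted(root.rglob("*")):
        if not pdf.is_file() or pdf.suffix.lower() != ".pdf":
            continue
        txt = pdf.with_suffix(".txt")
        if txt.exists() and txt.stat().st_mtime >= pdf.stat().st_mtime:
            continue
        extract(pdf, txt)
        written.append(txt)
    return written
```

Writing each `.txt` next to its PDF keeps search hits easy to map back to the original document.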

New: check_pdfs performs a sanity check of a PDF file tree. It can detect corrupted files, including some that Acrobat Reader can still open. Some "false" positives are likely, for example from less standard formats, or because Acrobat Reader is tolerant of corrupted files.
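The checks check_pdfs actually runs are not described. As an illustration only, the sketch below applies a few cheap structural tests: a `%PDF-` header, a `%%EOF` marker near the end, and a `startxref` offset that lands on a cross-reference section. It also shows why lenient readers and a strict checker can disagree: a reader can rebuild a broken xref table and still open the file, while these tests flag it. The assumption that xref offsets count from the header position is labeled in the code.

```python
import re
from pathlib import Path
from typing import Dict, List


def check_pdf_bytes(data: bytes) -> List[str]:
    """Cheap structural checks; an empty list means no problem was detected."""
    problems = []
    # Lenient readers accept a header anywhere in the first 1024 bytes.
    # Assumption: xref offsets are then counted from where the header starts.
    base = data.find(b"%PDF-", 0, 1024)
    if base == -1:
        problems.append("no %PDF- header in the first 1024 bytes")
        base = 0
    tail = data[-1024:]
    if b"%%EOF" not in tail:
        problems.append("no %%EOF marker near the end (truncated file?)")
    pos = tail.rfind(b"startxref")
    match = re.match(rb"startxref\s+(\d+)", tail[pos:]) if pos != -1 else None
    if match is None:
        problems.append("missing or malformed startxref")
    else:
        offset = base + int(match.group(1))
        target = data[offset:offset + 64]
        # Either a classic "xref" table or (PDF 1.5+) a cross-reference stream object.
        if not (target.startswith(b"xref") or re.match(rb"\d+\s+\d+\s+obj", target)):
            problems.append("startxref does not point to a cross-reference section")
    return problems


def check_tree(root: Path) -> Dict[Path, List[str]]:
    """Map every suspicious *.pdf under root to its list of problems."""
    report = {}
    for pdf in sorted(root.rglob("*.pdf")):
        problems = check_pdf_bytes(pdf.read_bytes())  # whole file in memory: fine for a sketch
        if problems:
            report[pdf] = problems
    return report
```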

New: naming_conventions_files.py and naming_conventions_do_rename_files.py enforce strict naming rules on the file names in a file tree. naming_conventions_files.py is a preview only: it reports the renames without performing them.
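File names add two concerns that directory names do not: the extension must survive, and two files in the same directory can normalize to the same name. The preview sketch below, again with hypothetical rules (the real ones are not given), keeps the extension lowercased, leaves hidden files alone, and reports collisions and emptied names as conflicts instead of renaming them.

```python
import os
import re
import unicodedata
from collections import defaultdict


def normalize_file_name(filename: str) -> str:
    """Hypothetical rules: ASCII-only, lowercase stem with underscores for
    spaces (dots allowed, e.g. archive.tar), extension kept but lowercased."""
    stem, ext = os.path.splitext(filename)
    stem = unicodedata.normalize("NFKD", stem).encode("ascii", "ignore").decode("ascii")
    stem = re.sub(r"[^a-z0-9_.-]", "", stem.lower().replace(" ", "_"))
    return stem + ext.lower()


def preview_file_renames(root: str):
    """Dry run: return (renames, conflicts) without touching anything on disk.
    Conflicts are renames that must not be applied blindly."""
    renames, conflicts = [], []
    for dirpath, _, filenames in os.walk(root):
        targets = defaultdict(list)
        for name in filenames:
            if not name.startswith("."):  # assumption: hidden files are left alone
                targets[normalize_file_name(name)].append(name)
        for new, olds in sorted(targets.items()):
            if len(olds) > 1:
                conflicts.append((dirpath, sorted(olds), f"would all become {new!r}"))
            elif new.startswith("."):  # stem became empty, e.g. "!!!.txt"
                conflicts.append((dirpath, olds, "name would become empty"))
            elif olds[0] != new:
                renames.append((os.path.join(dirpath, olds[0]), os.path.join(dirpath, new)))
    return renames, conflicts
```

Grouping by target name per directory is what catches collisions such as `Photo 1.JPG` and `photo_1.jpg`, which a rename-as-you-go loop would silently overwrite.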

Validated on 27K+ JPEG files and 3K+ PDFs.