Metasift is a metadata extraction tool and .docx password protection remover.
Support for cleaning metadata is coming soon.
Python v3.10.7
LIMITED FEATURES: This app currently has limited features and only supports
.docx files at the moment. It will be expanded in the future to include more filetypes. My current focus is on implementing .docx metadata cleaning.
✅ - Extract metadata from .docx files
✅ - Remove password protection from .docx files
✅ - Batch processing
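To give a feel for what extracting metadata from a .docx file involves, here is a minimal sketch that uses only the standard library. It relies on the fact that a .docx file is a ZIP archive whose core properties (author, title, timestamps and so on) live in the `docProps/core.xml` part. The function name `extract_core_metadata` is hypothetical and this is not necessarily how Metasift does it internally.

```python
import zipfile
import xml.etree.ElementTree as ET


def extract_core_metadata(docx_path):
    """Return the core properties of a .docx file as a dict.

    Hypothetical helper for illustration; not taken from Metasift's source.
    """
    with zipfile.ZipFile(docx_path) as archive:
        try:
            xml_bytes = archive.read("docProps/core.xml")
        except KeyError:
            # The core properties part is optional, so report "no metadata"
            # instead of failing.
            return {}

    metadata = {}
    for element in ET.fromstring(xml_bytes):
        # Tags look like "{namespace-uri}creator"; keep only the local name.
        local_name = element.tag.rsplit("}", 1)[-1]
        metadata[local_name] = (element.text or "").strip()
    return metadata
```

Calling `extract_core_metadata("test.docx")` would return a dict keyed by property name, for example `creator` or `lastModifiedBy`, depending on what the document contains.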
When removing passwords from .docx files, Metasift will not modify the original
file in order to prevent any potential for corruption. It will instead
create a new /unlocked-documents directory where it will store a separate
unlocked version.
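The copy-instead-of-modify behaviour can be sketched roughly as follows. This assumes that "password protection" means Word's editing restrictions, which are stored as a self-closing `w:documentProtection` element in `word/settings.xml`. That is an assumption about the approach, not something confirmed by Metasift's source. The function name `write_unlocked_copy` and the choice to create `unlocked-documents` relative to the current directory are also hypothetical.

```python
import re
import zipfile
from pathlib import Path

# Assumption: the restriction is a self-closing <w:documentProtection .../> tag.
PROTECTION_TAG = re.compile(rb"<w:documentProtection\b[^>]*/>")


def write_unlocked_copy(docx_path, output_dir="unlocked-documents"):
    """Write an unlocked copy of docx_path into output_dir.

    The original file is only ever opened for reading. Hypothetical helper
    for illustration; not taken from Metasift's source.
    """
    source = Path(docx_path)
    target_dir = Path(output_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / source.name

    with zipfile.ZipFile(source) as src, zipfile.ZipFile(
        target, "w", zipfile.ZIP_DEFLATED
    ) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/settings.xml":
                # Drop the protection element; every other part is copied as-is.
                data = PROTECTION_TAG.sub(b"", data)
            dst.writestr(item, data)
    return target
```

Because the result is written to a different path, a failure halfway through leaves the original document untouched, which is the point of the design described above.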
Clone the repository:
git clone https://github.com/nronzel/metasift.git
Navigate to the project directory:
cd metasift
Dependencies: None! Metasift only uses Python's standard library. 😎
Run Metasift by running the main.py file:
python main.py
or
python3 main.py
Metasift accepts either a filename:
test.docx
or a directory path (relative or absolute):
.
./
/path/to/directory
If a directory path is supplied, Metasift scans only that directory (it does not descend into subfolders), collects every file of a supported type, and attempts to extract its metadata.
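The input handling described above (a single file, or one directory level with no recursion) could look something like the sketch below. The names `collect_targets` and `SUPPORTED_SUFFIXES` are hypothetical and only illustrate the idea; Metasift's own implementation may differ.

```python
from pathlib import Path

# Assumption: only .docx is supported for now, as stated earlier in this README.
SUPPORTED_SUFFIXES = {".docx"}


def collect_targets(user_input):
    """Turn a filename or directory path into a list of files to process."""
    path = Path(user_input).expanduser()
    if path.is_file():
        return [path] if path.suffix.lower() in SUPPORTED_SUFFIXES else []
    if path.is_dir():
        # iterdir() lists direct children only, so subfolders are not crawled.
        return sorted(
            entry
            for entry in path.iterdir()
            if entry.is_file() and entry.suffix.lower() in SUPPORTED_SUFFIXES
        )
    raise FileNotFoundError(f"No such file or directory: {user_input}")
```

Using `iterdir()` rather than `rglob()` or `os.walk()` is what keeps the scan to a single directory level.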
This program was built and tested on Linux. It should work on any POSIX-based system such as Linux, macOS, or BSD.
I have added some logic for checking Windows filepaths, but I have not tested it on a Windows machine to verify that everything works. There may also be issues with the ANSI color codes in your terminal on Windows, as I believe ANSI codes are disabled by default.
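One common way to soften the ANSI problem is to fall back to plain text whenever color is unlikely to render. The sketch below is an assumption about how that could be done and is not Metasift's current behaviour. It does not enable ANSI support on Windows; it only skips the escape codes when output is not a terminal or when the `NO_COLOR` environment variable is set. The helper names `supports_color` and `colorize` are hypothetical.

```python
import os
import sys

GREEN = "\033[32m"
RESET = "\033[0m"


def supports_color(stream=sys.stdout):
    """Guess whether ANSI escape codes should be written to stream."""
    if os.environ.get("NO_COLOR"):
        # Informal convention: a non-empty NO_COLOR asks for plain output.
        return False
    return hasattr(stream, "isatty") and stream.isatty()


def colorize(text, code, stream=sys.stdout):
    """Wrap text in an ANSI color code only when the stream supports it."""
    if supports_color(stream):
        return f"{code}{text}{RESET}"
    return text
```

With this in place, something like `print(colorize("done", GREEN))` prints colored text in a terminal and plain text when the output is piped to a file.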
You can run the provided unit tests with:
python tests.py -v
- Rewrite to use classes for better maintainability
- Password protection removal for .docx files
- Directory support for batch processing
- .docx metadata cleaning
- .pdf file support
- Option to export metadata to CSV
- EXIF data support
- Metadata cleaning of other filetypes as implemented
