social-media-PII-scrubber

Python utility for parsing and scrubbing PII (personally identifiable information) from social media data dumps.

Credit: Derived from jwindha1/sm-parser (which was forked from cspang1 & ndo3).

This version extends the original functionality with:

Classes that consolidate shared (multi-use) functions and data management
Lower temporary memory demands: zip files are read in place instead of being unzipped and then destroyed
GUI: a simple form for entering input values and launching the parser
Progress bars for longer-running processes, plus an option to skip the media (face-blurring) sequence
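
Reading a dump without extracting it can be done with Python's standard zipfile module. The sketch below only illustrates the idea and is not SMParser's code; the function name iter_json_members and the assumption that the records of interest are stored as .json members are both hypothetical.

```python
import json
import zipfile


def iter_json_members(archive_source, suffix=".json"):
    """Yield (member_name, parsed_data) for each JSON member of a zip archive.

    archive_source may be a path or a binary file-like object. Members are
    read straight from the archive into memory, so nothing is extracted to
    disk and there is no temporary folder to delete afterward.
    """
    with zipfile.ZipFile(archive_source) as archive:
        for info in archive.infolist():
            # Skip directory entries and anything that is not JSON.
            if info.is_dir() or not info.filename.lower().endswith(suffix):
                continue
            with archive.open(info) as member:
                # json.load accepts a binary stream and detects UTF-8/16/32.
                data = json.load(member)
            yield info.filename, data
```

Because the function is a generator, the caller can process one parsed member at a time rather than loading the whole archive up front.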

Setup:

Download and install Visual Studio
Install CMake
Install dlib (required by face_recognition)
Run python -m textblob.download_corpora (this downloads nltk_data, typically to your home AppData/Roaming folder)
Move the nltk_data folder into your virtual environment's .venv folder (the path should be ./.venv/nltk_data)
Note: The PyInstaller build (from launch_smparser.spec) requires numerous hook files to collect hidden imports and data files. These are included in the ./hooks folder.
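
The ./.venv/nltk_data location works because NLTK's default search path includes an nltk_data folder under sys.prefix, and inside an activated virtual environment sys.prefix is the environment's root. The small check below is a hypothetical helper, not part of SMParser; the expected subfolder names (corpora, taggers, tokenizers) are an assumption about what the textblob download creates.

```python
import sys
from pathlib import Path

# Assumption: these are the subfolders `python -m textblob.download_corpora`
# creates; adjust them to match your own download.
EXPECTED_SUBFOLDERS = ("corpora", "taggers", "tokenizers")


def venv_nltk_data_dir():
    """Return <sys.prefix>/nltk_data, one of the folders NLTK searches by default.

    Inside an activated virtual environment, sys.prefix is the environment's
    root folder (e.g. ./.venv), so this resolves to ./.venv/nltk_data.
    """
    return Path(sys.prefix) / "nltk_data"


def missing_nltk_folders(root=None, required=EXPECTED_SUBFOLDERS):
    """List the required subfolders that are absent from an nltk_data folder.

    Checks the active environment's nltk_data folder when root is None.
    """
    root = Path(root) if root is not None else venv_nltk_data_dir()
    return [name for name in required if not (root / name).is_dir()]


if __name__ == "__main__":
    # sys.base_prefix differs from sys.prefix only inside a virtual environment.
    print("Inside a virtual environment:", sys.prefix != sys.base_prefix)
    print("Expected nltk_data location:", venv_nltk_data_dir())
    missing = missing_nltk_folders()
    print("Missing subfolders:", ", ".join(missing) if missing else "none")
```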

Version history:

v0.5 - IG parameter paths are read from data_mapping.json (editable outside the executable); face-blur settings
v0.4 - Changed the name detector/library to textblob
v0.3 - TTParser modified to preserve data anonymity
v0.2 - SCParser added; new feature: alias a custom list of values; fix: Starting Date now updates when the duration is changed
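
The v0.2 aliasing feature can be pictured as a consistent find-and-replace over the parsed text. The sketch below is hypothetical: the class name Aliaser, the Person_N alias format, and the case-insensitive whole-word matching are assumptions for illustration, not SMParser's actual behavior.

```python
import re


class Aliaser:
    """Replace every occurrence of user-listed values with stable aliases.

    Hypothetical sketch: matching is case-insensitive and whole-word, and the
    first distinct value becomes f"{prefix}_1", the next f"{prefix}_2", etc.
    """

    def __init__(self, values, prefix="Person"):
        # Map each distinct value (stripped, lower-cased) to an alias, in input order.
        self.mapping = {}
        for value in values:
            key = value.strip().lower()
            if key and key not in self.mapping:
                self.mapping[key] = f"{prefix}_{len(self.mapping) + 1}"
        # Try longer values first so "ann lee" wins over its prefix "ann".
        ordered = sorted(self.mapping, key=len, reverse=True)
        self.pattern = None
        if ordered:
            # (?<!\w) and (?!\w) act as word boundaries that also work for
            # values such as "@handle" that start or end with punctuation.
            alternatives = "|".join(re.escape(v) for v in ordered)
            self.pattern = re.compile(rf"(?<!\w)(?:{alternatives})(?!\w)", re.IGNORECASE)

    def scrub(self, text):
        """Return text with every listed value replaced by its alias."""
        if self.pattern is None:
            return text
        return self.pattern.sub(lambda m: self.mapping[m.group(0).lower()], text)
```

For example, Aliaser(["Ann", "Ann Lee"]).scrub("Ann Lee said hi to Ann") returns "Person_2 said hi to Person_1". Keeping one alias per value means a conversation stays readable after scrubbing while the real names are gone.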
