
Gyth33/social-media-PII-scrubber

 
 


social-media-PII-scrubber

Python utility for parsing and scrubbing PII from social media dumps.

Credit: Derived from jwindha1/sm-parser (which was forked from cspang1 & ndo3).

Extending functionality to include:

  • Facebook (v2 Schema)
  • Instagram (v2,v3 Schemas)
  • Snapchat
  • TikTok

Codebase improvements:

  • Classes to consolidate multi-use functions and data management
  • Lower temporary storage demands: zip files are read in place rather than unzipped and later destroyed
  • GUI: simple form for entering input values and launching the parser
  • Progress bars for longer-running processes, and options to skip the media (face blurring) sequence
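
The "zip files are not unzipped" point can be illustrated with a minimal sketch using Python's standard `zipfile` module. This is not the project's actual parser code: the function names `iter_json_members` and `build_demo_archive` and the archive layout are hypothetical, chosen only to show members being read straight from the archive.

```python
import io
import json
import zipfile


def iter_json_members(zip_source):
    """Yield (member_name, parsed_json) for each .json member of the archive.

    Members are read directly from the archive; nothing is extracted to disk.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.lower().endswith(".json"):
                with archive.open(name) as handle:
                    yield name, json.load(handle)


def build_demo_archive():
    """Build a small in-memory archive (hypothetical layout) for demonstration."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("messages/inbox/chat_1.json", json.dumps({"sender": "Alice"}))
        archive.writestr("media/photo.jpg", b"not-a-real-image")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for member, payload in iter_json_members(build_demo_archive()):
        print(member, payload)
```

Because `zipfile.ZipFile` accepts either a path or a file-like object, the same function would work on a downloaded dump on disk without creating a temporary extraction folder.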

[Screenshot: SMParser Main Screen]

Installation sequence (for Windows)

  • Download Visual Studio
  • Install cmake
  • Install dlib (for face_recognition)
  • Run python -m textblob.download_corpora (this will download nltk_data, probably to your home/AppData/Roaming folder)
  • Relocate nltk_data into your virtual environment .venv folder (path should be ./.venv/nltk_data)
  • Note: The PyInstaller process (from launch_smparser.spec) requires numerous 'hook' files to collect the hidden imports and data files. These are included in the ./hooks folder.
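
The nltk_data relocation step above can also be scripted. The following is a minimal sketch, not part of the project: `relocate_nltk_data` is a hypothetical helper, it copies rather than moves the folder so the original download stays intact, and it assumes Python 3.8+ for `dirs_exist_ok`. The source location is whatever folder the corpora download actually used.

```python
import shutil
from pathlib import Path


def relocate_nltk_data(source, venv_dir):
    """Copy an nltk_data folder into <venv_dir>/nltk_data and return the new path."""
    source = Path(source)
    target = Path(venv_dir) / "nltk_data"
    if not source.is_dir():
        raise FileNotFoundError(f"nltk_data not found at {source}")
    # dirs_exist_ok lets the copy be re-run after a partial or earlier copy.
    shutil.copytree(source, target, dirs_exist_ok=True)
    return target
```

A typical call on Windows might look like `relocate_nltk_data(Path.home() / "AppData" / "Roaming" / "nltk_data", ".venv")`, assuming the download landed in that folder.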

Release Notes

  • v0.5 - IG Parameter paths read from data_mapping.json (editable outside executable); Face blur settings
  • v0.4 - Changed name detector/library to textblob
  • v0.3 - TTParser modified to suit data anonymity
  • v0.2 - SCParser added; New feature: Alias a custom list of values; Fix: Update Starting Date as duration is changed
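
The v0.2 feature "Alias a custom list of values" could work along the following lines. This is only an illustrative sketch under assumptions, not the project's implementation: the helper names `build_alias_map` and `apply_aliases`, the `ALIAS_n` placeholder format, and the whole-word, case-insensitive matching rule are all hypothetical.

```python
import re


def build_alias_map(values, prefix="ALIAS"):
    """Assign each distinct value a stable placeholder such as ALIAS_1."""
    alias_map = {}
    for value in values:
        if value and value not in alias_map:
            alias_map[value] = f"{prefix}_{len(alias_map) + 1}"
    return alias_map


def apply_aliases(text, alias_map):
    """Replace whole-word, case-insensitive occurrences of each listed value."""
    if not alias_map:
        return text
    # Try longer values first so "Jane Doe" is matched before "Jane".
    ordered = sorted(alias_map, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(value) for value in ordered) + r")\b",
        re.IGNORECASE,
    )
    lookup = {value.lower(): alias for value, alias in alias_map.items()}
    return pattern.sub(lambda match: lookup[match.group(0).lower()], text)


if __name__ == "__main__":
    aliases = build_alias_map(["Jane Doe", "Jane", "Springfield"])
    print(apply_aliases("jane doe met Jane in Springfield.", aliases))
```

Using one mapping for a whole dump keeps the placeholders consistent across files, so the same person or place always receives the same alias.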
