Updated README
Jelena Mirkovic committed Aug 24, 2022
1 parent 51a9c4c commit 5158f48
Showing 1 changed file with 21 additions and 1 deletion.
22 changes: 21 additions & 1 deletion README.md
@@ -1 +1,21 @@
# phish-email-anonymizer

This project develops an anonymizer for phishing emails. The input is a folder of emails
in JSON format. The output is a folder of anonymized emails in plain text, each containing
the From, To and Subject headers and the body of the email.

Emails can be transformed from EML to JSON using code at
https://gitlab.com/isi-piranha/tools/eml-munging-toolkit

The anonymizer does the following:
- keeps only the content=text/plain parts of the email
- keeps the From, To and Subject fields from the email header
- detects personal names and replaces them with other random names (gender is not preserved)
- detects locations and replaces them with AnonCity, AnonState, etc.
- detects phone numbers and replaces them with random phone numbers (10-digit, not guaranteed to be valid)
- detects email usernames and replaces them with random usernames consistent with the personal name changes
- detects email domains and changes them ONLY if they belong to USC
- detects URLs and crops them to keep only the domain
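
As an illustration of two of the rules above (phone numbers and URLs), here is a minimal sketch using only the Python standard library. It is not the code in anonymizer.py: the regular expressions, the function names `random_phone`, `replace_phones` and `crop_urls`, the `ddd-ddd-dddd` output format, and the choice to keep the scheme together with the host when cropping are all assumptions made for this example.

```python
import random
import re

# Hypothetical patterns for illustration; the project's detection may differ.
# US-style 10-digit numbers such as (213) 555-0199 or 213.555.0199.
PHONE_RE = re.compile(r"(?<!\d)\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)")
# Group 1: scheme, group 2: host (optional user:password@ and :port are skipped).
URL_RE = re.compile(
    r"(https?://)(?:[^\s/?#@]+@)?([^\s/?#:<>]+)[^\s<>]*", re.IGNORECASE
)


def random_phone(rng):
    """Return 10 random digits as ddd-ddd-dddd (not guaranteed to be valid)."""
    digits = "".join(str(rng.randrange(10)) for _ in range(10))
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


def replace_phones(text, rng):
    """Replace every detected phone number with a fresh random one."""
    return PHONE_RE.sub(lambda match: random_phone(rng), text)


def crop_urls(text):
    """Drop path, query, fragment, credentials and port; keep scheme and host."""
    return URL_RE.sub(lambda match: match.group(1) + match.group(2), text)
```

For example, `crop_urls("see http://evil.example.com/login?user=bob today")` returns `"see http://evil.example.com today"`. Passing a seeded `random.Random` instance as `rng` makes the phone replacements reproducible, which can help when comparing runs.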

Run as:

    python anonymizer.py <input-folder> <output-folder>
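
The folder-to-folder flow behind this command could look roughly like the sketch below. It is an assumption-laden outline, not the actual anonymizer.py: the JSON key names (`from`, `to`, `subject`, `body`), the `*.json` input pattern, the `.txt` output suffix, and the helper names `render_email` and `process_folder` are all hypothetical, and the JSON produced by the EML conversion toolkit may be structured differently.

```python
import json
from pathlib import Path

# Assumed top-level keys in each input JSON file (hypothetical schema).
HEADER_FIELDS = ("from", "to", "subject")


def render_email(record, anonymize=lambda text: text):
    """Format one email as plain text: three header lines, a blank line, the body."""
    lines = [
        f"{field.capitalize()}: {anonymize(str(record.get(field, '')))}"
        for field in HEADER_FIELDS
    ]
    lines.append("")
    lines.append(anonymize(str(record.get("body", ""))))
    return "\n".join(lines)


def process_folder(input_dir, output_dir, anonymize=lambda text: text):
    """Read every *.json file in input_dir and write a .txt file to output_dir."""
    in_path, out_path = Path(input_dir), Path(output_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(in_path.glob("*.json")):
        record = json.loads(src.read_text(encoding="utf-8"))
        dst = out_path / (src.stem + ".txt")
        dst.write_text(render_email(record, anonymize), encoding="utf-8")
        written.append(dst)
    return written
```

The `anonymize` parameter defaults to the identity function so the file handling can be checked on its own; a real run would pass a function that applies the replacement rules to each field.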
