Jelena Mirkovic committed Aug 24, 2022
1 parent 51a9c4c, commit 5158f48
Showing 1 changed file with 21 additions and 1 deletion.
@@ -1 +1,21 @@
# phish-email-anonymizer

This project develops an anonymizer for phishing emails. The input is a folder of emails in JSON
format. The output is a folder of anonymized emails in plain text, each containing the From, To
and Subject fields and the body of the email.

Emails can be converted from EML to JSON using the code at
https://gitlab.com/isi-piranha/tools/eml-munging-toolkit

The anonymizer does the following:
- keep only the content=text/plain parts of the email
- keep the From, To and Subject fields from the email header
- detect personal names and replace them with other random names (gender is not preserved)
- detect locations and replace them with AnonCity, AnonState, etc.
- detect phone numbers and replace them with random phone numbers (10 digits, not guaranteed to be valid)
- detect email usernames and replace them with random usernames, consistent with the personal name changes
- detect email domains and change them ONLY if they belong to USC
- detect URLs and crop them to keep only the domain

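As a rough illustration of three of the steps listed above (URLs, phone numbers, email addresses), here is a minimal regex-based sketch. It is not the code in anonymizer.py. The function names, the regular expressions, the `user` + digits username format, the `anon.example` replacement domain, and the choice to keep the URL scheme next to the domain are all assumptions made for the example. Name and location detection are left out because they need a named-entity recognizer.

```python
import random
import re
from urllib.parse import urlsplit

# All patterns below are illustrative assumptions, not the project's own.
URL_RE = re.compile(r"https?://[^\s<>\"']+")
PHONE_RE = re.compile(r"(?<!\d)\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)")
EMAIL_RE = re.compile(r"\b([A-Za-z0-9._%+-]+)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})\b")


def crop_urls(text):
    """Drop path, query and fragment from each URL, keeping scheme and host."""
    def crop(match):
        parts = urlsplit(match.group(0))
        return f"{parts.scheme}://{parts.hostname or ''}"
    return URL_RE.sub(crop, text)


def randomize_phones(text, rng):
    """Replace each phone-like number with 10 random digits (may be invalid)."""
    def fake(_match):
        return "".join(rng.choice("0123456789") for _ in range(10))
    return PHONE_RE.sub(fake, text)


def anonymize_emails(text, rng, seen):
    """Replace usernames consistently; change the domain only for USC addresses.

    `seen` maps each original username to its replacement, so the same
    username always receives the same pseudonym within a run.
    """
    def swap(match):
        user, domain = match.group(1), match.group(2)
        key = user.lower()
        if key not in seen:
            digits = "".join(rng.choice("0123456789") for _ in range(6))
            seen[key] = "user" + digits  # hypothetical pseudonym format
        lowered = domain.lower()
        if lowered == "usc.edu" or lowered.endswith(".usc.edu"):
            domain = "anon.example"  # hypothetical replacement domain
        return f"{seen[key]}@{domain}"
    return EMAIL_RE.sub(swap, text)
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which can help when comparing outputs across versions of the rules.
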
Run as:
python anonymizer.py <input-folder> <output-folder>
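
To make the folder-in, folder-out flow concrete, the sketch below shows one way such a driver could be organized. It is an assumption-laden illustration, not the actual anonymizer.py: the JSON keys (`from`, `to`, `subject`, `body`), the `*.json` file pattern, the `.txt` output naming, and the `render_email`, `process_folder` and `main` helpers are all hypothetical, since the JSON schema produced by the EML conversion is not described here.

```python
import json
from pathlib import Path


def render_email(record):
    """Format the kept fields as plain text (the key names are assumed)."""
    lines = [
        f"From: {record.get('from', '')}",
        f"To: {record.get('to', '')}",
        f"Subject: {record.get('subject', '')}",
        "",
        record.get("body", ""),
    ]
    return "\n".join(lines)


def process_folder(input_dir, output_dir, anonymize=lambda text: text):
    """Write one anonymized .txt file per *.json file; return the file count.

    `anonymize` is a placeholder hook for the detection and replacement
    steps; by default it returns the text unchanged.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(Path(input_dir).glob("*.json")):
        record = json.loads(path.read_text(encoding="utf-8"))
        text = anonymize(render_email(record))
        (out / (path.stem + ".txt")).write_text(text, encoding="utf-8")
        count += 1
    return count


def main(argv):
    """Entry point taking the same two arguments as the command above."""
    if len(argv) != 3:
        print("usage: python anonymizer.py <input-folder> <output-folder>")
        return 1
    process_folder(argv[1], argv[2])
    return 0
```

A script built this way would call `main(sys.argv)` at the bottom of the file. Keeping the anonymization step as a pluggable function makes it easy to test the file handling separately from the detection rules.
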