This project processes email data from .mbox files and converts it into a structured CSV format with additional engineered features. We then anonymize the emails, making them suitable for further analysis and for use in retrieval-augmented generation (RAG) applications.
- `Email_Processing.ipynb`: Jupyter notebook containing the main script for processing .mbox files.
- `Mbox Files/`: Directory containing the input .mbox files.
- `Output CSV/Raw Email/`: Directory where the processed CSV files are saved.
- Extracts key email fields: date, sender, recipient, subject, and body.
- Handles multipart email messages.
- Converts .mbox files to CSV format.
- Performs feature engineering on the email body.
- Anonymizes emails with generative AI functions.
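The notebook anonymizes emails with generative AI functions, which are not reproduced here. The sketch below is a deterministic, non-AI stand-in that shows the kind of transformation involved. It uses a regular expression to pseudonymize email addresses, replacing each distinct address with a stable placeholder so that who-wrote-to-whom relationships survive anonymization. The `EMAIL_RE` pattern, the `pseudonymize` function and the `<EMAIL_n>` placeholder format are all hypothetical. The pattern is a heuristic rather than a full RFC 5322 parser, and it does not touch names or other personal data.

```python
import re

# Hypothetical pattern: matches common address shapes such as "alice@example.com";
# this is a heuristic, not a full RFC 5322 parser.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymize(texts):
    """Replace each distinct address (compared case-insensitively) with a stable placeholder.

    Returns the rewritten texts and the address -> placeholder mapping.
    """
    mapping = {}

    def replace(match):
        address = match.group(0).lower()
        if address not in mapping:
            # Number placeholders in order of first appearance.
            mapping[address] = f"<EMAIL_{len(mapping) + 1}>"
        return mapping[address]

    return [EMAIL_RE.sub(replace, text) for text in texts], mapping
```

For example, `pseudonymize(["alice@example.com wrote to bob@example.com", "Reply from Alice@example.com"])` returns `(["<EMAIL_1> wrote to <EMAIL_2>", "Reply from <EMAIL_1>"], {"alice@example.com": "<EMAIL_1>", "bob@example.com": "<EMAIL_2>"})`. The function returns the mapping rather than discarding it. It can then be stored separately, or dropped, depending on whether re-identification should remain possible.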
- Python 3.x
- pandas
- mailbox (part of the Python standard library)
- Place your .mbox file in the `Mbox Files/` directory.
- Open the `Email_Processing.ipynb` notebook.
- Update the `mbox_path` variable with the path to your .mbox file.
- Update the `csv_path` variable with the desired output path for the CSV file.
- Run the notebook cells to process the .mbox file and generate the CSV.
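The conversion that these steps drive can be sketched with the standard-library `mailbox` module and pandas. This is a minimal sketch, not the notebook's actual code, and `extract_body` and `mbox_to_csv` are hypothetical names. It assumes that only inline `text/plain` parts count as the body of a multipart message, so HTML alternatives and attachments are skipped. Header values are kept as raw strings, which means RFC 2047 encoded subjects are not decoded.

```python
import mailbox

import pandas as pd


def _decode_part(part):
    """Decode one non-multipart message part to text."""
    payload = part.get_payload(decode=True) or b""
    charset = part.get_content_charset() or "utf-8"
    try:
        return payload.decode(charset, errors="replace")
    except LookupError:  # charset name that Python does not recognise
        return payload.decode("utf-8", errors="replace")


def extract_body(msg):
    """Return the body as text; for multipart messages, join the inline text/plain parts."""
    if not msg.is_multipart():
        return _decode_part(msg)
    texts = []
    for part in msg.walk():
        # walk() also yields the multipart containers; keep only inline text/plain leaves.
        if (part.get_content_type() == "text/plain"
                and part.get_content_disposition() != "attachment"):
            texts.append(_decode_part(part))
    return "\n".join(texts)


def mbox_to_csv(mbox_path, csv_path):
    """Read every message in an .mbox file and write one CSV row per email."""
    box = mailbox.mbox(mbox_path, create=False)  # raise instead of creating a missing file
    try:
        rows = [
            {
                "date": msg["date"],
                "from": msg["from"],
                "to": msg["to"],
                "subject": msg["subject"],
                "body": extract_body(msg),
            }
            for msg in box
        ]
    finally:
        box.close()
    df = pd.DataFrame(rows, columns=["date", "from", "to", "subject", "body"])
    df.to_csv(csv_path, index=False)
    return df
```

As an example, consider the hypothetical paths `mbox_path = "Mbox Files/example.mbox"` and `csv_path = "Output CSV/Raw Email/example.csv"`. Calling `mbox_to_csv(mbox_path, csv_path)` would write the five base columns (date, from, to, subject, body). Any engineered feature columns would come from a separate step.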
The script will generate a CSV file containing the following columns:
- date
- from
- to
- subject
- body
- additional feature columns engineered from the email body
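This README does not list which features the notebook engineers. The sketch below shows what body-derived feature columns could look like when built with pandas string methods. The function name `add_body_features` is hypothetical, and so are all four features: character count, word count, quoted-line count and URL presence. They are illustrative examples, not the notebook's actual feature set.

```python
import pandas as pd


def add_body_features(df):
    """Return a copy of df with illustrative features derived from the body column."""
    out = df.copy()
    body = out["body"].fillna("").astype(str)  # treat missing bodies as empty text
    out["body_chars"] = body.str.len()
    out["body_words"] = body.str.split().str.len()  # whitespace-separated tokens
    # Lines starting with ">" are conventionally quoted text from earlier messages.
    out["quoted_lines"] = body.str.count(r"(?m)^>")
    out["has_url"] = body.str.contains(r"https?://", regex=True)
    return out
```

The function works on any DataFrame that has a `body` column, such as one with the base columns listed above. It leaves the input untouched and returns a copy with the four extra columns appended.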
This structured data can be used for various purposes, including training generative AI models or performing email analytics.
Ensure that you have the necessary permissions to read the input .mbox files and write to the output directory.
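A pre-flight check for these permissions can be sketched with `os.access`. The `check_paths` name and its return convention (a list of problem messages, empty when the checks pass) are hypothetical. `os.access` is only advisory, because I/O can still fail after it reports success, for example on network file systems with permission models beyond POSIX mode bits. The actual read and write should therefore still handle errors.

```python
import os


def check_paths(mbox_path, output_dir):
    """Return a list of permission problems; an empty list means the checks passed."""
    problems = []
    if not os.path.isfile(mbox_path):
        problems.append(f"input file not found: {mbox_path}")
    elif not os.access(mbox_path, os.R_OK):
        problems.append(f"input file is not readable: {mbox_path}")
    if not os.path.isdir(output_dir):
        problems.append(f"output directory not found: {output_dir}")
    elif not os.access(output_dir, os.W_OK | os.X_OK):
        # Creating a file in a directory needs both write and search (execute) permission.
        problems.append(f"output directory is not writable: {output_dir}")
    return problems
```

For example, `check_paths("Mbox Files/example.mbox", "Output CSV/Raw Email")` returns an empty list when both checks pass; the file name there is hypothetical.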