Spam Email Data Processing Project

Overview

This project focuses on preprocessing and transforming raw email text data into a clean, structured format suitable for machine learning–based spam detection. The primary goal is to prepare high-quality, model-ready input by handling the noise, inconsistencies, and formatting issues commonly found in real-world email datasets.

Note: The scope is limited to data preprocessing and feature preparation. The machine learning model itself was not developed as part of this work.

Objectives

Convert raw email text into structured, machine-learning-ready data
Improve data quality and consistency for downstream spam classification models
Apply standard text preprocessing techniques used in real-world ML pipelines

Key Features

Cleaning raw email text (removing unnecessary characters and fixing formatting issues)
Text normalization (lowercasing, whitespace handling)
Tokenization and text transformation
Feature preparation for use in spam detection models
Structured dataset output suitable for training and evaluation
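
As a rough illustration of the cleaning, normalization, tokenization, and feature-preparation steps listed above, here is a minimal Python sketch. It is not the repository's actual code: the function names (clean_text, tokenize, to_features), the regular expressions, and the choice of simple token counts as features are all assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical patterns chosen for illustration; the project's real rules may differ.
HTML_TAG = re.compile(r"<[^>]+>")        # leftover HTML markup in email bodies
NON_TEXT = re.compile(r"[^a-z0-9\s]")    # anything other than lowercase letters, digits, whitespace
WHITESPACE = re.compile(r"\s+")          # runs of spaces, tabs, and newlines


def clean_text(raw: str) -> str:
    """Strip markup, lowercase, drop punctuation, and collapse whitespace."""
    text = HTML_TAG.sub(" ", raw)
    text = text.lower()
    text = NON_TEXT.sub(" ", text)
    return WHITESPACE.sub(" ", text).strip()


def tokenize(text: str) -> list[str]:
    """Split already-cleaned text into whitespace-delimited tokens."""
    return text.split()


def to_features(raw: str) -> dict[str, int]:
    """Map a raw message to token counts (a simple bag-of-words row)."""
    return dict(Counter(tokenize(clean_text(raw))))


if __name__ == "__main__":
    message = "<p>WIN a FREE   prize!!!</p>"
    print(clean_text(message))            # cleaned, normalized text
    print(tokenize(clean_text(message)))  # list of tokens
    print(to_features(message))           # token -> count mapping
```

Token counts are used here only because they are easy to inspect; a downstream model could just as well consume TF-IDF weights or another representation built from the same tokens.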


Lisa-Kooner/spam-email-data-processing
