This repository implements a Spam Email Classifier using a Convolutional Neural Network (CNN) with a Conv1D architecture. The model takes in word frequency vectors extracted from emails and predicts whether the email is Spam or Not Spam.
The goal is to classify emails as Spam or Not Spam, using word frequency vectors as input to a deep learning model built with TensorFlow/Keras.
The dataset represents each email as a vector where:
- Columns: ~3000 words (features), where each column represents a word.
- Rows: Individual emails.
- Values: The count of word occurrences in the email.
- Dataset Source: Kaggle - Email Spam Classification Dataset
- Labels: 1 → Spam, 0 → Not Spam
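To make the table layout concrete, here is a minimal sketch of splitting such a table into a feature matrix and a label vector with Pandas. The column names `Email No.` and `Prediction` are assumptions about the CSV header, and the three-word toy frame stands in for the real ~3000 columns; adjust both to match the actual file.

```python
import pandas as pd

def split_features_labels(df, id_col="Email No.", label_col="Prediction"):
    """Split a word-count table into features X and labels y.

    id_col and label_col are assumed column names, not confirmed by the project.
    """
    # Drop the identifier and label columns; everything left is a word count.
    drop_cols = [c for c in (id_col, label_col) if c in df.columns]
    features = df.drop(columns=drop_cols)
    X = features.to_numpy(dtype="float32")
    y = df[label_col].to_numpy(dtype="int64")
    return X, y

# Toy stand-in for the real table: 3 emails, 3 word columns.
toy = pd.DataFrame({
    "Email No.": ["Email 1", "Email 2", "Email 3"],
    "free": [3, 0, 1],
    "meeting": [0, 2, 0],
    "winner": [2, 0, 0],
    "Prediction": [1, 0, 1],  # 1 = Spam, 0 = Not Spam
})

X, y = split_features_labels(toy)
print(X.shape, y.tolist())  # (3, 3) [1, 0, 1]
```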
- TensorFlow/Keras: Model building and training.
- Pandas: Data preprocessing.
- Scikit-learn: Data scaling (StandardScaler).
- NumPy: Numerical computations.
- Matplotlib: Data visualization (optional).
- Streamlit: Interactive web deployment for predictions.
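As an illustration of how the scaling step might feed the network, the sketch below standardizes the word counts with `StandardScaler` and adds the trailing channel axis that Keras `Conv1D` layers require. Fitting the scaler on the training split only, the helper name `scale_and_reshape`, and the random toy data are assumptions, not details taken from the project.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

def scale_and_reshape(X_train, X_val):
    """Standardize features, then reshape to (samples, features, 1) for Conv1D."""
    scaler = StandardScaler()
    # Fit on training data only so validation statistics do not leak in (assumed practice).
    X_train_s = scaler.fit_transform(X_train)
    X_val_s = scaler.transform(X_val)
    # Conv1D expects 3D input: (batch, steps, channels). Use one channel.
    return X_train_s[..., np.newaxis], X_val_s[..., np.newaxis], scaler

# Toy word-count data: 8 training emails and 2 validation emails, 5 words each.
rng = np.random.default_rng(0)
X_train = rng.poisson(1.0, size=(8, 5)).astype("float32")
X_val = rng.poisson(1.0, size=(2, 5)).astype("float32")

X_train_c, X_val_c, scaler = scale_and_reshape(X_train, X_val)
print(X_train_c.shape, X_val_c.shape)  # (8, 5, 1) (2, 5, 1)
```

The fitted scaler is returned so the same transformation can be reused at prediction time.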
- Input Layer: Accepts word frequency vectors as input.
- Conv1D Layer(s): Slide filters over the feature vector to extract local patterns among adjacent word frequencies.
- Flatten Layer: Prepares features for dense layers.
- Dense Layer(s): Processes extracted features for prediction.
- Output Layer: Sigmoid activation for binary classification.
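The layer stack above can be sketched in Keras roughly as follows. The number of filters, the kernel size, the dense width, the optimizer, and the function name `build_model` are illustrative assumptions; the trained model may use different values or more layers.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_model(num_features=3000):
    """Minimal Conv1D binary classifier; hyperparameters are assumed, not the project's."""
    model = tf.keras.Sequential([
        # Each email is a sequence of word counts with a single channel.
        layers.Input(shape=(num_features, 1)),
        layers.Conv1D(filters=32, kernel_size=3, activation="relu"),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        # Sigmoid outputs the probability that the email is spam.
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model

# Small feature count keeps the example fast; use ~3000 for the real data.
model = build_model(num_features=20)
print(model.output_shape)  # (None, 1)
```

A prediction above 0.5 would typically be read as Spam (1) and anything else as Not Spam (0), though the threshold is a design choice.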
The model is now deployed on Streamlit, where users can easily input their email text and get a prediction of whether the email is Spam or Not Spam.
You can interact with the live model and try predictions directly on the Streamlit app:
Spam Email Classifier - Streamlit App
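Because the model was trained on word counts rather than raw text, any front end has to convert the user's email into the same column layout before predicting. How the deployed app does this is not described here; the sketch below shows one plausible approach, where the tokenization rule, the helper name `text_to_counts`, and the three-word vocabulary are all assumptions.

```python
import re
from collections import Counter

def text_to_counts(text, vocabulary):
    """Count how often each vocabulary word appears in the text.

    The vocabulary order must match the column order used during training.
    """
    # Lowercase and keep alphabetic runs only (an assumed, simple tokenizer).
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

# Hypothetical vocabulary; the real one would list the ~3000 dataset columns.
vocab = ["free", "meeting", "winner"]
vector = text_to_counts("FREE prize! You are a winner, claim your free gift.", vocab)
print(vector)  # [2, 0, 1]
```

The resulting vector would then be scaled and reshaped exactly as the training data was before being passed to the model.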
- Training Accuracy: ~98.1%
- Validation Accuracy: ~98%
- Dataset: Kaggle - Email Spam Classification Dataset
- Notebook: Spam Email Classifier Notebook on Kaggle
- Streamlit App: Spam Email Classifier on Streamlit
To access the model.h5 file, please contact me via email at:
dev.mahmoudalrefaey@gmail.com
This project explores the application of deep learning to text classification using a CNN (Conv1D). By transforming emails into word frequency vectors, the model learns to recognize patterns indicative of spam.
The dataset and implementation offer a solid baseline for further experimentation or deployment.