A spam email classifier that combines traditional machine learning with deep learning (BERT, DistilBERT) to detect spam. The project covers the full pipeline from preprocessing to deployment and includes an interactive web interface.
Clone the repo and install dependencies:
git clone https://github.com/allmen/email-spam-classifier.git
cd email-spam-classifier
pip install -r requirements.txt

Download NLTK data:
python -c "import nltk; nltk.download('stopwords'); nltk.download('punkt')"

# Train all models (ML + DL)
python train_models.py
# Train only ML (faster)
python train_models.py --no-dl

# Evaluate the trained models
python evaluate_models.py

# Start the Flask web app
python app.py
# Visit http://localhost:5000

Key files:
- train_models.py: Train spam classifiers
- evaluate_models.py: Evaluate performance
- app.py: Flask web app
- notebooks/: Interactive notebook
- web/templates/index.html: UI page
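
For orientation, the `--no-dl` switch of `train_models.py` could be wired up roughly as follows. This is a minimal sketch using the standard-library `argparse` module, not the repository's actual code; the function names `parse_args` and `select_model_families` and the family labels `"ml"` and `"dl"` are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options (hypothetical layout, not the real script)."""
    parser = argparse.ArgumentParser(description="Train spam classifiers.")
    # store_true makes args.no_dl False by default and True when the flag is given.
    parser.add_argument(
        "--no-dl",
        action="store_true",
        help="Skip the deep learning models and train only traditional ML models.",
    )
    return parser.parse_args(argv)


def select_model_families(args):
    """Return the model families to train, based on the parsed flags."""
    families = ["ml"]  # traditional ML models are always trained in this sketch
    if not args.no_dl:
        families.append("dl")  # BERT / DistilBERT, skipped when --no-dl is set
    return families
```

Under these assumptions, calling `select_model_families(parse_args(["--no-dl"]))` selects only the traditional ML family, which is why that mode trains faster.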
Supported models:
- Traditional ML: Naive Bayes, SVM, Random Forest, Logistic Regression
- Deep Learning: BERT, DistilBERT
- Ensemble: Combines ML and DL for best results
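
The list above does not say how the ensemble combines its members. One common approach is soft voting: average the spam probabilities from each group of models and apply a threshold. The sketch below illustrates that idea only; it is an assumption, not the project's confirmed method, and the names `ensemble_spam_probability`, `classify`, `dl_weight`, and `threshold` are hypothetical.

```python
from statistics import fmean


def ensemble_spam_probability(ml_probs, dl_probs, dl_weight=0.5):
    """Blend spam probabilities from ML and DL models by weighted soft voting.

    ml_probs / dl_probs: spam probabilities in [0, 1], one per model.
    dl_weight: share of the final score given to the DL group (assumed value).
    """
    if not ml_probs or not dl_probs:
        raise ValueError("each model group needs at least one probability")
    if not 0.0 <= dl_weight <= 1.0:
        raise ValueError("dl_weight must be between 0 and 1")
    ml_avg = fmean(ml_probs)  # mean over e.g. Naive Bayes, SVM, Random Forest
    dl_avg = fmean(dl_probs)  # mean over e.g. BERT, DistilBERT
    return (1.0 - dl_weight) * ml_avg + dl_weight * dl_avg


def classify(ml_probs, dl_probs, threshold=0.5, dl_weight=0.5):
    """Label an email 'spam' when the blended probability reaches the threshold."""
    score = ensemble_spam_probability(ml_probs, dl_probs, dl_weight)
    return "spam" if score >= threshold else "ham"
```

With equal weights, an email scored `[0.2, 0.4]` by the ML models and `[0.9, 0.7]` by the DL models gets a blended score of about 0.55 and is labeled spam. Raising `dl_weight` would let the transformer models dominate the decision.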
Special thanks to the contributors of this project:
- Hugging Face for their pre-trained transformer models
- SpamAssassin for foundational spam detection datasets and techniques
Built with ❤️ by a passionate team for email security and AI learning.