rizz1406/Spam-Email-Detector: Spam Email Classifier using Python and Streamlit. A simple machine learning project that classifies emails as **spam** or **ham** using the **Naive Bayes algorithm** and **TF-IDF** for text feature extraction. The project includes a user-friendly web app built with Streamlit.

Spam Email Classifier using Python and Streamlit

Project Overview

This project demonstrates a simple spam classifier built with Natural Language Processing (NLP) and machine learning techniques. The classifier predicts whether a given message is spam or ham (not spam) from the message content. The model is trained on the SMS Spam Collection dataset and is served through a Streamlit web application for easy user interaction.

Features

Machine Learning Model: Naive Bayes classifier
Text Preprocessing: TF-IDF (Term Frequency-Inverse Document Frequency) for feature extraction
Web Application: User-friendly interface built with Streamlit for real-time message classification
Dataset: SMS Spam Collection dataset from the UCI Machine Learning Repository
Performance: Achieved ~98% accuracy on the test data

Technologies Used

Python: Core language for data processing, machine learning, and web app development
Pandas: Data manipulation and analysis
Scikit-learn: Machine learning model and feature extraction
Streamlit: Framework for building a simple web interface
NLP: Natural Language Processing for text analysis and classification
TF-IDF: For converting text to numerical feature vectors

Installation & Setup

Clone the repository to your local machine:
```
git clone https://github.com/rizz1406/Spam-Email-Detector.git
```
Navigate to the project directory:
```
cd Spam-Email-Detector
```
Install the required dependencies:
```
pip install -r requirements.txt
```
Run the Streamlit app:
```
streamlit run spamEmail.py
```
Open your browser and go to http://localhost:8501/. Enter a message to classify it as spam or ham.

Dataset

The dataset used in this project is the SMS Spam Collection dataset, which can be downloaded from the UCI Machine Learning Repository. It consists of 5,574 messages labeled as spam or ham.
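
As a rough illustration of the loading and label-mapping step, the sketch below reads a few in-memory rows with pandas. It assumes the tab-separated `label<TAB>message` layout of the raw UCI file; the `spam.csv` shipped in this repository may use a different delimiter, encoding, or column names. The sample rows, the column names `label`, `message`, and `target`, and the helper `load_messages` are all hypothetical.

```python
import csv
import io

import pandas as pd

# Three made-up rows in the assumed "label<TAB>message" layout.
SAMPLE = (
    "ham\tAre we still meeting for lunch today?\n"
    "spam\tWINNER! Claim your free prize now\n"
    "ham\tCall me when you get home\n"
)


def load_messages(source):
    """Read tab-separated label/message pairs and add a binary target column."""
    df = pd.read_csv(
        source,
        sep="\t",
        header=None,
        names=["label", "message"],
        quoting=csv.QUOTE_NONE,  # treat quote characters in messages as plain text
    )
    df = df.drop_duplicates().dropna()
    # 1 for spam, 0 for ham
    df["target"] = df["label"].map({"ham": 0, "spam": 1})
    return df


df = load_messages(io.StringIO(SAMPLE))
print(df["target"].tolist())
```

To read from disk instead, a file path can be passed in place of the `io.StringIO` object.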

Project Structure

.
├── spamEmail.py                 # Main Python script that runs the Streamlit app
├── spam.csv                     # SMS Spam Collection dataset (must be in this folder)
├── requirements.txt             # Dependencies for the project
└── spamEmailClassifier.ipynb    # Jupyter notebook with a step-by-step guide

How It Works

Loading the Dataset: The SMS Spam Collection dataset is loaded and cleaned. The labels (spam or ham) are mapped to binary values (1 for spam, 0 for ham).
Text Vectorization: The message content is transformed into numerical features using TF-IDF vectorization, which converts the raw text into a format that can be used by the machine learning model.
Model Training: A Naive Bayes classifier is trained on the vectorized training messages. Naive Bayes is a common baseline that tends to perform well on text classification problems such as spam detection.
Prediction & Evaluation: The trained model is evaluated on held-out test data, and metrics such as accuracy and the confusion matrix are computed.
Streamlit Application: The model is deployed on a Streamlit web app where users can input a message and receive a real-time prediction (spam/ham).
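
The vectorization, training, and evaluation steps above can be sketched with scikit-learn as follows. This is a minimal illustration, not the repository's actual code: the toy messages are made up, and the choice of `MultinomialNB`, the 75/25 split, and `random_state=42` are assumptions. On a corpus this small the printed metrics are not meaningful; the point is only the shape of the workflow.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus; 1 = spam, 0 = ham.
messages = [
    "Congratulations, you won a free prize, claim now",
    "Free entry to win cash, text now",
    "Urgent: your account won a cash reward",
    "Claim your free ringtone today",
    "Are we still meeting for lunch today?",
    "Can you call me when you get home?",
    "I will be late, see you at dinner",
    "Thanks for the notes from class",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Hold out a quarter of the messages, keeping the spam/ham ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    messages, labels, test_size=0.25, stratify=labels, random_state=42
)

# The pipeline fits TF-IDF on the training text only, then trains Naive Bayes
# on the resulting feature vectors.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(X_train, y_train)

predictions = model.predict(X_test)
matrix = confusion_matrix(y_test, predictions, labels=[0, 1])
print("accuracy:", accuracy_score(y_test, predictions))
print(matrix)  # rows: true ham/spam, columns: predicted ham/spam
```

Wrapping the vectorizer and classifier in one pipeline means new messages can be passed to `model.predict` as raw strings, and the test messages never influence the TF-IDF vocabulary.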

Example Output

When you input a message into the Streamlit app, it will classify it as Spam or Ham based on the trained model.
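
A rough sketch of how such a page could be wired up is shown below. It is not the repository's script: the helper names `train_toy_model`, `classify`, and `render_app` are hypothetical, the four training messages are made up, and `MultinomialNB` is an assumed choice of Naive Bayes variant. The Streamlit module is passed in as the `st` argument so the logic can be exercised without launching the app.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def train_toy_model():
    """Fit a tiny stand-in model; a real app would train on the full dataset."""
    messages = [
        "free prize claim your free prize now",
        "win cash now free entry",
        "see you at lunch tomorrow",
        "can you call me tonight",
    ]
    labels = [1, 1, 0, 0]  # 1 = spam, 0 = ham
    return make_pipeline(TfidfVectorizer(), MultinomialNB()).fit(messages, labels)


def classify(model, message):
    """Map the model's binary prediction back to a display label."""
    return "Spam" if model.predict([message])[0] == 1 else "Ham"


def render_app(st, model):
    """Draw the page; `st` is expected to be the imported streamlit module."""
    st.title("Spam Email Classifier")
    message = st.text_area("Enter a message")
    if st.button("Classify"):
        if not message.strip():
            st.warning("Please enter a message first.")
        elif classify(model, message) == "Spam":
            st.error("Spam")
        else:
            st.success("Ham")


# In a script launched with `streamlit run`, the wiring would be:
#   import streamlit as st
#   render_app(st, train_toy_model())
```

Streamlit re-executes the script on every interaction, so a real app would typically load or cache a trained model rather than refit it on each run.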

Future Improvements

Enhance the UI and add more interactivity to the web app.
Implement additional machine learning models (e.g., SVM, deep learning models) for improved accuracy.
Incorporate other datasets to improve the generalization of the classifier.
Add NLP techniques for better preprocessing, such as lemmatization or stemming.
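
One way to try an additional model such as an SVM is to keep the TF-IDF step fixed and swap only the final estimator, then compare the candidates with cross-validation. The sketch below assumes scikit-learn's `LinearSVC` as the SVM and uses a made-up toy corpus, so the printed scores say nothing about real performance.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus; 1 = spam, 0 = ham.
messages = [
    "Congratulations, you won a free prize, claim now",
    "Free entry to win cash, text now",
    "Urgent: your account won a cash reward",
    "Claim your free ringtone today",
    "Are we still meeting for lunch today?",
    "Can you call me when you get home?",
    "I will be late, see you at dinner",
    "Thanks for the notes from class",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Same feature extraction for every candidate; only the classifier changes.
candidates = {
    "naive_bayes": make_pipeline(TfidfVectorizer(), MultinomialNB()),
    "linear_svm": make_pipeline(TfidfVectorizer(), LinearSVC()),
}

scores = {}
for name, pipeline in candidates.items():
    # 2-fold stratified cross-validation; TF-IDF is refit inside each fold.
    fold_scores = cross_val_score(pipeline, messages, labels, cv=2)
    scores[name] = float(fold_scores.mean())
    print(f"{name}: mean accuracy {scores[name]:.2f}")
```

On the full dataset a larger `cv` value (for example 5) would give a more stable comparison.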
