This repository demonstrates how Natural Language Processing (NLP) can be applied to spam detection: it contains a machine learning model that classifies text messages as spam or ham (not spam). The model is a Naive Bayes classifier trained on TF-IDF (Term Frequency-Inverse Document Frequency) features. The code preprocesses the text data, trains the model on labeled examples, and evaluates it using accuracy, precision, recall, and F1-score.
- Preprocessing text data: tokenization, stopword removal, and TF-IDF vectorization.
- Training a Naive Bayes classifier on labeled text messages.
- Evaluating model performance using standard classification metrics.
- A simple, readable implementation of spam/ham classification.
- Python 3.x
- pandas
- scikit-learn
- Clone the repository:
- Navigate to the project directory: `cd spam-ham-classification`
- Install dependencies: `pip install -r requirements.txt`
- Run the main script: `python spam_ham_classification.py`
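
For orientation, the pipeline described above (TF-IDF vectorization with stopword removal, a Naive Bayes classifier, and the four evaluation metrics) can be sketched with scikit-learn roughly as follows. This is a minimal illustration and not the contents of `spam_ham_classification.py`: the toy messages, the helper names `build_model` and `evaluate`, the choice of `MultinomialNB`, and the split parameters are all assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Hypothetical toy data; the real project trains on its own labeled dataset.
messages = [
    "Congratulations, you won a free prize, claim now",
    "Urgent: your account was selected for a cash reward",
    "Free entry in a weekly competition, text WIN to enter",
    "Claim your free ringtone today, reply YES",
    "You have been chosen for a guaranteed cash prize",
    "Win a brand new phone, click the link now",
    "Are we still meeting for lunch tomorrow?",
    "Can you send me the notes from class?",
    "I will be home late tonight, do not wait up",
    "Happy birthday! Hope you have a great day",
    "Let me know when you reach the station",
    "Thanks for dinner last night, it was lovely",
]
labels = ["spam"] * 6 + ["ham"] * 6


def build_model():
    """Build a TF-IDF + Naive Bayes pipeline (hypothetical helper).

    TfidfVectorizer tokenizes, lowercases, and drops English stopwords;
    MultinomialNB is assumed here as the Naive Bayes variant.
    """
    return make_pipeline(
        TfidfVectorizer(lowercase=True, stop_words="english"),
        MultinomialNB(),
    )


def evaluate(model, texts, true_labels):
    """Return accuracy, precision, recall, and F1, treating "spam" as the positive class."""
    predicted = model.predict(texts)
    return {
        "accuracy": accuracy_score(true_labels, predicted),
        "precision": precision_score(true_labels, predicted, pos_label="spam", zero_division=0),
        "recall": recall_score(true_labels, predicted, pos_label="spam", zero_division=0),
        "f1": f1_score(true_labels, predicted, pos_label="spam", zero_division=0),
    }


# Hold out a quarter of the messages for evaluation, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    messages, labels, test_size=0.25, stratify=labels, random_state=42
)

model = build_model()
model.fit(X_train, y_train)
scores = evaluate(model, X_test, y_test)
print(scores)
```

With only a dozen messages the printed scores say nothing about real-world quality; the point is the shape of the workflow. Wrapping the vectorizer and classifier in a single pipeline means the TF-IDF vocabulary is learned from the training split only, which avoids leaking test data into the features.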
Contributions, bug reports, and feature requests are welcome! Please feel free to open an issue or submit a pull request.
This project is licensed under the MIT License.