
The aim of this project is to flag e-mails that contain offensive or anti-social content and block them, which helps identify suspicious users. Suspicious email detection is a kind of mailing system in which suspicious users are identified by the keywords they use.


theritik01/Suspicious-Email-Detection


Email Spam Detection


➲ Project description

The email spam detection system detects spam emails using Natural Language Processing, machine learning and Python. Starting from a dataset containing a large number of emails, it extracts the important words from each email and then uses a naive classifier to decide whether the email is spam or not.


➲ Prerequisites

This is the list of packages and modules that must be installed for the project :

  • Python3
  • Pandas
  • Numpy
  • Scikit-learn
  • NLTK

Install all required packages :

 pip install -r requirements.txt


➲ The Dataset

The dataset contains about 5728 records, each of which is a sample email,
and a target column "type" which states whether the email is spam or not.

Dataset features and target :



➲ Coding Sections

In this part we walk through the project code, which is divided into the following sections:

  • Section 1 | Data Preprocessing :
    In this section we prepare the dataset before training the model on it,
    with steps such as :

    • Load dataset
    • Check for duplicates and remove them
    • Check for missing data for each column
    • Cleaning data from punctuation and stopwords and then tokenizing it into words (tokens)
    • Convert the text into a matrix of token counts
    • Split the data into training and testing sets

  • Section 2 | Model Creation :
    The dataset is now ready for training, so we create a K-nearest neighbors "KNN" model using scikit-learn and then fit it to the data.

  • Section 3 | Model Evaluation :
    Finally, we evaluate the model by computing its accuracy, classification report and confusion matrix.
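
The three sections above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in email_spam_detection.py: the tiny in-memory dataset, the "text" column name, the "spam"/"ham" label values, the `clean_and_tokenize` helper, the 75/25 split and `n_neighbors=3` are all made up for the example, and scikit-learn's built-in English stopword list stands in for NLTK's so that the sketch runs without downloading any corpus.

```python
import string

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer, ENGLISH_STOP_WORDS
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def clean_and_tokenize(text):
    """Hypothetical helper: drop punctuation, lowercase, split, remove stopwords."""
    no_punct = text.translate(str.maketrans("", "", string.punctuation))
    return [w for w in no_punct.lower().split() if w not in ENGLISH_STOP_WORDS]


# Section 1 | Data preprocessing
# Made-up stand-in for the real dataset; only the "type" column name
# comes from the README, everything else is an assumption.
spam = [
    "Win a free prize now!!!",
    "Cheap loans, apply today",
    "Free money waiting, claim now",
    "You won a lottery prize, claim today",
    "Exclusive offer: free gift card",
    "Claim your free vacation now",
    "Urgent: win cash prize today",
    "Free prize draw, apply now",
]
ham = [
    "Meeting moved to Monday morning",
    "Please review the attached report",
    "Lunch tomorrow with the project team?",
    "Notes from the project meeting",
    "Can you send the quarterly report?",
    "Team schedule for next Monday",
    "Agenda for the review meeting",
    "Report deadline moved to Friday",
]
df = pd.DataFrame({
    "text": spam + ham + [spam[0]],  # last row is a deliberate duplicate
    "type": ["spam"] * 8 + ["ham"] * 8 + ["spam"],
})

df = df.drop_duplicates()   # check for duplicates and remove them
print(df.isnull().sum())    # missing values per column

# Convert the text into a matrix of token counts
vectorizer = CountVectorizer(analyzer=clean_and_tokenize)
X = vectorizer.fit_transform(df["text"])
y = df["type"]

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Section 2 | Model creation
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Section 3 | Model evaluation
pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, pred))
print(classification_report(y_test, pred, zero_division=0))
print(confusion_matrix(y_test, pred, labels=["ham", "spam"]))
```

Passing the cleaning function as the `analyzer` of `CountVectorizer` keeps punctuation removal, stopword filtering and tokenizing in one place, so the same steps are applied when the vectorizer later transforms unseen emails.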

➲ Installation

  1. Clone the repo
    git clone https://github.com/theritik01/Suspicious-Email-Detection.git
  2. Run the code from cmd
    python email_spam_detection.py


➲ Output

Now let's see the project output after running the code :

Dataset head :


Dataset missing data count :


Dataset after cleaning punctuation and tokenizing text :


Classification report, confusion matrix and accuracy :


