Spam Email Classification

SCS 3251 Statistics for Data Science Project

Jupyter Notebooks:

Team members:

Name | GitHub
Arjie Cristobal | https://github.com/quickheaven

Introduction

Spambase Dataset

The "spam" concept is diverse: advertisements for products/web sites, make-money-fast schemes, chain letters, pornography...

Our collection of spam e-mails came from our postmaster and individuals who had filed spam. Our collection of non-spam e-mails came from filed work and personal e-mails, so the word 'george' and the area code '650' are indicators of non-spam. These indicators are useful when constructing a personalized spam filter. To build a general-purpose spam filter, one would either have to blind such non-spam indicators or collect a very wide range of non-spam.
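
As a rough illustration of what "blinding" the non-spam indicators could look like, the sketch below drops the two personalized features before any model sees them. The column names `word_freq_george` and `word_freq_650` follow the Spambase attribute list. The helper `blind_features`, the label name `is_spam`, and the toy values are assumptions made for this example and are not taken from the project notebooks or the real data.

```python
# Minimal sketch: remove personalized non-spam indicators from feature rows.
# BLINDED lists the two Spambase attributes named in the description above.
BLINDED = ("word_freq_george", "word_freq_650")


def blind_features(rows, blinded=BLINDED):
    """Return copies of the rows with the blinded feature columns removed.

    Hypothetical helper for illustration; the input rows are left unchanged.
    """
    blinded = set(blinded)
    return [{k: v for k, v in row.items() if k not in blinded} for row in rows]


# Toy rows with made-up values (NOT real Spambase records).
# "is_spam" is an assumed name for the class label column.
toy_rows = [
    {"word_freq_george": 1.2, "word_freq_650": 0.4, "word_freq_free": 0.0, "is_spam": 0},
    {"word_freq_george": 0.0, "word_freq_650": 0.0, "word_freq_free": 2.1, "is_spam": 1},
]

blinded_rows = blind_features(toy_rows)
print(sorted(blinded_rows[0]))  # ['is_spam', 'word_freq_free']
```

A filter trained on the blinded rows can no longer rely on 'george' or '650', which is the trade-off the description points to: less accuracy for the original mailbox, but fewer cues that only make sense for that one recipient.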

Dataset Source:

Link: Spambase

Presentation