Analysis of a Spam Dataset using statistical modelling. Model bagging and boosting are applied in R. We demonstrate the use of decision trees, random forests, and gradient boosting.

Robby955/spam

Spam Dataset

This dataset consists of 4601 email observations, each labelled as spam (1) or not spam (0). There are 57 predictors. Most are relative frequencies of commonly occurring words and symbols in the email; the remaining three summarise runs of consecutive capital letters.

We use gradient boosting and model blending in R to improve classification accuracy. We also fit decision trees and demonstrate how R can plot them.
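As a rough illustration of the decision-tree step, here is a minimal sketch using the `rpart` package that ships with R. It is not the repository's own code: the data frame `emails` is a small synthetic stand-in for the spam data, and the column names `freq_free` and `freq_dollar`, the sample size, and the 75/25 split are all assumptions made for the example.

```r
# Sketch only: synthetic stand-in data, NOT the real spam dataset.
library(rpart)

set.seed(1)
n <- 400
emails <- data.frame(
  freq_free   = runif(n, 0, 5),  # hypothetical: % of words equal to "free"
  freq_dollar = runif(n, 0, 2)   # hypothetical: % of characters equal to "$"
)

# Simulate a 0/1 label that is spam more often when either frequency is high.
p_spam <- plogis(-3 + 1.2 * emails$freq_free + 2 * emails$freq_dollar)
emails$spam <- factor(rbinom(n, 1, p_spam), levels = c(0, 1))

# Hold out a quarter of the rows for testing (assumed split).
test_idx <- sample(n, n / 4)
train <- emails[-test_idx, ]
test  <- emails[test_idx, ]

# Fit a classification tree on the training rows.
tree_fit <- rpart(spam ~ ., data = train, method = "class")

# Draw the tree with base graphics; skip if rpart kept only the root node.
if (nrow(tree_fit$frame) > 1) {
  plot(tree_fit, margin = 0.1)
  text(tree_fit, use.n = TRUE, cex = 0.8)
}

# Predict classes on the held-out rows and compute accuracy.
pred <- predict(tree_fit, newdata = test, type = "class")
accuracy <- mean(pred == test$spam)
print(accuracy)
```

With the real data, `emails` would be replaced by the 4601-row data frame with its 57 predictors, and the same `spam ~ .` formula would apply as long as the label column is a factor.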

What is this

This dataset is discussed in "The Elements of Statistical Learning", second edition. The data is also available at ftp.ics.uci.edu.
