
Commit

Update README.md
vanessaaleung authored Nov 12, 2020
1 parent 7c87837 commit 90cc672
Showing 1 changed file with 13 additions and 0 deletions.
13 changes: 13 additions & 0 deletions prediction/spam-detection/README.md
@@ -1,2 +1,15 @@
# Spam Email Detection
_Create models to predict if an email message is spam or not_

## Data Exploration
- Percentage of spam in the data is 13.41%
- Finding 1: Spam messages tend to be longer
- Finding 2: Spam messages tend to contain more digits
- Finding 3: Spam messages tend to contain more non-word characters
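
The three findings above could be checked with a few per-message statistics. The snippet below is a minimal sketch, not the project's actual code: the helper names `describe` and `mean_by_label` and the toy messages are hypothetical, and "non-word character" is assumed to mean anything matched by the regex `\W`.

```python
import re
from collections import defaultdict


def describe(text):
    """Return the three per-message statistics named in the findings."""
    return {
        "length": len(text),                         # characters in the message
        "digits": sum(ch.isdigit() for ch in text),  # count of digit characters
        "non_word": len(re.findall(r"\W", text)),    # assumed definition: regex \W
    }


def mean_by_label(messages, labels):
    """Average each statistic separately for every label value."""
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for text, label in zip(messages, labels):
        counts[label] += 1
        for name, value in describe(text).items():
            totals[label][name] += value
    return {
        label: {name: total / counts[label] for name, total in stats.items()}
        for label, stats in totals.items()
    }


# Hypothetical toy data (1 = spam, 0 = not spam); not taken from the real dataset.
messages = ["Win $100 now!", "See you at lunch", "Call 0800 123 456 to claim!!!", "ok"]
labels = [1, 0, 1, 0]

for label, stats in sorted(mean_by_label(messages, labels).items()):
    print(label, stats)
```

Comparing the per-label averages on the real data is one way to confirm whether spam messages are in fact longer and richer in digits and non-word characters.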

## Modeling
1. CountVectorizer & Multinomial Naive Bayes Classifier
2. TfidfVectorizer & Multinomial Naive Bayes Classifier
3. TfidfVectorizer & Support Vector Classification + Length of Document
4. TfidfVectorizer & Logistic Regression + Length of Document + # of digits per Document
5. CountVectorizer & Logistic Regression + Length of Document + # of digits per Document + # of Non-word Characters
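
As a rough illustration of how the extra features in models 3 to 5 might be combined with the vectorizer output, the sketch below appends three numeric columns to a `CountVectorizer` matrix and fits a `LogisticRegression`, in the spirit of model 5. It assumes scikit-learn, SciPy, and NumPy are installed; the helpers `extra_features` and `build_matrix`, the toy messages, and `max_iter=1000` are illustrative assumptions, not the repository's implementation.

```python
import re

import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression


def extra_features(texts):
    """Length, digit count, and non-word character count for each document."""
    return np.array(
        [
            [len(t), sum(c.isdigit() for c in t), len(re.findall(r"\W", t))]
            for t in texts
        ],
        dtype=float,
    )


def build_matrix(vectorizer, texts, fit=False):
    """Stack token counts and the three extra columns into one sparse matrix."""
    counts = vectorizer.fit_transform(texts) if fit else vectorizer.transform(texts)
    return hstack([counts, csr_matrix(extra_features(texts))]).tocsr()


# Hypothetical toy data (1 = spam, 0 = not spam); not taken from the real dataset.
train_texts = [
    "Win $100 now!",
    "See you at lunch",
    "Call 0800 123 456 to claim!!!",
    "Are we still on for tonight",
]
train_labels = [1, 0, 1, 0]

vectorizer = CountVectorizer()
X_train = build_matrix(vectorizer, train_texts, fit=True)

model = LogisticRegression(max_iter=1000)  # max_iter is an arbitrary choice here
model.fit(X_train, train_labels)

# New documents must go through the same fitted vectorizer and feature helper.
X_new = build_matrix(vectorizer, ["Claim your 500 prize now!!!"])
pred = model.predict(X_new)
print(pred)
```

Models 1 and 2 use no extra columns, so something like `make_pipeline(CountVectorizer(), MultinomialNB())` from scikit-learn would cover them. The manual `hstack` step is only needed once hand-crafted features are added alongside the token counts.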
