This work is the capstone project for the HarvardX Professional Certificate in Data Science. The goal is to classify headlines as click-bait or not click-bait. Three approaches were tried:
- classification based on linguistic features, like length or the presence of exclamation marks
- logistic regression with count-based string vectorization
- logistic regression with tf-idf vectorization
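
To make the three input representations concrete, here is a minimal sketch in plain Python. It is not the project's code: the toy headlines, the function names, the choice of linguistic features, and the smoothed idf formula are all assumptions made for illustration. It only builds the feature vectors; fitting a logistic regression on them is left out.

```python
import math
from collections import Counter

# Hypothetical toy headlines, made up for illustration only.
headlines = [
    "You won't believe what happened next!",
    "Parliament passes annual budget bill",
    "10 tricks you need to know!",
]


def tokenize(text):
    # Lowercase, split on whitespace, strip punctuation at word edges.
    tokens = (word.strip(".,!?\"'") for word in text.lower().split())
    return [t for t in tokens if t]


def linguistic_features(text):
    # Surface features of the raw headline; this feature set is an assumption.
    return {
        "n_chars": len(text),
        "n_words": len(text.split()),
        "has_exclamation": "!" in text,
        "starts_with_digit": text[:1].isdigit(),
    }


def count_vector(tokens, vocabulary):
    # Count-based vectorization: one raw term count per vocabulary entry.
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def tfidf_vector(tokens, vocabulary, doc_freq, n_docs):
    # tf-idf: term count scaled by a smoothed inverse document frequency,
    # idf = ln((1 + n_docs) / (1 + df)) + 1 (one common variant, assumed here).
    counts = Counter(tokens)
    return [
        counts[term] * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
        for term in vocabulary
    ]


tokenized = [tokenize(h) for h in headlines]
vocabulary = sorted({t for doc in tokenized for t in doc})
# Number of headlines in which each term appears at least once.
doc_freq = Counter(t for doc in tokenized for t in set(doc))

features = [linguistic_features(h) for h in headlines]
count_matrix = [count_vector(doc, vocabulary) for doc in tokenized]
tfidf_matrix = [
    tfidf_vector(doc, vocabulary, doc_freq, len(headlines)) for doc in tokenized
]
```

In the count matrix every occurrence of a word weighs the same, while the tf-idf matrix down-weights words that appear in many headlines, so a term like "you" (present in two of the toy headlines) gets a smaller weight than "believe" (present in one).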