Predicting which iPads listed on eBay will be sold
Independent project - Kaggle competition as part of the MIT course 15.071x The Analytics Edge
Datasets: Clean dataset of 1861 listings (training set) and 798 listings (test set) with 10 variables
Features / Variables (79): Feature engineering (deviation price, average price)
Feature Selection (9): Guided by accuracy and AUC values
Model selection/tuning: text preprocessing and mining, logistic regression, random forest, CART analysis
R libraries: tm, rpart, ggplot2, caret, lattice, e1071, ROCR
Results: Highest sample accuracy 0.835, top 50% of participants
Summary: Correlation analysis and classifiers (RF) could have been used for feature selection. Further feature engineering could have been done and evaluated to improve model accuracy.
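
Illustration: the two engineered price features and the AUC check can be sketched roughly as follows. This is a minimal sketch in Python, not the project's R code. It assumes that "average price" means the mean start price within a product line and that "deviation price" means a listing's start price minus that mean; the summary does not define either term. The field names (productline, startprice, sold), the helper names and the toy listings are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def add_price_features(listings):
    """Return copies of the listings with avg_price and price_dev added.

    Assumption: avg_price is the mean start price of the listing's
    product line, and price_dev is start price minus that mean.
    """
    prices_by_line = defaultdict(list)
    for row in listings:
        prices_by_line[row["productline"]].append(row["startprice"])
    avg_by_line = {line: mean(p) for line, p in prices_by_line.items()}

    enriched = []
    for row in listings:
        avg = avg_by_line[row["productline"]]
        enriched.append(
            {**row, "avg_price": avg, "price_dev": row["startprice"] - avg}
        )
    return enriched


def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Labels are 1 (sold) or 0 (not sold).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    correct = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return correct / (len(pos) * len(neg))


# Made-up toy listings, not taken from the competition data.
toy = [
    {"productline": "iPad 2", "startprice": 100.0, "sold": 1},
    {"productline": "iPad 2", "startprice": 200.0, "sold": 0},
    {"productline": "iPad mini", "startprice": 300.0, "sold": 1},
    {"productline": "iPad mini", "startprice": 500.0, "sold": 0},
]
features = add_price_features(toy)

# Score each listing by how far below its product-line average it is priced,
# then check how well that single feature ranks sold above unsold listings.
scores = [-row["price_dev"] for row in features]
labels = [row["sold"] for row in features]
print(auc(scores, labels))
```

Scoring one candidate feature at a time in this way is one simple way to let AUC guide feature selection; in R, the ROCR package listed above provides the equivalent AUC computation.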