Analytics-Edge

Predicting which iPads listed on eBay will be sold. Independent project: Kaggle competition held as part of the MIT course 15.071x, The Analytics Edge.

Datasets: Clean dataset of 1861 listings (training set) and 798 listings (test set) with 10 variables

Features / Variables (79): Feature engineering (deviation price, average price)
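The two engineered features named above can be sketched in base R. This is a minimal illustration, not the project's actual code: it assumes that "average price" means the mean start price within a product line and that "deviation price" means a listing's price minus that group average. The column names `productline` and `startprice` and the toy values are hypothetical.

```r
# Toy listings; column names and values are illustrative assumptions,
# not the project's actual data.
listings <- data.frame(
  productline = c("iPad 2", "iPad 2", "iPad Air", "iPad Air", "iPad Air"),
  startprice  = c(100, 140, 300, 360, 330)
)

# Average price: mean start price within each product line.
listings$avg_price <- ave(listings$startprice, listings$productline, FUN = mean)

# Deviation price: how far a listing sits above or below its group average.
listings$dev_price <- listings$startprice - listings$avg_price

print(listings)
```

A listing priced well above its group average may be less likely to sell, which is why a relative price can carry more signal than the raw price alone.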

Feature Selection (9): Guided by accuracy and AUC values
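One way to compare candidate feature sets by accuracy and AUC is sketched below. It is an assumption-laden illustration rather than the project's procedure: the data are simulated, the predictors `dev_price` and `noise` are hypothetical, and the helpers `auc` and `evaluate` are made up for this example. The AUC is computed with the rank-based (Mann-Whitney) formula in base R to keep the sketch dependency-free, although the project lists ROCR for this purpose.

```r
set.seed(1)
n <- 200

# Simulated listings: dev_price drives the outcome, noise does not.
d <- data.frame(dev_price = rnorm(n), noise = rnorm(n))
d$sold <- rbinom(n, 1, plogis(-1.5 * d$dev_price))

# Hold out the last 60 rows for validation.
train <- d[1:140, ]
valid <- d[141:200, ]

# Rank-based AUC: probability that a random positive outranks a random negative.
auc <- function(score, label) {
  r <- rank(score)
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Fit a logistic regression on the training rows and score the validation rows.
evaluate <- function(formula) {
  fit <- glm(formula, data = train, family = binomial)
  p <- predict(fit, newdata = valid, type = "response")
  c(accuracy = mean((p > 0.5) == valid$sold), auc = auc(p, valid$sold))
}

# One row per candidate feature set.
results <- rbind(
  with_dev_price = evaluate(sold ~ dev_price),
  noise_only     = evaluate(sold ~ noise)
)
print(round(results, 3))
```

Feature sets that raise validation accuracy and AUC are kept; those that do not are dropped. Scoring on held-out rows rather than the training rows guards against picking features that only fit noise.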

Model selection/tuning: text preprocessing and mining, logistic regression, random forest, CART analysis

R libraries: tm, rpart, ggplot2, caret, lattice, e1071, ROCR

Results: Highest sample accuracy 0.835; finished in the top 50% of participants

Summary: Correlation analysis and classifiers such as random forest (RF) could have been used for feature selection. Further feature engineering could have been done and evaluated to improve model accuracy.