Mean Square Terrors: machine learning project for predicting Kaggle housing prices
- Predict housing prices better than any other group
- GOTO 10
Although NYCDSA does not score our results on predictive power, Kaggle uses the following metric:
"Submissions are evaluated on Root-Mean-Squared-Error (RMSE) between the logarithm of the predicted value and the logarithm of the observed sales price. (Taking logs means that errors in predicting expensive houses and cheap houses will affect the result equally.)"
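As a minimal sketch of the metric quoted above, the function below computes the RMSE between the natural log of each predicted price and the natural log of each observed price. The name `rmse_log` and the sample prices are illustrative assumptions, not part of Kaggle's tooling.

```python
import math

def rmse_log(predicted, observed):
    """RMSE between log(predicted) and log(observed); all prices must be positive."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and the same length")
    squared_errors = [
        (math.log(p) - math.log(o)) ** 2
        for p, o in zip(predicted, observed)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical prices: a 10% overestimate is penalized equally
# on a cheap house and on an expensive one.
cheap = rmse_log([110_000], [100_000])
expensive = rmse_log([1_100_000], [1_000_000])
```

Because the error is taken on the log scale, both calls give the same score (the log of the 1.1 price ratio), which is the behavior the quoted parenthetical describes.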
- Do any easily modified features of a house disproportionately increase its sale price?
- Get familiar with what's present in the data
- Evaluate missingness
- Demonstration of EDA skills:
- Numeric methodology.
- Graphic methodology.
- Remove or impute missing data
- Create new desired variables based on what's already present in the data
- Obtain potentially relevant data using our domain knowledge
- Produce merged dataset
- Test several ML methods we'll be exposed to by the end of the program and decide which are viable
- Demonstration of machine learning skills:
- Supervised methodology.
- Unsupervised methodology.
- Weight results from different methods
- Ability to assess model weaknesses and identify improvements.
- Leave the final weekend to make a polished 20-minute presentation and practice delivery
- Communication of motivation: why do we care?
- Research questions of interest: what do you want to find out?
- Answers to research questions: what have you uncovered?
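The "Evaluate missingness" and "Remove or impute missing data" steps in the plan above could be sketched as follows. This is a library-free illustration only: the helper names `missing_fraction` and `impute_median`, the column names, and the sample records are all assumptions. Median imputation is just one option, and the project may prefer dropping rows or another strategy.

```python
from statistics import median

def missing_fraction(rows, columns):
    """Fraction of rows in which each column is missing (None)."""
    n = len(rows)
    return {c: sum(r.get(c) is None for r in rows) / n for c in columns}

def impute_median(rows, column):
    """Return new rows with missing values in `column` replaced by the observed median."""
    observed = [r[column] for r in rows if r.get(column) is not None]
    fill = median(observed)
    return [
        {**r, column: fill if r.get(column) is None else r[column]}
        for r in rows
    ]

# Hypothetical records; names and values are made up for illustration.
houses = [
    {"LotFrontage": 60.0, "GarageArea": 400.0},
    {"LotFrontage": None, "GarageArea": 500.0},
    {"LotFrontage": 80.0, "GarageArea": None},
    {"LotFrontage": 70.0, "GarageArea": 450.0},
]
report = missing_fraction(houses, ["LotFrontage", "GarageArea"])
filled = impute_median(houses, "LotFrontage")
```

Returning new rows rather than modifying `houses` in place keeps the raw data available for comparing imputation choices later.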