MST-housing-price-prediction

Mean Square Terrors: a machine learning project for predicting Kaggle housing prices

Goals

  1. Predict housing prices better than any other group
  2. GOTO 10

Outline

Although NYCDSA does not score our results on predictive power, Kaggle uses the following metric:

"Submissions are evaluated on Root-Mean-Squared-Error (RMSE) between the logarithm of the predicted value and the logarithm of the observed sales price. (Taking logs means that errors in predicting expensive houses and cheap houses will affect the result equally.)"
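
To make the metric concrete, here is a minimal sketch of it using only the Python standard library. The function name `rmsle` and the example prices are our own illustrations, not part of Kaggle's tooling. It also shows the point made in the quote: a 10% over-prediction is penalized the same for a cheap house as for an expensive one.

```python
import math


def rmsle(predicted, observed):
    """Root-mean-squared error between log(predicted) and log(observed)."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [
        (math.log(p) - math.log(o)) ** 2 for p, o in zip(predicted, observed)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up prices: each prediction is 10% above the observed sale price.
cheap = rmsle([110_000], [100_000])
expensive = rmsle([1_100_000], [1_000_000])
print(cheap, expensive)  # both are approximately log(1.1)
```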

Questions worth answering

  1. Do any easily modified features of a house disproportionately increase its sale price?

Explore Data

  • Get familiar with what's present in the data
  • Evaluate missingness
  • Demonstration of EDA skills:
    • Numeric methodology.
    • Graphic methodology.
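
As one possible starting point for evaluating missingness, the sketch below computes the fraction of missing values per column. The column names, the toy rows, and the set of markers treated as missing are all assumptions for illustration. With pandas, `DataFrame.isna().mean()` gives a comparable summary.

```python
def missing_fraction(rows, missing_markers=(None, "", "NA")):
    """Return {column: fraction of rows where the value counts as missing}."""
    columns = rows[0].keys()
    return {
        col: sum(row.get(col) in missing_markers for row in rows) / len(rows)
        for col in columns
    }


# Made-up rows with hypothetical column names.
rows = [
    {"lot_area": "8450", "garage_size": "2", "price": "200000"},
    {"lot_area": "", "garage_size": "NA", "price": "150000"},
    {"lot_area": "9600", "garage_size": None, "price": "180000"},
    {"lot_area": "11250", "garage_size": "2", "price": "220000"},
]
fractions = missing_fraction(rows)
print(fractions)
```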

Clean Data

  • Remove or impute missing data
  • Engineer new variables from those already present in the data
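
The two cleaning steps above can be sketched as follows. Median imputation is only one option we might choose, and the variable names and values are hypothetical.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


# Made-up lot areas with one missing entry.
lot_area = [8000.0, None, 10000.0, 12000.0]
imputed = impute_median(lot_area)

# A derived variable: total floor area from two hypothetical per-floor columns.
first_floor = [1000.0, 900.0]
second_floor = [500.0, 0.0]
total_area = [a + b for a, b in zip(first_floor, second_floor)]
```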

Supplement Data

  • Obtain potentially relevant data using our domain knowledge
  • Produce merged dataset
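
A merged dataset could be produced with a left join that keeps every house and attaches supplementary columns where a match exists. The sketch below assumes a shared `neighborhood` key and a hypothetical `school_rating` column. With pandas, `DataFrame.merge(..., how="left")` does the same job.

```python
def left_join(houses, supplement, key):
    """Attach supplement columns to each house; unmatched houses get None."""
    lookup = {row[key]: row for row in supplement}
    extra_cols = [c for c in supplement[0] if c != key]
    merged = []
    for house in houses:
        match = lookup.get(house[key])
        extras = {c: (match[c] if match else None) for c in extra_cols}
        merged.append({**house, **extras})
    return merged


# Made-up records for illustration.
houses = [
    {"id": 1, "neighborhood": "A"},
    {"id": 2, "neighborhood": "B"},
]
supplement = [{"neighborhood": "A", "school_rating": 7}]
merged = left_join(houses, supplement, key="neighborhood")
```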

Test Machine Learning Methods

  • Test several ML methods we'll be exposed to by the end of the program and decide which are viable
  • Demonstration of machine learning skills:
    • Supervised methodology.
    • Unsupervised methodology.
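
One way to decide which methods are viable is to compare them with k-fold cross-validation. The sketch below is a standard-library illustration with made-up data. It compares a mean baseline against a one-variable least-squares line; the two models stand in for whatever methods we end up testing.

```python
import math
from statistics import mean


def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def fit_mean(xs, ys):
    """Baseline: always predict the training mean."""
    avg = mean(ys)
    return lambda x: avg


def fit_line(xs, ys):
    """Ordinary least squares with a single predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return lambda x: y_bar + slope * (x - x_bar)


def cv_rmse(fit, xs, ys, k=3):
    """Pooled out-of-fold RMSE for a fitting function."""
    squared = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        squared += [(model(xs[i]) - ys[i]) ** 2 for i in test_idx]
    return math.sqrt(mean(squared))


# Made-up data: log-price rising roughly linearly with living area.
area = [1.0, 1.2, 1.5, 1.8, 2.0, 2.4]
log_price = [11.5, 11.6, 11.8, 11.9, 12.1, 12.3]
scores = {
    "mean baseline": cv_rmse(fit_mean, area, log_price),
    "linear fit": cv_rmse(fit_line, area, log_price),
}
print(scores)
```

Because the targets are already log-prices, the RMSE here is on the same scale as the Kaggle metric.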

Combine Models

  • Weight the predictions from different methods
  • Ability to assess model weaknesses and identify improvements.
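
Combining models by weighting can be as simple as a weighted average of each model's predictions. The sketch below assumes the weights are chosen by hand; the model names, predictions, and weights are made up.

```python
def blend(predictions, weights):
    """Weighted average of several models' prediction lists."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    n = len(predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions)) / total
        for i in range(n)
    ]


# Two hypothetical models; model_a is trusted three times as much as model_b.
model_a = [100.0, 200.0]
model_b = [120.0, 180.0]
blended = blend([model_a, model_b], weights=[3, 1])
print(blended)  # [105.0, 195.0]
```

In practice the weights would be tuned on held-out data rather than fixed in advance.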

Create Presentation

  • Leave final weekend to make a polished 20-minute presentation and practice delivery
  • Communication of motivation: why do we care?
  • Research questions of interest: what do you want to find out?
  • Answers to research questions: what have you uncovered?