A Machine Learning Approach to Predicting House Prices Using Advanced Regression Techniques. Case Study: Ames, Iowa
Kaggle hosts a competition whose aim is less to award a monetary prize than to promote machine learning and open up discussion among researchers. In this competition, we attempt to find the best-performing machine learning model for predicting home prices in Ames, Iowa, using various advanced regression techniques.
(The data analyzed here covers the period from 2006 to 2010.)
The data is divided into two groups:
- The training dataset contains 79 explanatory variables that describe (nearly) every aspect of a residential home, plus the ID and sale price variables, for a total of 81 variables.
- The test dataset contains 79 explanatory variables that describe (nearly) every aspect of a residential home, plus the ID variable, for a total of 80 variables.
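The relationship between the two groups can be checked mechanically: the training columns should equal the test columns plus the sale price. The sketch below is a minimal illustration using only the Python standard library. The column names `Id` and `SalePrice`, the helper names, and the tiny in-memory CSV samples are assumptions made for the example, not the actual competition files.

```python
import csv
import io


def read_header(handle):
    """Return the column names from the first row of a CSV file object."""
    return next(csv.reader(handle))


def explanatory_columns(train_cols, test_cols, id_col="Id", target_col="SalePrice"):
    """Check that train = test + target, then return the explanatory columns.

    id_col and target_col are assumed names; adjust them to the real files.
    """
    train_only = set(train_cols) - set(test_cols)
    test_only = set(test_cols) - set(train_cols)
    if train_only != {target_col}:
        raise ValueError(f"unexpected train-only columns: {sorted(train_only)}")
    if test_only:
        raise ValueError(f"unexpected test-only columns: {sorted(test_only)}")
    # Everything except the identifier is an explanatory variable.
    return [c for c in test_cols if c != id_col]


# Toy stand-ins for the training and test files (made-up values).
train_csv = io.StringIO("Id,LotArea,YearBuilt,SalePrice\n1,9000,1999,200000\n")
test_csv = io.StringIO("Id,LotArea,YearBuilt\n2,7500,1975\n")

features = explanatory_columns(read_header(train_csv), read_header(test_csv))
print(len(features), features)
```

On the real files, the same check would be expected to return 79 explanatory columns if the counts above hold.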
The goal is to predict the final sale price of each house, by identifier, in the test dataset. This is not an easy problem: many features can affect the price, and the data itself has several issues, the most important of which are missing values, outliers, and skewed distributions in some features.
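The three data issues named above can each be measured with a simple diagnostic. The following sketch uses only the Python standard library and a made-up list of values. The function names, the use of `None` to mark a missing value, and the 1.5 × IQR outlier rule are illustrative assumptions, not the method used in the competition.

```python
import statistics


def missing_count(values):
    """Count entries that are missing (represented here as None)."""
    return sum(1 for v in values if v is None)


def skewness(values):
    """Population skewness m3 / m2**1.5, ignoring missing entries."""
    xs = [v for v in values if v is not None]
    mean = statistics.fmean(xs)
    m2 = sum((x - mean) ** 2 for x in xs) / len(xs)
    m3 = sum((x - mean) ** 3 for x in xs) / len(xs)
    return m3 / m2 ** 1.5


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR], ignoring missing entries."""
    xs = [v for v in values if v is not None]
    q1, _, q3 = statistics.quantiles(xs, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in xs if x < low or x > high]


# Made-up feature column: one missing entry and one very large value.
column = [100, 110, 120, 130, 140, 150, None, 900]

print("missing:", missing_count(column))
print("skewness:", round(skewness(column), 2))
print("outliers:", iqr_outliers(column))
```

A positive skewness indicates a long right tail, which is the kind of asymmetry a single very large value produces in this toy column.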