This repository contains a notebook and dataset used to build a pipeline of functions that runs scikit-learn's linear regression model to predict a home's sale price.
The pipeline contains three main functions to quickly iterate on different models.
- transform_features: function for feature engineering
- select_features: function used to select features
- train_and_test: function that trains and tests the model using linear regression and returns the RMSE error metric
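The three functions described above could be chained roughly as follows. This is only a minimal sketch, not the notebook's actual code: the function bodies, the `target` parameter, the simple half-and-half holdout split, and the synthetic columns (`area`, `neighborhood`, `lot_frontage`, `sale_price`) are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error


def transform_features(df):
    # Assumed feature engineering step: drop any column with missing values.
    return df.dropna(axis=1)


def select_features(df):
    # Assumed selection step: keep numeric columns only.
    return df.select_dtypes(include="number")


def train_and_test(df, target):
    # Assumed holdout validation: first half trains, second half tests.
    half = len(df) // 2
    train, test = df.iloc[:half], df.iloc[half:]
    features = df.columns.drop(target)

    model = LinearRegression()
    model.fit(train[features], train[target])
    predictions = model.predict(test[features])

    # RMSE is the square root of the mean squared error.
    return float(np.sqrt(mean_squared_error(test[target], predictions)))


# Synthetic stand-in for the housing data (hypothetical columns).
rng = np.random.default_rng(0)
area = rng.uniform(500, 3000, size=200)
lot_frontage = rng.uniform(20, 100, size=200)
lot_frontage[::10] = np.nan  # inject missing values
data = pd.DataFrame({
    "area": area,
    "neighborhood": ["A", "B"] * 100,
    "lot_frontage": lot_frontage,
    "sale_price": 100 * area + 50_000 + rng.normal(0, 1_000, size=200),
})

transformed = transform_features(data)
selected = select_features(transformed)
rmse = train_and_test(selected, target="sale_price")
print(f"RMSE: {rmse:.2f}")
```

Keeping each stage in its own function means a different model variant can be tried by changing one stage and rerunning the chain, then comparing the returned RMSE values.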
I am going to work with housing data for the city of Ames, Iowa, in the United States, covering 2006 to 2010. The data set contains 2930 observations and 80 explanatory variables (23 nominal, 23 ordinal, 14 discrete, and 20 continuous) involved in assessing home values. For information on why the data was collected, go here. More information about the different columns in the data can be found here.