A US-based housing company named Surprise Housing has decided to enter the Australian market. The company uses data analytics to purchase houses at a price below their actual value and sell them at a higher price. To this end, the company has collected a data set from house sales in Australia. The data is provided in the CSV file below.
The company is looking at prospective properties to buy to enter the market. You are required to build a regression model using regularisation in order to predict the actual value of the prospective properties and decide whether to invest in them or not.
You are required to model the price of houses with the available independent variables. This model will then be used by the management to understand how exactly the prices vary with the variables. They can accordingly manipulate the strategy of the firm and concentrate on areas that will yield high returns. Further, the model will be a good way for management to understand the pricing dynamics of a new market.
- Predicting a higher sale price for a house would not attract customers, which would lead to a loss for the company
- Predicting a lower sale price for a house would reduce the company's profit margin
- Which variables are significant in predicting the sale price of a house?
- How well do those variables describe the sale price of a house?
- What is the optimal value of lambda for ridge and lasso regression?
- Steps for creating a regularized regression model:
- Data Visualization
- Perform EDA to understand various variables.
- Check the correlation between the variables.
- Data Preparation
- Create dummy variables for all the categorical features.
- Divide the data into train and test sets.
- Perform scaling.
- Divide the data into dependent and independent variables.
- Data Modelling & Evaluation
- Create Linear Regression model using RFE
- Create L1 and L2 Regularized Models using the output of the RFE
- Check the various assumptions.
- Check the Adjusted R-Square for both train & Test data.
- Report the final model.
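
The steps above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the project's actual notebook: the data values, the choice of three features for RFE, and the `adjusted_r2` helper are assumptions made for the example; only the column names are borrowed from the data set.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the housing data (hypothetical values).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "GrLivArea": rng.uniform(400, 4000, n),
    "LotArea": rng.uniform(1000, 60000, n),
    "GarageArea": rng.uniform(0, 1200, n),
    "Neighborhood": rng.choice(["NoRidge", "NAmes", "Timber"], n),
})
df["SalePrice"] = (
    100 * df["GrLivArea"]
    + 50 * df["GarageArea"]
    + 40000 * (df["Neighborhood"] == "NoRidge")
    + rng.normal(0, 10000, n)
)

# Data preparation: dummy variables, train/test split, scaling.
df = pd.get_dummies(df, columns=["Neighborhood"], drop_first=True, dtype=float)
train, test = train_test_split(df, train_size=0.7, random_state=42)
train, test = train.copy(), test.copy()

num_cols = ["GrLivArea", "LotArea", "GarageArea"]
scaler = StandardScaler()
train[num_cols] = scaler.fit_transform(train[num_cols])  # fit on train only
test[num_cols] = scaler.transform(test[num_cols])        # reuse train statistics

# Split into independent (X) and dependent (y) variables.
X_train, y_train = train.drop(columns="SalePrice"), train["SalePrice"]
X_test, y_test = test.drop(columns="SalePrice"), test["SalePrice"]

# Modelling: recursive feature elimination around a linear regression.
rfe = RFE(LinearRegression(), n_features_to_select=3).fit(X_train, y_train)
selected = list(X_train.columns[rfe.support_])
model = LinearRegression().fit(X_train[selected], y_train)


def adjusted_r2(r2, n_samples, n_features):
    """Adjusted R-squared: penalises R-squared for the number of predictors."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


adj_train = adjusted_r2(model.score(X_train[selected], y_train),
                        len(X_train), len(selected))
adj_test = adjusted_r2(model.score(X_test[selected], y_test),
                       len(X_test), len(selected))
print(selected, round(adj_train, 3), round(adj_test, 3))
```

The features kept by RFE would then be passed to the Ridge and Lasso models, and the adjusted R-squared compared between train and test data to check for overfitting.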
- LotFrontage - As LotFrontage increases from [20.7 to 196.2], the SalePrice increases
- LotArea - As LotArea increases from [1086 to 65483], the SalePrice increases
- MasVnrArea - As MasVnrArea increases from [0 to 1280], the SalePrice increases
- BsmtFinSF1 - As BsmtFinSF1 increases from [0 to 2257], the SalePrice increases
- BsmtFinSF2 - As BsmtFinSF2 increases, there is not much effect on SalePrice
- BsmtUnfSF - As BsmtUnfSF increases from [1168 to 2336], the SalePrice increases
- TotalBsmtSF - As TotalBsmtSF increases from [0 to 3666], the SalePrice increases
- 1stFlrSF - As 1stFlrSF increases from [329.64 to 3384], the SalePrice increases
- 2ndFlrSF - As 2ndFlrSF increases from [206.5 to 1858], the SalePrice increases
- GrLivArea - As GrLivArea increases from [328 to 4480], the SalePrice increases
- GarageYrBlt - As GarageYrBlt increases, there is not much effect on SalePrice
- GarageArea - As GarageArea increases from [0 to 1276], the SalePrice increases
- WoodDeckSF - As WoodDeckSF increases from [0 to 685], the SalePrice increases
- OpenPorchSF - As OpenPorchSF increases, there is not much effect on SalePrice
- EnclosedPorch - As EnclosedPorch increases, there is not much effect on SalePrice
- AgeBuilt - As AgeBuilt increases from [0 to 81], the SalePrice decreases, and from [81 to 136] the SalePrice slightly increases
- AgeRemod - As AgeRemod increases from [0 to 60], the SalePrice decreases
- MSSubClass - 20, 50, 75, 120 have higher SalePrice than other MSSubClass
- MSZoning - RL and FV have higher SalePrice than other MSZoning
- LotShape - IR2 and IR1 have slightly higher average SalePrice than other LotShape
- LandContour - HLS has slightly higher average SalePrice than other LandContour
- LotConfig - CulDSac has slightly higher average SalePrice than other LotConfig
- Neighborhood - NoRidge, NridgHt, Timber and StoneBr have higher SalePrice than other Neighborhood
- Condition1 - PosN and RRNn have slightly higher SalePrice than other Condition1
- BldgType - 1Fam and TwnhsE have slightly higher SalePrice than other BldgType
- HouseStyle - 2Story, 1Story and 2.5Fin have higher SalePrice than other HouseStyle
- OverallQual - As the OverallQual increases, the SalePrice increases steeply
- OverallCond - As the OverallCond increases, the SalePrice increases
- RoofStyle - No effect of RoofStyle on the SalePrice
- Exterior1st - VinylSd, CemntBd and Stone have higher SalePrice than other Exterior1st
- Exterior2nd - VinylSd, CemntBd and ImStucc have higher SalePrice than other Exterior2nd
- MasVnrType - Stone and SBrkr have higher average SalePrice than other MasVnrType
- ExterQual - Ex has significantly higher SalePrice than other ExterQual
- ExterCond - No effect of ExterCond on the SalePrice
- Foundation - PConc has higher average SalePrice than other Foundation
- BsmtQual - Ex and Gd have significantly higher SalePrice than other BsmtQual
- BsmtCond - TA and Gd have higher SalePrice than other BsmtCond
- BsmtExposure - Gd has higher SalePrice than other BsmtExposure
- BsmtFinType1 - GLQ has higher SalePrice than other BsmtFinType1
- BsmtFinType2 - GLQ has higher SalePrice than other BsmtFinType2
- HeatingQC - As HeatingQC becomes poor the SalePrice decreases
- BsmtFullBath - No effect of BsmtFullBath on the SalePrice
- FullBath - 2 and 3 have significantly higher SalePrice than other FullBath
- HalfBath - 1 has slightly higher average SalePrice than other HalfBath
- BedroomAbvGr - 0 and 4 have slightly higher average SalePrice than other BedroomAbvGr
- KitchenQual - Ex has significantly higher SalePrice than other KitchenQual
- TotRmsAbvGrd - As the TotRmsAbvGrd increases, the SalePrice increases
- Fireplaces - As the Fireplaces increases, the SalePrice increases
- FireplaceQu - Ex has significantly higher SalePrice than other FireplaceQu
- GarageType - Attchd and BuiltIn have higher SalePrice than other GarageType
- GarageFinish - Fin has higher SalePrice than other GarageFinish
- GarageCars - As the GarageCars increases, the SalePrice increases, except for 4
- GarageQual - Gd and Ex have higher SalePrice than other GarageQual
- Fence - No effect of Fence on the SalePrice
- MoSold - No effect of MoSold on the SalePrice
- YrSold - No effect of YrSold on the SalePrice
- SaleType - New, CWD and Con have higher SalePrice than other SaleType
- SaleCondition - Partial has significantly higher SalePrice than other SaleCondition
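
Observations like the ones above are usually read off plots, but they can also be checked numerically. The sketch below is one possible way to do that with pandas; the tiny data frame holds hypothetical values and only reuses the column names from the data set.

```python
import pandas as pd

# Hypothetical mini data set; only the column names come from the project.
df = pd.DataFrame({
    "Neighborhood": ["NoRidge", "NoRidge", "NAmes", "NAmes", "Timber", "Timber"],
    "GrLivArea": [2600, 2400, 1100, 1300, 1900, 2100],
    "SalePrice": [330000, 310000, 140000, 150000, 240000, 250000],
})

# Categorical feature: compare the median SalePrice of each category.
median_by_group = (
    df.groupby("Neighborhood")["SalePrice"].median().sort_values(ascending=False)
)

# Numerical feature: Pearson correlation with SalePrice.
corr = df["GrLivArea"].corr(df["SalePrice"])

print(median_by_group)
print(round(corr, 3))
```

A category whose median sits well above the others, or a numerical feature with a correlation close to 1 or -1, is a candidate for a strong predictor in the model.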
- 'OverallQual_10'
- 'OverallQual_9'
- 'Neighborhood_NoRidge'
- 'FullBath_3'
- 'TotRmsAbvGrd_11'
- 'Fireplaces_3'
- OverallQual_10 = 0.833928
- OverallQual_9 = 0.830237
- Neighborhood_NoRidge = 0.592742
- FullBath_3 = 0.530678
- TotRmsAbvGrd_11 = 0.505044
- Fireplaces_3 = 0.442772
- Ridge : 5.0
- Lasso : 0.001
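
One common way to arrive at such values is a cross-validated grid search over candidate penalties. The sketch below assumes that approach; the source does not state how the values were found. It uses synthetic data from `make_regression`, and the candidate grid and scoring metric are illustrative choices. Note that scikit-learn calls the regularisation strength `alpha` rather than lambda.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic regression problem standing in for the prepared housing data.
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)
y = (y - y.mean()) / y.std()  # standardise the target

# Candidate penalties (an illustrative grid, not the project's).
alphas = [0.0001, 0.001, 0.01, 0.1, 1.0, 5.0, 10.0, 50.0, 100.0]

best_alpha = {}
for name, estimator in [("Ridge", Ridge()), ("Lasso", Lasso(max_iter=10000))]:
    search = GridSearchCV(
        estimator,
        param_grid={"alpha": alphas},
        scoring="neg_mean_absolute_error",
        cv=5,
    )
    search.fit(X, y)
    best_alpha[name] = search.best_params_["alpha"]

print(best_alpha)
```

The best value depends on the data and on how the features and target are scaled, so the numbers printed here will not match the ones reported above.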
The Lasso regularized model is chosen as the final model to predict the SalePrice for the following reasons:
- It eliminates more features, which makes the model simpler, more robust and more generalizable
- Its performance is similar to that of Ridge regularization
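
The feature-elimination point can be illustrated by counting non-zero coefficients: the L1 penalty can shrink coefficients to exactly zero, while the L2 penalty only makes them small. The sketch below uses synthetic data, and the `alpha` values are picked for this toy problem rather than taken from the project.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.preprocessing import StandardScaler

# 30 features, of which only 5 actually drive the target.
X, y = make_regression(n_samples=200, n_features=30, n_informative=5,
                       noise=5.0, random_state=1)
X = StandardScaler().fit_transform(X)
y = (y - y.mean()) / y.std()

# Illustrative penalties for this synthetic problem (assumed values).
ridge = Ridge(alpha=5.0).fit(X, y)
lasso = Lasso(alpha=0.05).fit(X, y)

n_ridge = int(np.sum(ridge.coef_ != 0))
n_lasso = int(np.sum(lasso.coef_ != 0))

print("Non-zero coefficients - Ridge:", n_ridge, "Lasso:", n_lasso)
print("R-squared - Ridge:", round(ridge.score(X, y), 3),
      "Lasso:", round(lasso.score(X, y), 3))
```

When the two scores are close, the model with fewer non-zero coefficients is easier to interpret and maintain, which is the trade-off described above.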
- pandas - 1.3.4
- numpy - 1.20.3
- matplotlib - 3.4.3
- seaborn - 0.11.2
- plotly - 5.8.0
- sklearn - 1.1.2
- statsmodels - 0.13.2
- This project was a group case study for an online advanced course.
- https://www.geeksforgeeks.org/
- https://seaborn.pydata.org/
- https://plotly.com/
- https://pandas.pydata.org/
- https://learn.upgrad.com/
Created by [@darshil2848] - feel free to contact me!