This project trains regression models to predict house prices and compares how well they perform. The dataset describes each house with features such as the number of bedrooms, the number of bathrooms, square footage, and the age of the house.
The dataset used for this project is the House Sales in King County, USA dataset, obtained from Kaggle. The dataset contains 21,613 observations with 19 features. We will use the following features for our analysis:
- bedrooms: Number of bedrooms in the house
- bathrooms: Number of bathrooms in the house
- sqft_living: Square footage of the living area
- sqft_lot: Square footage of the lot
- floors: Number of floors in the house
- waterfront: Whether the house has a view to the waterfront or not
- view: An index from 0 to 4 of how good the view of the property was
- condition: Overall condition of the house
- grade: Overall grade given to the housing unit, based on King County grading system
- sqft_above: Square footage of house apart from basement
- sqft_basement: Square footage of the basement
- yr_built: Year the house was built
- yr_renovated: Year the house was renovated (if it was)
- zipcode: Zip code of the house
- lat: Latitude coordinate of the house
- long: Longitude coordinate of the house
- sqft_living15: The average square footage of interior housing living space for the nearest 15 neighbors
- sqft_lot15: The average square footage of the land lots of the nearest 15 neighbors
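As a minimal sketch of how the features listed above might be separated from the prediction target, the snippet below assumes the data is loaded into a pandas DataFrame. The target column name `price`, the helper `split_features_target`, and the file name in the usage comment are assumptions made for illustration, not details stated in this project.

```python
import pandas as pd

# Feature columns as listed in this document.
FEATURES = [
    "bedrooms", "bathrooms", "sqft_living", "sqft_lot", "floors",
    "waterfront", "view", "condition", "grade", "sqft_above",
    "sqft_basement", "yr_built", "yr_renovated", "zipcode",
    "lat", "long", "sqft_living15", "sqft_lot15",
]
TARGET = "price"  # assumption: name of the sale price column


def split_features_target(df: pd.DataFrame):
    """Return (X, y), failing loudly if an expected column is absent."""
    missing = [c for c in FEATURES + [TARGET] if c not in df.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    return df[FEATURES], df[TARGET]


# Usage (the file name is an assumption):
# df = pd.read_csv("kc_house_data.csv")
# X, y = split_features_target(df)
```

Checking for missing columns up front gives a clearer error than a failure later inside model fitting.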
We will compare ordinary multiple linear regression with three regularized variants:
- Multiple Linear Regression
- Ridge Regression
- Lasso Regression
- Elastic Net Regression
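One possible way to set up the four techniques listed above is with scikit-learn, shown here as a sketch rather than the project's actual configuration. The helper name `build_models`, the hyperparameter values (`alpha`, `l1_ratio`, `max_iter`), and the choice to standardize features first are all assumptions.

```python
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_models():
    """Return a dict of unfitted pipelines, one per regression technique.

    Hyperparameter values are placeholders (assumptions), not tuned settings.
    """
    return {
        "linear": make_pipeline(StandardScaler(), LinearRegression()),
        "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
        "lasso": make_pipeline(StandardScaler(), Lasso(alpha=1.0, max_iter=10_000)),
        "elastic_net": make_pipeline(
            StandardScaler(),
            ElasticNet(alpha=1.0, l1_ratio=0.5, max_iter=10_000),
        ),
    }
```

Standardizing is included because the ridge, lasso, and elastic net penalties depend on coefficient size, which in turn depends on feature scale. In practice the penalty strength would likely be chosen by cross-validation rather than fixed.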
We will compare the performance of each model using evaluation metrics such as Mean Squared Error (MSE), where lower is better, and R-squared, where higher is better.
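The comparison described above could be sketched as follows: hold out part of the data, fit each model on the rest, and compute MSE and R-squared on the held-out part. The function name `compare_models`, the 80/20 split, and the synthetic demo data are assumptions for illustration only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def compare_models(models, X, y, test_size=0.2, random_state=0):
    """Fit each model on a train split and score it on a held-out test split."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        pred = model.predict(X_test)
        results[name] = {
            "mse": mean_squared_error(y_test, pred),
            "r2": r2_score(y_test, pred),
        }
    return results


# Demo on synthetic, noiseless linear data (not the housing dataset).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + 3.0
demo_results = compare_models({"linear": LinearRegression()}, X_demo, y_demo)
```

Scoring on held-out data rather than the training data matters here because the regularized models trade some training fit for better generalization.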
By training and evaluating the regression models on the King County housing dataset, we hope to gain insights into which features have the most significant impact on house prices and which regression technique performs the best for this prediction task.
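One rough way to look at feature impact with linear models is to rank features by the absolute size of their fitted coefficients after standardization. The sketch below assumes a scikit-learn pipeline whose final step is a linear model. The helper `rank_features` and the synthetic demo data are hypothetical and do not come from this project.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def rank_features(pipeline, feature_names):
    """Return (name, coefficient) pairs sorted by absolute coefficient, largest first."""
    coefs = pipeline[-1].coef_  # assumes the final step is a fitted linear model
    order = np.argsort(-np.abs(coefs))
    return [(feature_names[i], float(coefs[i])) for i in order]


# Demo on synthetic data: "b" drives the target, "c" is irrelevant.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 3))
y_demo = 0.5 * X_demo[:, 0] + 5.0 * X_demo[:, 1] + rng.normal(scale=0.1, size=500)
demo_pipeline = make_pipeline(StandardScaler(), Lasso(alpha=0.1)).fit(X_demo, y_demo)
demo_ranking = rank_features(demo_pipeline, ["a", "b", "c"])
```

Coefficient size is only comparable across features when the inputs share a scale, and strongly correlated features (for example, different square footage measures) can split or swap weight, so such a ranking is best read as a hint rather than a definitive answer.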