Multiple Linear Regression Modeling for King County Home Prices

Phase 2 Project

Flatiron Online Data Science Bootcamp

Prepared and presented by: Leah Pope (full time Data Science student)

Presentation: here

Presentation Video: here

king_county_map

Introduction

The goal of this project is to answer questions about housing in King County, a county in Washington state. The Stakeholders for my project are Buyers looking for scenic homes in King County. These Buyers want to know:

  • where to find scenic homes in King County
  • what home prices could look like in the near future

Data Description

Data Set Used:

  • kc_house_data.csv
    • Contains 2014-2015 housing sales data for King County, ~20,000 records.

EDA Questions Explored

Question 1: Where are scenic homes located (geographically and by zip code)?

Question 2: What types of scenic homes are in King County?

Question 3: Is there a difference in Price between Regular and Scenic homes?
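
One way to approach Question 3 is to split homes into scenic and regular groups and compare a typical price for each. The sketch below assumes a home counts as scenic when its `waterfront` flag is 1 or its `view` rating is above 0; that definition, the helper names, and the handful of rows are illustrative assumptions, not the project's actual method, data, or results.

```python
from statistics import median

# Illustrative rows only; these are NOT taken from kc_house_data.csv.
homes = [
    {"price": 450_000, "waterfront": 0, "view": 0},
    {"price": 520_000, "waterfront": 0, "view": 0},
    {"price": 610_000, "waterfront": 0, "view": 2},
    {"price": 1_250_000, "waterfront": 1, "view": 4},
    {"price": 480_000, "waterfront": 0, "view": 0},
    {"price": 890_000, "waterfront": 0, "view": 3},
]


def is_scenic(home):
    """Assumed definition: waterfront property or any non-zero view rating."""
    return home["waterfront"] == 1 or home["view"] > 0


def median_price_by_group(rows):
    """Return the median price for the scenic and regular groups."""
    scenic = [r["price"] for r in rows if is_scenic(r)]
    regular = [r["price"] for r in rows if not is_scenic(r)]
    return {"scenic": median(scenic), "regular": median(regular)}


summary = median_price_by_group(homes)
print(summary)
```

The median is used here because a few very expensive homes can pull the mean upward; a formal comparison would also need a statistical test on the real data.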

Modeling

Can we predict price using this dataset?
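
As a rough illustration of what multiple linear regression on price involves, the sketch below fits ordinary least squares with NumPy. The data is synthetic, generated from known coefficients, and the two predictors (stand-ins for `sqft_living` and `grade`) are assumptions chosen for illustration; the features and results of the project's final model live in the notebooks and are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for two predictors; not real King County data.
n = 200
sqft_living = rng.uniform(500, 4000, size=n)
grade = rng.integers(5, 12, size=n).astype(float)

# Design matrix with an intercept column, then synthetic prices
# generated from known coefficients plus Gaussian noise.
X = np.column_stack([np.ones(n), sqft_living, grade])
true_coefs = np.array([50_000.0, 200.0, 30_000.0])  # intercept, sqft, grade
price = X @ true_coefs + rng.normal(0, 20_000, size=n)

# Ordinary least squares: find coefs minimizing ||X @ coefs - price||^2.
coefs, *_ = np.linalg.lstsq(X, price, rcond=None)

# R-squared: share of price variance explained by the fitted model.
predicted = X @ coefs
ss_res = np.sum((price - predicted) ** 2)
ss_tot = np.sum((price - price.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print("coefficients (intercept, sqft_living, grade):", coefs)
print("R-squared:", r_squared)
```

Because the prices here are generated from a linear formula, the fit is close by construction; real sales data would call for a train/test split and checks of the regression assumptions.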

Next Steps/Future Work

Further analysis into the following areas could yield additional insights.

  • King County government officials as Stakeholders: I used the persona of "Scenic Home Buyers" to frame my stakeholder questions. A good idea for future work is to frame stakeholder questions around King County government officials instead. I'm thinking specifically of residential property tax assessors and county/city planning officials who want to learn about economic factors and data related to residential property. Here are some example questions I would like to explore:

    • Improvement Trends: What percentage of homes are being renovated? What types of homes (large/small/historic/older) are being renovated?
    • Tax Assessment Insights: Are homes that are larger than neighboring homes getting a 'tax break' by being compared to smaller properties?
    • Home Quality Insights: Are there differences in home quality between zip codes/county subregions?
  • Model for each County Subregion: I found a resource that listed 22 Subregions for King County, with titles indicating a mix of Urban and Rural areas. I would like to create a model for each Subregion instead of a single model trying to predict across a wide range of residential areas.

  • Additional Scenic Home questions: I would like to explore whether there are differences between scenic homes and neighboring homes.

    • living area and lot size of scenic homes with their 15 nearest neighboring homes (sqft_living15 and sqft_lot15)
    • grade/condition of scenic homes with other homes in same zipcode
  • Validate Waterfront NaN values: I did a quick search for free APIs that would let me check the distance from lat/long coordinates to the nearest coastline. I did not find a free one, but I did find a for-fee API, KB Geo's Distance to Coast Web Service. It might be interesting/valuable to reverse geocode the lat/long of homes with Unknown/NaN waterfront values and see if any of them are actually waterfront properties.
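
For the waterfront validation idea, a distance check can also be sketched without an external API. The example below uses the haversine formula to measure the great-circle distance from a home's lat/long to the nearest of a few shoreline sample points. The shoreline coordinates, the 0.1 km threshold, and the function names are all hypothetical placeholders; a real check would need an actual coastline dataset or a service such as the one mentioned above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical shoreline sample points (lat, lon) for illustration only;
# they are not verified coastline coordinates.
SHORELINE_POINTS = [(47.60, -122.34), (47.68, -122.41), (47.50, -122.40)]


def distance_to_shoreline_km(lat, lon, points=SHORELINE_POINTS):
    """Distance from a home to the nearest shoreline sample point."""
    return min(haversine_km(lat, lon, p_lat, p_lon) for p_lat, p_lon in points)


def likely_waterfront(lat, lon, threshold_km=0.1):
    """Flag a home as a waterfront candidate (threshold is an assumption)."""
    return distance_to_shoreline_km(lat, lon) <= threshold_km


# Example: check one home whose waterfront value is unknown.
print(likely_waterfront(47.60, -122.34))
```

Homes flagged this way would only be candidates for manual review, since distance to a sampled point says nothing about whether the lot actually borders the water.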

For More Information

  • Review the non-technical presentation here
  • View the non-technical presentation video here
  • Read the blog post (TBD) here
  • Contact the author Leah Pope

Repository Structure

--notebooks
----data_cleaning.ipynb
----general_eda.ipynb
----final_modeling.ipynb
----future_work.ipynb
--data
----kc_house_data.csv
----proc_kc_house_data.csv
----prepped_for_price_prediction.csv
