Prepared and presented by: Leah Pope (full-time Data Science student)
Presentation: here
Presentation Video: here
The goal of this project is to answer questions about housing in King County, a county in Washington state. The Stakeholders for my project are Buyers looking for scenic homes in King County. These Buyers want to know:
- where to find scenic homes in King County
- what home prices could look like in the near future
Data Set Used:
- kc_house_data.csv
- Contains 2014-2015 housing sales data for King County, ~20,000 records.
Further analysis of the following areas could yield additional insights.
- King County government officials as Stakeholders: I used the persona of "Scenic Home Buyers" to frame my stakeholder questions. A promising future direction is to frame the questions from the perspective of King County government officials instead, specifically residential property tax assessors and county/city planning officials who want to learn about economic factors and data related to residential property. Here are some example questions I would like to explore:
- Improvement Trends: What percentage of homes are being renovated? What types of homes (large/small/historic/older) are being renovated?
- Tax Assessment Insights: Are homes that are larger than neighboring homes getting a 'tax break' by being compared to smaller properties?
- Home Quality Insights: Are there differences in home quality between zip codes or county subregions?
- Model for each County Subregion: I found a resource that listed 22 subregions for King County, with titles indicating a mix of urban and rural areas. I would like to create a model for each subregion instead of a single model that tries to predict across a wide range of residential areas.
- Additional Scenic Home questions: I would like to explore whether there are differences between scenic homes and neighboring homes, comparing:
- living area and lot size of scenic homes versus their 15 nearest neighboring homes (sqft_living15 and sqft_lot15)
- grade/condition of scenic homes versus other homes in the same zipcode
- Validate Waterfront NaN values: I did a quick search for free APIs that would allow me to check the distance from lat/long coordinates to the nearest coastline. I did find a for-fee API, KB Geo's Distance to Coast Web Service. It might be interesting and valuable to reverse geocode the lat/long of homes with Unknown/NaN waterfront values and see if any of them are actually waterfront properties.
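As a rough illustration of the waterfront check, here is a minimal sketch that does not call any external API. It assumes missing waterfront values are loaded as NaN, that each home record carries `id`, `lat`, `long` and `waterfront` fields, and that a list of shoreline coordinates is available from some source. The `shoreline_points`, the sample homes and the 0.1 km threshold below are all hypothetical placeholders, not real data. Distances use the standard haversine (great-circle) formula.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_shoreline_km(lat, lon, shoreline_points):
    """Distance from one home to the closest shoreline reference point."""
    return min(haversine_km(lat, lon, s_lat, s_lon)
               for s_lat, s_lon in shoreline_points)


def flag_possible_waterfront(homes, shoreline_points, threshold_km=0.1):
    """Return ids of homes whose waterfront value is missing and that lie
    within threshold_km of a shoreline point (threshold is an assumption)."""
    flagged = []
    for home in homes:
        wf = home["waterfront"]
        is_missing = wf is None or (isinstance(wf, float) and math.isnan(wf))
        if not is_missing:
            continue  # only re-check homes with Unknown/NaN waterfront
        dist = nearest_shoreline_km(home["lat"], home["long"], shoreline_points)
        if dist <= threshold_km:
            flagged.append(home["id"])
    return flagged


# Hypothetical placeholder data, for illustration only.
shoreline_points = [(47.6000, -122.3300), (47.6100, -122.2000)]
homes = [
    {"id": 1, "lat": 47.6000, "long": -122.3300, "waterfront": float("nan")},
    {"id": 2, "lat": 47.5000, "long": -121.9000, "waterfront": float("nan")},
    {"id": 3, "lat": 47.6000, "long": -122.3300, "waterfront": 0.0},
]
candidates = flag_possible_waterfront(homes, shoreline_points)
```

Homes returned in `candidates` would only be candidates for manual review or for a lookup against a proper distance-to-coast service. A sparse set of shoreline points will miss properties, so the result is a starting list rather than a validation.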
- Review the non-technical presentation here
- View the non-technical presentation video here
- Read the blog post (TBD) here
- Contact the author Leah Pope
--notebooks
----data_cleaning.ipynb
----general_eda.ipynb
----final_modeling.ipynb
----future_work.ipynb
--data
----kc_house_data.csv
----proc_kc_house_data.csv
----prepped_for_price_prediction.csv