For my final project of the IBM Data Science Professional Certification, I decided to perform segmentation analysis of Trulia homes for sale. The segmentation is based on nearby schools, coffee shops, libraries, and several other venue-related features; we needed to use the Foursquare REST API for this project, so this was my source for venue data.
I semi-invented a scenario where a young couple are moving to Eau Claire this summer and are looking to buy a home. Given some constraints and wishes from the clients, I used machine learning to cluster the houses according to total density of venues and also density of several categories of venue of interest to the "clients." You can read my techniques in EC_segmentation.pdf
Eau_Claire_Project.ipnyb
is the Jupyter Notebook associated with this scraping, and ML segmentation using K-means and OPTICS.EC_segmentation.pdf
is the report on my analysis: describing the business problem, data requirements and methods for obtaining and manipulating this data, methodology of analysis (which ML techniques I used, how, and why), and also some recommendations/conclusions.EC_pres.pdf
is a slide presentation that I imagine giving to the clients once our analysis is complete. There are several PDF "pages" per slide, so be sure to view this in "Presentation" mode in your PDF viewer.Density.html
andVenues.html
are the Folium maps produced as part of the analysis. The housing listings are color-coded based on the ML clustering, and links to the listings are provided in the popups.