The problem to resolve of this repository is about opening a new Pop Up Store for a furniture Retailer in the center of Madrid. We will get this insight through the analysis of empty apartments for renting and selling in Madrid.
If you want to see the presentation click on the below picture.
- Chamberi, Tetuan and Chamartín are adjacent districts. Together, they have the 18.03% of flats on sale and 26,6 % of renting flats. A location between the three districts will be suitable.
- The neighborhoods with the highest number of apartments for sale and renting are Center (Sol) and Salamancas. The price per meter is expensive in both locations, the hypothesis is that these areas have not a regular turnover. This hypothesis needs to be validated with other kind of data.
- Carabanchel and Puente de Vallecas are two areas with potential but the average squared meters and the number of rooms are smaller than the same values in the above districts.
- Select one of the top real estate web sites in Madrid: URL: www.fotocasa.es
- Design and preparation of the projects.
- Web Scraping all rent and sale apartments in Madrid center.
- Clean the data.
- Import the data to MySQL Workbench.
- Analyze the data.
- The insights
- I have used Miro to create a visualization of the data I need to get from website. Here the Miro link: https://miro.com/app/board/o9J_lT6lKyU=/
- I have used Trello as a productive tool to register the requirements needed to the project. https://trello.com/b/jU4yORwC
After the analysis www.fotocasa.es I have seen that the website is API based.
- Check if the request to the site web is allowed (get 200).
- Find the right headers of the site.
- Create a function to iterate through the different pages and recollect all the data in two json files. One for selling apartments and another for renting apartments. I've got around 18000 rows and 58 columns of selling apartments and 2300 rows and 8 columns of renting apartments.
- Get rid of the duplicated rows. I have discovered that half of the flats for selling where duplicated. So, I finalized with 9000 rows of selling flats.
- The json files were dictionaries oriented and I have normalized all the dictionaries to extract all the values related to the Price, Squared meters, Rooms, Location, terrace, parking, elevator, description of the announce and the links of the flat pictures.
- I have dropped all the unnecessary columns.
- I have searched the outliers through boxplot, IQR and Z-Score.
- I have removed the extreme values.
- I have created a new column with the price per squared meter.
- Same process for the Rent apartments.
- Export the files to CSV.
- Create a new schema. "Madrid"
- Import table sale
- Import table rent.
- I have prepared some queries in MySQL Workbench to be used in the analysis of the data.
- I've analyzed the places were the business competition has some locations in the center of Madrid.
- I have imported to Jupyter notebook the tables and I've filtered and group them base on Qty of apartments, neighborhood, price, surface, rooms and percentage of apartments per districts.
Based in the data and the analysis of the real state current situation. An adjacent location between Chamberi, Tetuan and Chamartin will be the most suitable location to open a PopUpStore for the furniture retailer.