hsannav/project_rents

Rents predictor

This is a data analytics and machine learning dashboard built with Streamlit. It provides an interactive visualization of the rental housing market in Valencia, Spain, and offers a predictive model to estimate rental prices based on specific property characteristics.

Project Overview

This tool assists users in understanding rental trends across different neighborhoods in Valencia. It processes data collected from real estate listings to generate spatial visualizations and trains an XGBoost regression model to predict prices.

Key Features

  • Interactive Choropleth Map: Visualizes average rental prices per neighborhood using GeoJSON data.
  • Price Prediction Engine: Uses a machine learning model (XGBoost) to predict monthly rental costs based on inputs such as number of rooms, floor level, surface area, and presence of an elevator or exterior view.
  • Partial Dependence Plots (PDP): Visualizes in real time how specific features (e.g., surface area, floor level) influence the predicted price, both globally and per neighborhood.
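
To make the Partial Dependence Plot idea concrete, here is a minimal sketch of how such a curve can be computed by hand. It is not the code in visualization.py: the function partial_dependence, the toy_predict stand-in for the trained model, and the feature names surface_m2 and rooms are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def partial_dependence(
    predict: Callable[[Row], float],
    rows: Sequence[Row],
    feature: str,
    grid: Sequence[float],
) -> List[float]:
    """Average prediction over all rows with `feature` forced to each grid value."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the original listing is untouched
            modified[feature] = value  # override only the feature under study
            total += predict(modified)
        curve.append(total / len(rows))
    return curve


def toy_predict(row: Row) -> float:
    # Hypothetical linear stand-in for the real XGBoost model.
    return 300.0 + 10.0 * row["surface_m2"] + 50.0 * row["rooms"]


# Hypothetical listings; the real dataset lives in pisos.xlsx.
listings = [
    {"surface_m2": 60.0, "rooms": 2.0},
    {"surface_m2": 90.0, "rooms": 3.0},
]

curve = partial_dependence(toy_predict, listings, "surface_m2", [50.0, 100.0])
print(curve)  # [925.0, 1425.0]
```

A per-neighborhood curve follows the same recipe, with rows restricted to the listings of that neighborhood before calling the function.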

Repository Structure

The codebase has been refactored into a modular architecture to ensure maintainability and separation of concerns.

  • app.py: The entry point of the application. It orchestrates the Streamlit interface and user interactions.
  • config.py: Contains configuration constants, mapping dictionaries for neighborhood normalization, and feature definitions.
  • data_manager.py: Handles data loading, cleaning, and preparation. It manages interactions with the GeoJSON files and the pickled machine learning models.
  • visualization.py: Contains the logic for generating Plotly charts, including the map and Partial Dependence Plots.
  • parser.py: A standalone script used to parse raw HTML files from the scraping phase and convert them into a structured Excel dataset.

Data Pipeline

  1. Data Collection: Raw HTML pages of real estate listings, saved during the scraping phase, are placed in the webs/ directory.
  2. Parsing: The parser.py script extracts relevant features (price, location, amenities) and handles data cleaning (removing nulls, normalizing neighborhood names).
  3. Storage: Processed data is stored in pisos.xlsx.
  4. Modeling: XGBoost regressors are trained on this data and saved as a dictionary of models in modelos.pkl, which the dashboard uses for inference.
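
The cleaning described in step 2 can be sketched as follows. This is an illustration under stated assumptions, not the contents of parser.py: the field names, the NEIGHBORHOOD_ALIASES mapping (the real mapping dictionaries live in config.py), and both helper functions are hypothetical.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[object]]

# Hypothetical alias table; the project's actual mappings are in config.py.
NEIGHBORHOOD_ALIASES = {
    "ruzafa": "Russafa",
    "russafa": "Russafa",
}

# Hypothetical set of fields that must be present for a listing to be kept.
REQUIRED_FIELDS = ("price", "neighborhood", "surface_m2")


def normalize_neighborhood(raw: str, aliases: Dict[str, str]) -> str:
    """Collapse whitespace and case, then map known spellings to one name."""
    key = " ".join(raw.strip().lower().split())
    return aliases.get(key, raw.strip())


def clean_listings(
    records: List[Record],
    aliases: Dict[str, str] = NEIGHBORHOOD_ALIASES,
) -> List[Record]:
    """Drop records with missing required fields and normalize neighborhoods."""
    cleaned = []
    for record in records:
        if any(record.get(field) is None for field in REQUIRED_FIELDS):
            continue  # skip listings with nulls in required fields
        fixed = dict(record)
        fixed["neighborhood"] = normalize_neighborhood(
            str(record["neighborhood"]), aliases
        )
        cleaned.append(fixed)
    return cleaned


raw_listings = [
    {"price": 900, "neighborhood": " RUZAFA ", "surface_m2": 70},
    {"price": None, "neighborhood": "Benimaclet", "surface_m2": 55},
]
cleaned_listings = clean_listings(raw_listings)
print(cleaned_listings)
```

In this example the second listing is discarded because its price is missing, and the first one has its neighborhood rewritten to the canonical spelling.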

Installation and Usage

Prerequisites

  • Python 3.8+
  • pip

Dependencies

Install the required packages:

pip install streamlit numpy pandas geopandas matplotlib xgboost scikit-learn plotly openpyxl beautifulsoup4

Running the Application

To launch the dashboard locally:

streamlit run app.py

The application will open in your default web browser at http://localhost:8501.

Running the Parser

If you have raw HTML files in a webs/ directory and wish to regenerate the dataset:

python parser.py

Files Required

To run this project successfully, ensure the following assets are present in the root directory:

  • barrios.geojson: Geospatial data for Valencia neighborhoods.
  • modelos.pkl: The pre-trained dictionary of XGBoost models.
  • pisos.xlsx: The dataset containing rental listing information.
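
A quick way to confirm these assets are in place before launching the dashboard is a small check like the one below. It is a sketch only: the helper missing_assets is hypothetical and is not part of the repository.

```python
from pathlib import Path
from typing import List, Sequence

# File names taken from the list above.
REQUIRED_ASSETS = ("barrios.geojson", "modelos.pkl", "pisos.xlsx")


def missing_assets(root: Path, names: Sequence[str] = REQUIRED_ASSETS) -> List[str]:
    """Return the required files that are not present directly under `root`."""
    return [name for name in names if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_assets(Path("."))
    if missing:
        print("Missing assets:", ", ".join(missing))
    else:
        print("All required assets found.")
```

Run it from the project root; an empty result means streamlit run app.py should find everything it needs.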

Authors

  • Hugo Sánchez
  • María Verdú

Project created in June 2024.
