This is a data analytics and machine learning dashboard built with Streamlit. It provides an interactive visualization of the rental housing market in Valencia, Spain, and offers a predictive model to estimate rental prices based on specific property characteristics.
This tool assists users in understanding rental trends across different neighborhoods in Valencia. It processes data collected from real estate listings to generate spatial visualizations and trains an XGBoost regression model to predict prices.
- Interactive Choropleth Map: Visualizes average rental prices per neighborhood using GeoJSON data.
- Price Prediction Engine: Uses a machine learning model (XGBoost) to predict monthly rental costs based on inputs such as number of rooms, floor level, surface area, and presence of an elevator or exterior view.
- Partial Dependence Plots (PDP): Show how individual features (e.g., surface area, floor level) affect the predicted price, both globally and per neighborhood, updating in real time.
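A partial dependence curve sets one feature to each value on a grid, leaves every other column as observed, and averages the model's predictions at each grid point. The sketch below shows that computation for any model exposing a scikit-learn-style predict method. The helper name partial_dependence_curve, the column layout, and the toy linear model are hypothetical illustrations, not the API of visualization.py, which may compute the curves differently.

```python
import numpy as np


def partial_dependence_curve(model, X, feature_idx, grid):
    """Average prediction as one feature sweeps over grid (hypothetical helper).

    model: any object with a predict(X) method, e.g. an XGBoost regressor.
    X: 2-D array of observed feature rows used as the background sample.
    feature_idx: column index of the feature to vary.
    grid: 1-D sequence of values to assign to that feature.
    """
    X = np.asarray(X, dtype=float)
    averages = []
    for value in grid:
        X_mod = X.copy()
        # Overwrite the chosen feature in every row; other columns keep their observed values.
        X_mod[:, feature_idx] = value
        averages.append(float(np.mean(model.predict(X_mod))))
    return np.asarray(averages)


class LinearToyModel:
    """Stand-in model for illustration: price = 10 * surface + 50 * rooms."""

    def predict(self, X):
        return 10.0 * X[:, 0] + 50.0 * X[:, 1]


if __name__ == "__main__":
    X = np.array([[60.0, 2.0], [80.0, 3.0], [100.0, 4.0]])  # columns: surface, rooms
    grid = np.array([50.0, 75.0, 100.0])
    print(partial_dependence_curve(LinearToyModel(), X, 0, grid))
```

A per-neighborhood curve follows the same pattern: pass only the rows belonging to that neighborhood as X.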
The codebase has been refactored into a modular architecture to ensure maintainability and separation of concerns.
- app.py: The entry point of the application. It orchestrates the Streamlit interface and user interactions.
- config.py: Contains configuration constants, mapping dictionaries for neighborhood normalization, and feature definitions.
- data_manager.py: Handles data loading, cleaning, and preparation. It manages interactions with the GeoJSON files and the pickled machine learning models.
- visualization.py: Contains the logic for generating Plotly charts, including the map and Partial Dependence Plots.
- parser.py: A standalone script used to parse raw HTML files from the scraping phase and convert them into a structured Excel dataset.
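The neighborhood normalization held in config.py can be pictured as a lookup table applied after basic text cleanup, so that spelling variants from listings match the names in the GeoJSON file. This is a minimal sketch under assumptions: the alias entries, the upper-case canonical form, and the function names are hypothetical and do not reproduce the project's actual mapping.

```python
import unicodedata

# Hypothetical alias table; the real mapping dictionaries live in config.py.
NEIGHBORHOOD_ALIASES = {
    "el carme": "EL CARME",
    "russafa": "RUSSAFA",
    "ruzafa": "RUSSAFA",  # Spanish spelling mapped to the Valencian name
}


def strip_accents(text: str) -> str:
    """Remove diacritics, e.g. 'Benimàmet' -> 'Benimamet'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalize_neighborhood(raw: str) -> str:
    """Map a scraped neighborhood label to a canonical name (hypothetical helper)."""
    key = strip_accents(raw).strip().lower()
    key = " ".join(key.split())  # collapse repeated internal whitespace
    # Fall back to an upper-cased label when no alias is defined.
    return NEIGHBORHOOD_ALIASES.get(key, key.upper())


if __name__ == "__main__":
    for label in ["Ruzafa", "  El   Carme ", "Benimàmet"]:
        print(repr(label), "->", normalize_neighborhood(label))
```

Stripping accents and whitespace before the lookup keeps the alias table small, since it only needs one entry per genuinely different spelling.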
- Data Collection: Raw HTML files of real estate listings are gathered in the scraping phase.
- Parsing: The parser.py script extracts relevant features (price, location, amenities) and cleans the data (removing nulls, normalizing neighborhood names).
- Storage: Processed data is stored in pisos.xlsx.
- Modeling: An XGBoost regressor is trained on this data to generate the modelos.pkl file used for inference.
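The cleaning and aggregation steps above can be sketched with pandas: drop listings missing a price or neighborhood, tidy the neighborhood labels, then average the rent per neighborhood, which is the kind of table a choropleth is colored by. The column names "price" and "neighborhood" are assumptions, since the actual headers of pisos.xlsx are not documented here, and the sample rows are made up for illustration.

```python
import pandas as pd


def clean_listings(df: pd.DataFrame) -> pd.DataFrame:
    """Drop incomplete rows and tidy neighborhood labels (hypothetical column names)."""
    # Listings without a price or neighborhood are unusable for the map and the model.
    cleaned = df.dropna(subset=["price", "neighborhood"]).copy()
    cleaned["neighborhood"] = cleaned["neighborhood"].str.strip().str.upper()
    return cleaned


def average_price_by_neighborhood(df: pd.DataFrame) -> pd.DataFrame:
    """Mean monthly rent per neighborhood, sorted by neighborhood name."""
    return (
        df.groupby("neighborhood", as_index=False)["price"]
        .mean()
        .rename(columns={"price": "avg_price"})
    )


if __name__ == "__main__":
    raw = pd.DataFrame(
        {
            "neighborhood": ["Russafa ", "RUSSAFA", "El Carme", None],
            "price": [900.0, 1100.0, 850.0, 700.0],
        }
    )
    print(average_price_by_neighborhood(clean_listings(raw)))
    # Writing the cleaned frame to Excel, e.g. clean_listings(raw).to_excel("pisos.xlsx", index=False),
    # requires openpyxl, which is included in the install command.
```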
- Python 3.8+
- pip
Install the required packages:
pip install streamlit numpy pandas geopandas matplotlib xgboost scikit-learn plotly openpyxl beautifulsoup4
To launch the dashboard locally:
streamlit run app.py
The application will open in your default web browser at http://localhost:8501.
If you have raw HTML files in a webs/ directory and wish to regenerate the dataset:
python parser.py
To run this project successfully, ensure the following assets are present in the root directory:
- barrios.geojson: Geospatial data for Valencia neighborhoods.
- modelos.pkl: The pre-trained dictionary of XGBoost models.
- pisos.xlsx: The dataset containing rental listing information.
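A startup check can report missing assets before the dashboard tries to load them, and the dictionary in modelos.pkl can be unpickled and queried by key. This is a sketch under stated assumptions: the key layout (one entry per neighborhood plus a fallback under the hypothetical key "global") and all function names are illustrative, not taken from data_manager.py.

```python
import pickle
from pathlib import Path

REQUIRED_ASSETS = ("barrios.geojson", "modelos.pkl", "pisos.xlsx")


def missing_assets(root="."):
    """Return the required files that are not present in root."""
    base = Path(root)
    return [name for name in REQUIRED_ASSETS if not (base / name).is_file()]


def load_models(path="modelos.pkl"):
    """Unpickle the dictionary of trained models.

    Only unpickle trusted files: pickle can execute arbitrary code while loading.
    """
    with open(path, "rb") as fh:
        models = pickle.load(fh)
    if not isinstance(models, dict):
        raise TypeError(f"expected a dict of models, got {type(models).__name__}")
    return models


def select_model(models, neighborhood, default_key="global"):
    """Pick a neighborhood-specific model, else the fallback (assumed key layout)."""
    if neighborhood in models:
        return models[neighborhood]
    return models[default_key]


if __name__ == "__main__":
    missing = missing_assets()
    if missing:
        print("Missing assets:", ", ".join(missing))
    else:
        print("Loaded model keys:", sorted(load_models()))
```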
- Hugo Sánchez
- María Verdú
Project created in June 2024.