Hajer603/Data_Science_Project

Data_Science_Project

🏠 Real Estate Price Prediction Project (Oman)

📌 Objective

The purpose of this project is to collect and analyze real estate listings in Oman, clean the data, engineer useful features, and build machine learning models to predict property sale prices based on key property characteristics.


🌐 Websites Used


🔄 Data Collection & Cleaning Steps

  1. Web Scraping:

    • Used Python with requests, BeautifulSoup, and pandas to scrape listing data.
    • Extracted features: property title, city, area, price, number of bedrooms, bathrooms, garage, and listing type.
  2. Data Cleaning:

    • Removed text units (e.g., "OMR", "SqM") from price and area columns using regex.
    • Converted numeric columns to floats.
    • Filled missing values with the median (numeric columns) or the mode (categorical columns).
    • Dropped rows with excessive missing information.
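
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the sample rows, the column names, the helper names `to_number` and `clean_listings`, and the `max_missing` threshold are all assumptions. Rows are dropped before imputation here, since after filling there would be nothing left to count as missing.

```python
import re

import pandas as pd

# Hypothetical raw listings; column names and values are illustrative only.
raw = pd.DataFrame(
    {
        "price": ["85,000 OMR", "120000 OMR", None, "64,500 OMR"],
        "area": ["250 SqM", "400 SqM", "310 SqM", None],
        "bedrooms": [3, 5, None, 2],
        "city": ["Muscat", None, "Muscat", "Salalah"],
    }
)


def to_number(value):
    """Strip units and thousands separators, then convert to float."""
    if pd.isna(value):
        return float("nan")
    digits = re.sub(r"[^\d.]", "", str(value))
    return float(digits) if digits else float("nan")


def clean_listings(df, numeric_text_cols, max_missing=2):
    """Convert unit-laden text columns, drop sparse rows, impute the rest."""
    df = df.copy()
    for col in numeric_text_cols:
        df[col] = df[col].map(to_number)
    # Drop rows with more than `max_missing` missing fields (assumed threshold).
    df = df[df.isna().sum(axis=1) <= max_missing].copy()
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            # Numeric column: fill gaps with the median.
            df[col] = df[col].fillna(df[col].median())
        else:
            # Categorical column: fill gaps with the most frequent value.
            df[col] = df[col].fillna(df[col].mode().iloc[0])
    return df.reset_index(drop=True)


cleaned = clean_listings(raw, ["price", "area"])
print(cleaned)
```

In this toy input no row exceeds the threshold, so all four rows survive and every gap is filled from the remaining values in its column.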

🛠️ Feature Engineering Strategy

  • Created new columns such as price_per_sqm, city, government, and total rooms, based on property size.
  • Applied label encoding to categorical features (e.g., location, city, listing type, and government).
  • Normalized numerical features where needed.
  • Combined data from two sources into one unified dataset.
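
As a rough sketch of the strategy above, the snippet below combines two small tables, derives `price_per_sqm` and a room total, label-encodes the categorical columns, and min-max scales two numeric columns. The two data frames, the column names other than `price_per_sqm`, and the choice of `MinMaxScaler` for normalization are assumptions made for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

# Two hypothetical cleaned sources with matching columns.
source_a = pd.DataFrame(
    {
        "city": ["Muscat", "Salalah"],
        "listing_type": ["Villa", "Apartment"],
        "price": [120000.0, 45000.0],
        "area": [400.0, 150.0],
        "bedrooms": [5, 2],
        "bathrooms": [4, 2],
    }
)
source_b = pd.DataFrame(
    {
        "city": ["Sohar", "Muscat"],
        "listing_type": ["Villa", "Apartment"],
        "price": [90000.0, 60000.0],
        "area": [300.0, 120.0],
        "bedrooms": [4, 2],
        "bathrooms": [3, 1],
    }
)

# Combine both sources into one unified dataset.
df = pd.concat([source_a, source_b], ignore_index=True)

# Derived features.
df["price_per_sqm"] = df["price"] / df["area"]
df["total_rooms"] = (df["bedrooms"] + df["bathrooms"]).astype(float)

# Label-encode categorical columns (LabelEncoder sorts classes alphabetically).
for col in ["city", "listing_type"]:
    df[col + "_code"] = LabelEncoder().fit_transform(df[col])

# Scale selected numeric columns to the [0, 1] range.
num_cols = ["area", "total_rooms"]
df[num_cols] = MinMaxScaler().fit_transform(df[num_cols])

print(df)
```

Because `price_per_sqm` is computed from the price itself, feeding it to a model that predicts price would leak the target; it is safer to treat it as an exploratory feature or as an alternative target.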

🤖 Modeling Approach

Used scikit-learn to train and evaluate three regression models:

  • Linear Regression
  • Decision Tree Regressor
  • Random Forest Regressor
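
A minimal sketch of fitting the three models listed above with scikit-learn. The data here is synthetic and stands in for the engineered dataset; the split ratio, random seeds, and hyperparameters are assumptions, not the project's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the engineered features and target (not project data).
rng = np.random.default_rng(42)
X = rng.uniform(0, 1, size=(300, 4))
y = 0.6 * X[:, 0] + 0.3 * X[:, 1] ** 2 + rng.normal(0, 0.02, size=300)

# Hold out 20% of the rows for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

predictions = {}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)
    # For scikit-learn regressors, score() returns R² on the given data.
    scores[name] = model.score(X_test, y_test)

for name, r2 in scores.items():
    print(f"{name}: R2 = {r2:.3f}")
```

Scoring on a held-out split rather than on the training rows matters most for the tree-based models, which can fit training data almost perfectly.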

Evaluation Metrics:

  • Mean Squared Error (MSE)
  • Root Mean Squared Error (RMSE)
  • R-squared Score (R²)
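
For reference, the three metrics can be computed directly with NumPy, as sketched below. MSE is the mean of the squared residuals, RMSE is its square root (so it is in the same units as the target), and R² is one minus the ratio of the residual sum of squares to the total sum of squares. The function name `regression_metrics` and the sample values are illustrative.

```python
import numpy as np


def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, and R² for two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = float(np.mean(residuals ** 2))
    rmse = float(np.sqrt(mse))
    ss_res = float(np.sum(residuals ** 2))                 # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"mse": mse, "rmse": rmse, "r2": r2}


# Toy example: three exact predictions and one that is off by 2.
metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
print(metrics)  # mse = 1.0, rmse = 1.0, r2 is approximately 0.2
```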

📊 Results Summary

| Model | RMSE | R² Score |
|---|---|---|
| Linear Regression | 0.10 | 0.43 |
| Decision Tree | 0.02 | 0.98 |
| Random Forest | 0.01 | 0.99 |

✅ Random Forest performed the best, achieving the highest R² and lowest RMSE.


📂 Project Files

  • Web scraping scripts or notebooks
  • Data cleaning functions
  • Final combined CSV file
  • Feature engineering and modeling code
  • A brief README.md file


🚀 Future Enhancements

  • Add more features like amenities, location coordinates, or property age.
  • Scrape additional websites to expand the dataset.
  • Deploy the model as a prediction tool via Flask or Streamlit.
