Datascience_Project

[Real Estate Data Scraping, Cleaning, and Analysis]

1-Objectives

The main goal of this project is to collect, clean, and analyze real estate property listings from leading property websites in Oman. The processed dataset can later be used for data analysis or building predictive models (such as predicting rental prices).

2-Websites Used

*Data Collection Sources:

opensooq.com

bayut.om

*Reference & Coding Help:

GeeksforGeeks

ChatGPT

Khoula's repository (Content folder)

YouTube

3-Steps Taken:

1. Data Collection (Web Scraping)

Wrote custom Python scripts using requests and BeautifulSoup to scrape property listing data from both websites.

Handled pagination to collect data from all available listing pages.

Saved the raw data into structured CSV files for each website.
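The collection steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual scraper: the CSS classes (`div.listing-card`, `.title`, `.price`), the `{page}` URL template, and the stop-on-empty-page rule are all assumptions, and each site's real markup and pagination scheme will differ.

```python
import csv

import requests
from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Download one page and raise on HTTP errors."""
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=30)
    response.raise_for_status()
    return response.text


def parse_listings(html: str) -> list:
    """Extract listings from one page. The CSS classes are hypothetical."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.listing-card"):
        title = card.select_one(".title")
        price = card.select_one(".price")
        rows.append({
            "title": title.get_text(strip=True) if title else None,
            "price": price.get_text(strip=True) if price else None,
        })
    return rows


def scrape_all_pages(url_template: str, fetch=fetch_html, max_pages: int = 500) -> list:
    """Request page 1, 2, ... and stop at the first page with no listings."""
    all_rows = []
    for page in range(1, max_pages + 1):
        rows = parse_listings(fetch(url_template.format(page=page)))
        if not rows:
            break
        all_rows.extend(rows)
    return all_rows


def save_csv(rows: list, path: str) -> None:
    """Write the raw rows to a CSV file, one file per website."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["title", "price"])
        writer.writeheader()
        writer.writerows(rows)
```

Passing `fetch` as a parameter keeps the pagination loop testable without network access; in real use the default `fetch_html` would be left in place and a polite delay between requests would likely be added.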

2. Data Cleaning

Combined the data from both sources into a single DataFrame.

Cleaned column names and ensured consistent data types across columns.

Removed duplicates and handled missing values (e.g., filled missing sizes with the median value and recovered missing locations from the listing titles).

Split combined fields (such as "80 m2") into separate numeric and unit columns.

Standardized currency values.
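A possible shape for these cleaning steps in pandas is shown below. The column names (`title`, `size`, `price`) and the regular expressions are illustrative assumptions, and the price step assumes every listing is already quoted in OMR, so no currency conversion is shown.

```python
import pandas as pd


def clean_listings(frames: list) -> pd.DataFrame:
    """Combine per-site frames and apply basic cleaning (hypothetical columns)."""
    # Combine the data from both sources into a single DataFrame.
    df = pd.concat(frames, ignore_index=True)

    # Normalize column names: trim, lowercase, spaces to underscores.
    df.columns = (
        df.columns.str.strip().str.lower().str.replace(r"\s+", "_", regex=True)
    )

    # Drop exact duplicate rows.
    df = df.drop_duplicates().reset_index(drop=True)

    # Split a combined field such as "80 m2" into a number and a unit.
    parts = df["size"].str.extract(
        r"(?P<size_value>\d+(?:\.\d+)?)\s*(?P<size_unit>[^\d\s]\S*)?"
    )
    df["size_value"] = pd.to_numeric(parts["size_value"], errors="coerce")
    df["size_unit"] = parts["size_unit"]

    # Fill missing sizes with the median of the known sizes.
    df["size_value"] = df["size_value"].fillna(df["size_value"].median())

    # Keep only digits and the decimal point from price strings like "OMR 1,200".
    df["price_value"] = pd.to_numeric(
        df["price"].str.replace(r"[^\d.]", "", regex=True), errors="coerce"
    )
    return df
```

Keeping the original `size` and `price` text columns next to the parsed values makes it easier to spot rows where the parsing assumptions fail.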

3. Feature Engineering

Created new features:

Price_per_SqM: Price divided by size (when available).

Feature Scaling: Applied normalization (e.g., MinMaxScaler) and Box-Cox transformations to numeric features.

Categorical Encoding: Used one-hot encoding for variables like Listing_Type.

Ensured all features were ready for further analysis or modeling.
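The feature engineering steps might look something like the sketch below. The `Price` and `Size` column names are assumptions (only `Price_per_SqM` and `Listing_Type` appear in this README), and scikit-learn's `PowerTransformer(method="box-cox")` stands in for whichever Box-Cox implementation the notebook actually uses. Box-Cox requires strictly positive inputs.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, PowerTransformer


def engineer_features(df: pd.DataFrame) -> pd.DataFrame:
    """Add derived, scaled, and encoded columns (column names are assumptions)."""
    out = df.copy()

    # Price per square metre, left as NaN when size is missing or not positive.
    out["Price_per_SqM"] = out["Price"] / out["Size"].where(out["Size"] > 0)

    # Box-Cox transform of price; the result is also standardized by default.
    out["Price_boxcox"] = (
        PowerTransformer(method="box-cox").fit_transform(out[["Price"]]).ravel()
    )

    # Min-max scaling of size into [0, 1]; NaN values pass through unchanged.
    out["Size_scaled"] = MinMaxScaler().fit_transform(out[["Size"]]).ravel()

    # One-hot encode the listing type.
    out = pd.get_dummies(out, columns=["Listing_Type"], prefix="Listing_Type")
    return out
```

If a model is trained afterwards, fitting the scalers on the training split only (rather than on the full dataset as done here) avoids leaking information from the test rows.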

4. Modeling Approach (if applicable)

(Optional) Built simple regression models to predict price from the available features.

Evaluated model performance using standard metrics; the reported R² is 0.86.
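As a rough illustration of the modeling step, the sketch below fits a regressor on a held-out split and reports R². The choice of `RandomForestRegressor`, the 80/20 split, and the synthetic data are all assumptions made for the example; this README does not name the model used, and the example is not expected to reproduce the 0.86 figure.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def fit_and_score(X, y, seed: int = 0):
    """Train on 80% of the rows and return the model with its test-set R²."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    return model, r2_score(y_test, model.predict(X_test))


# Synthetic stand-in data: price grows with size and room count, plus noise.
rng = np.random.default_rng(0)
size = rng.uniform(40, 400, 300)
rooms = rng.integers(1, 6, 300)
price = 3 * size + 50 * rooms + rng.normal(0, 20, 300)

X = np.column_stack([size, rooms])
model, r2 = fit_and_score(X, price)
print(f"Test R2: {r2:.2f}")
```

Scoring on a held-out split rather than on the training rows gives a more honest picture of how the model would behave on unseen listings.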

Usage

Run the Jupyter notebook(s) to reproduce the scraping, cleaning, and feature engineering steps.

The final cleaned dataset can be found in properties_combined_raw2.csv.
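Once the notebooks have run, the dataset can be loaded for a quick sanity check. The helper names below are hypothetical, and the snippet assumes the CSV sits in the working directory.

```python
import pandas as pd


def load_listings(path: str = "properties_combined_raw2.csv") -> pd.DataFrame:
    """Read the combined dataset from disk."""
    return pd.read_csv(path)


def summarize(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column dtype and missing-value count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
    })
```

Calling `summarize(load_listings())` gives one row per column, which is a quick way to confirm that the cleaning steps left no unexpected gaps.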

sklearn.ensemble

Repository files:

README.md

Web.Scraping_project.ipynb

data_cleaningproj.ipynb

feature_engineeringproj.ipynb

properties_combined_raw2.csv


shihabahmed8/Datascience_Project
