This project focuses on predicting housing prices in California districts using machine learning. The goal is to build a regression model that can estimate the median house value based on various features. The dataset used for this project is the California Housing Prices dataset from Kaggle.
- Project Overview
- Dataset
- Installation
- Usage
- Data Exploration
- Data Preprocessing
- Model Building
- Model Evaluation
- Results
- Contributing
- License
- Dataset Source: California Housing Prices on Kaggle
- Description: This dataset contains housing-related information for various districts in California. It includes features like population, median income, housing median age, and the target variable, median house value.
- Clone this repository to your local machine using `git clone`.
- Navigate to the project directory.
- Install the required Python packages using `pip install -r requirements.txt`.
- Launch Jupyter Notebook: run `jupyter notebook` in the project directory.
- Open and run the `Predictor.ipynb` notebook to explore the project.
- Explore the dataset using Python and Jupyter Notebook.
- Generate histograms, scatter plots, and correlation matrices to gain insights into the data.

Heat map showing the correlation between each pair of columns.

Histogram showing the distribution of the data (similar to the one shown earlier).

Scatter plot, which makes it easy to spot outliers in the dataset.
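
The numbers behind a correlation heat map and a histogram can be sketched as follows. This is a minimal illustration on synthetic data, not the notebook's actual code: the column names `median_income`, `housing_median_age`, and `median_house_value` are assumed to match the dataset, and the generated values are made up purely so the example runs on its own.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the housing data (values are made up; column names are assumptions).
rng = np.random.default_rng(0)
n = 500
income = rng.normal(4.0, 1.5, n).clip(0.5)
age = rng.integers(1, 52, n).astype(float)
value = 0.4 * income + 0.01 * age + rng.normal(0, 0.3, n)

df = pd.DataFrame({
    "median_income": income,
    "housing_median_age": age,
    "median_house_value": value,
})

# Pairwise Pearson correlations: the matrix a heat map would visualize.
corr = df.corr()

# Correlation of each feature with the target, strongest first.
target_corr = (
    corr["median_house_value"]
    .drop("median_house_value")
    .sort_values(ascending=False)
)
print(target_corr)

# Bin counts for one column: the data a histogram plot would draw.
counts, edges = np.histogram(df["median_income"], bins=10)
print(counts)
```

With the real dataset, `df` would instead come from reading the downloaded CSV, and the same `corr` matrix could be passed to a plotting library to draw the heat map.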

- Handle missing data using imputation.
- Perform feature engineering to create new informative features.
- Scale the data to prepare it for modeling.
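
The three preprocessing steps above might look roughly like this with scikit-learn. It is a sketch under stated assumptions: the toy values are invented for illustration, median imputation is one possible strategy (the notebook may use another), and `rooms_per_household` is a hypothetical engineered feature.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Toy rows with one missing value (invented numbers; column names are assumptions).
df = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, 1467.0, 1274.0],
    "total_bedrooms": [129.0, np.nan, 190.0, 235.0],
    "households": [126.0, 1138.0, 177.0, 219.0],
})

# 1. Imputation: replace missing values with the column median.
imputer = SimpleImputer(strategy="median")
filled = pd.DataFrame(imputer.fit_transform(df), columns=df.columns)

# 2. Feature engineering: a hypothetical ratio feature.
filled["rooms_per_household"] = filled["total_rooms"] / filled["households"]

# 3. Scaling: standardize each column to zero mean and unit variance.
scaler = StandardScaler()
scaled = scaler.fit_transform(filled)

print(filled)
print(scaled.round(3))
```

In a real workflow the imputer and scaler would be fitted on the training split only and then applied to the test split, so that no information leaks from the test data.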
- Build a Linear Regression model using scikit-learn.
- Train the model on the training dataset.
- Evaluate the model's performance using various metrics.
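
A minimal sketch of building and training a linear regression model with scikit-learn is shown below. The data is synthetic so the example is self-contained; the 80/20 split and `random_state` values are assumptions, not settings taken from the notebook.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic features and target with a known linear relationship plus noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3))
true_coef = np.array([0.8, -0.3, 0.1])
y = X @ true_coef + 2.0 + rng.normal(scale=0.1, size=200)

# Hold out 20% of the rows for evaluation (the split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit ordinary least squares on the training data.
model = LinearRegression()
model.fit(X_train, y_train)

# Predict on the held-out data.
y_pred = model.predict(X_test)

print("coefficients:", model.coef_)
print("intercept:", model.intercept_)
```

Because the synthetic target is generated from known coefficients, the fitted `model.coef_` should land close to `true_coef`, which is a handy sanity check before moving to the real data.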
- Calculate evaluation metrics, including Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE).
- Visualize model predictions and compare them to actual values.

Scatter plot

Residual plot

Feature importance plot
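
The three metrics listed above can be computed as in the following sketch. The `y_true` and `y_pred` arrays are small invented values chosen so the arithmetic is easy to follow by hand; in the notebook they would be the test targets and the model's predictions.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Invented values for illustration only.
y_true = np.array([3.0, 2.0, 4.0, 5.0])
y_pred = np.array([2.5, 2.0, 5.0, 4.0])

# MAE: the average absolute difference between prediction and truth.
mae = mean_absolute_error(y_true, y_pred)

# MSE: the average squared difference, which penalizes large errors more.
mse = mean_squared_error(y_true, y_pred)

# RMSE: the square root of MSE, in the same units as the target.
rmse = np.sqrt(mse)

# Residuals: the values a residual plot would show against the predictions.
residuals = y_true - y_pred

print(f"MAE={mae}, MSE={mse}, RMSE={rmse}")
```

The residuals here are 0.5, 0, -1, and 1, so MAE is 2.5 / 4 and MSE is 2.25 / 4, which matches what the library calls return.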

- Summarize key findings and insights from the project.
- Discuss the model's performance and any improvements achieved through model refinement.
- The model was evaluated using the following metrics:
  - Mean Absolute Error: 0.4367338817223555
  - Mean Squared Error: 0.3603952607354783
  - Root Mean Squared Error: 0.6003292935843446
- These values are fairly average for a model of this type. To achieve more sophisticated results, I will be refining and rewriting parts of the code to maximize accuracy.
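
As a quick consistency check on the reported figures, RMSE should equal the square root of MSE. The short sketch below verifies this using only the two values reported above; the variable names are illustrative.

```python
import math

# Values copied from the reported results.
mse_reported = 0.3603952607354783
rmse_reported = 0.6003292935843446

# RMSE is defined as the square root of MSE.
rmse_from_mse = math.sqrt(mse_reported)

# The two should agree up to floating-point rounding.
consistent = math.isclose(rmse_from_mse, rmse_reported, rel_tol=1e-9)
print("RMSE matches sqrt(MSE):", consistent)
```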
Contributions are welcome! Feel free to open issues or submit pull requests.
This project is licensed under the MIT License - see the LICENSE file for details.



