This capstone project involves performing linear regression analysis on the Boston housing dataset (boston.csv). The goal is to build a predictive model for housing prices using independent variables such as the average number of rooms (rm) and the proportion of the population with a low status (lstat). The project includes data cleaning, exploratory data analysis (EDA), model building, and evaluation.
-
Clone the Repository
- Clone this repository to your local machine:
git clone https://github.com/vswapna3202/L2T14.git
- Clone this repository to your local machine:
-
Navigate to the Project's Folder
- Navigate to the folder containing the project:
cd L2T14 or cd <your-task-folder>
- Navigate to the folder containing the project:
-
Install Dependencies
- Ensure you have Python installed on your system. You can download it from the official Python website.
- Install Jupyter Notebook to run the provided notebook:
pip install notebook - Install necessary libraries such as
pandas,numpy,matplotlib, andscikit-learn:pip install pandas numpy matplotlib scikit-learn
-
Reading and Preparing the Data
- Read the
boston.csvfile into the Jupyter notebookCapstone Linear Regression.ipynb. - Clean and prepare the dataset if necessary.
- Read the
-
Exploratory Data Analysis (EDA)
- Perform EDA to gain insights into the dataset.
- Visualize the distributions of the dependent variable and independent variables.
- Identify any patterns or trends in the data.
-
Linear Regression Model
- Use the following independent variables:
rm(average number of rooms) andlstat(proportion of the population with low status). - Split the dataset into the independent variables and the single dependent variable.
- Generate plots to explore the relationships between the independent variables and the dependent variable.
- Use the following independent variables:
-
Model Building and Evaluation
- Split the data into training and test sets using a split ratio of 75:25.
- Build a multiple linear regression model using the training set with all the independent variables.
- Print out the intercept and coefficients of the trained model.
- Generate predictions for the test set.
- Evaluate the model's performance by computing the mean squared error (MSE) or root mean squared error (RMSE) on the test set.
- Generate an error plot to visualize the differences between the predicted and actual values in the test set.
- Print the coefficients and interpret them within the context of the median value prediction.
-
Summary
- Summarize the findings from the analysis, including insights from the exploratory data analysis, model performance, and any notable observations within the notebook.
- Start Jupyter Notebook:
jupyter notebook
- Open the Jupyter notebook
Capstone Linear Regression.ipynb.
boston.csv: Contains data on various attributes of houses in Boston, including features like average number of rooms (rm), proportion of population with low status (lstat), and median value of owner-occupied homes (medv). Source