Streamlit live app link: https://multiplelinearregression-ml-deployment-with-app-a8vyz5zuerwhmd.streamlit.app/
This project demonstrates how Multiple Linear Regression can be applied to an investment dataset to analyze the impact of different investment attributes on profit and to identify statistically significant features using OLS (Ordinary Least Squares) regression.
The objective of this project is to:
- Build a Linear Regression model to predict profit based on multiple investment factors
- Encode categorical variables for model compatibility
- Perform train–test splitting
- Evaluate model performance using bias and variance
- Apply backward elimination using p-values from OLS to select the most significant features
- Python
- Pandas – data handling
- NumPy – numerical operations
- Matplotlib – visualization support
- Scikit-learn – machine learning model
- Statsmodels – statistical analysis (OLS)
The dataset (Investment.csv) contains multiple investment-related attributes such as:
- Different investment channels
- Promotional and research spending
- State information (categorical)
- Profit (target variable)
- Data Loading & Preprocessing
  - Loaded the dataset using Pandas
  - Separated independent variables (X) and the dependent variable (y)
  - Converted categorical data using LabelEncoder
- Model Training
  - Split data into training and testing sets (75% / 25%)
  - Trained a Multiple Linear Regression model
  - Extracted slope (coefficients) and intercept
- Statistical Analysis
  - Applied OLS regression
  - Added the constant term manually
  - Performed backward elimination by removing features with p-value > 0.05
- Model Evaluation
  - Bias score: score on the training set (R² for a regression model)
  - Variance score: score on the testing set (R² for a regression model)
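The preprocessing, training, and evaluation steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the data is synthetic, and the column names (`DigitalMarketing`, `Promotion`, `Research`, `State`, `Profit`) and state labels are assumptions standing in for the real contents of Investment.csv.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Synthetic stand-in for Investment.csv; every column name here is hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "DigitalMarketing": rng.uniform(0, 100_000, n),
    "Promotion": rng.uniform(0, 100_000, n),
    "Research": rng.uniform(0, 100_000, n),
    "State": rng.choice(["A", "B", "C"], n),
})
df["Profit"] = (
    0.8 * df["DigitalMarketing"]
    + 0.05 * df["Promotion"]
    + 0.02 * df["Research"]
    + 40_000
    + rng.normal(0, 5_000, n)
)

# Separate independent variables (X) and the dependent variable (y).
X = df.drop(columns="Profit")
y = df["Profit"]

# Convert the categorical State column to integer codes.
X["State"] = LabelEncoder().fit_transform(X["State"])

# 75% / 25% train-test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit the multiple linear regression model.
regressor = LinearRegression().fit(X_train, y_train)
slope = regressor.coef_            # one coefficient per feature
intercept = regressor.intercept_

# LinearRegression.score returns R^2, used here as the "bias" and "variance" scores.
bias = regressor.score(X_train, y_train)
variance = regressor.score(X_test, y_test)
print(f"bias (train R^2): {bias:.3f}, variance (test R^2): {variance:.3f}")
```

A large gap between the two scores would suggest overfitting; similar values suggest the model generalizes to unseen rows.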
- OLS regression helps identify statistically significant investment factors
- Features with high p-values (above 0.05) are not statistically significant predictors of profit
- The refined model improves interpretability and reliability
- Useful for data-driven investment decision-making
- Coefficients and intercept obtained from Linear Regression
- Final model retains only significant predictors
- Clear distinction between training and testing performance
- Add data visualization for feature impact
- Deploy the model using Streamlit or Flask
- Automate feature selection
pip install pandas numpy matplotlib scikit-learn statsmodels

python app.py
This project was developed as part of hands-on learning in Machine Learning and Statistical Modeling, focusing on practical implementation and interpretability.
Based on the regression model predictions, higher investment in the Digital Marketing attribute shows a strong positive impact on future profit, making it the most effective investment driver in the dataset.