added Health Insurance Price Prediction using Machine Learning (Issue #697) #753

Merged: 1 commit, Jul 31, 2024
File: Finacial Domain/Health Insurance Price Prediction/app.py (57 additions, 0 deletions)
```python
import streamlit as st
import pandas as pd
import joblib

# Load the trained model and label encoders (paths are relative to the working directory)
model_lr = joblib.load('linear_regression_model.pkl')
label_encoders = joblib.load('label_encoders.pkl')


# Function to predict insurance charges
def predict_insurance_charges(age, sex, bmi, children, smoker, region):
    # Transform categorical variables using the fitted label encoders
    sex_encoded = label_encoders['sex'].transform([sex])[0]
    smoker_encoded = label_encoders['smoker'].transform([smoker])[0]
    region_encoded = label_encoders['region'].transform([region])[0]

    # Prepare input data as a one-row DataFrame; column names and order must match training
    input_data = pd.DataFrame({
        'age': [age],
        'sex': [sex_encoded],
        'bmi': [bmi],
        'children': [children],
        'smoker': [smoker_encoded],
        'region': [region_encoded]
    })

    # Make prediction using the trained Linear Regression model
    predicted_charge = model_lr.predict(input_data)[0]

    return predicted_charge


# Streamlit app
def main():
    st.title('Health Insurance Price Prediction')
    st.markdown('Enter the following details to predict insurance charges:')

    # Input fields
    age = st.number_input('Age', min_value=0, max_value=100, step=1)
    sex = st.selectbox('Sex', ['male', 'female'])
    bmi = st.number_input('BMI', min_value=10.0, max_value=50.0, step=0.1)
    children = st.number_input('Number of Children', min_value=0, max_value=10, step=1)
    smoker = st.selectbox('Smoker', ['yes', 'no'])
    # The dataset's regions (northeast, northwest, southeast, southwest) are not regions of India
    region = st.selectbox('Region', ['northeast', 'northwest', 'southeast', 'southwest'])

    if st.button('Predict'):
        # Call prediction function
        predicted_charge = predict_insurance_charges(age, sex, bmi, children, smoker, region)

        # Display prediction result in a green container with bold text
        st.markdown(
            '<div style="background-color:#00FF00; padding:10px; border-radius:10px;">'
            f'<h2 style="color:black; text-align:center;">Predicted Insurance Charge: <b>{predicted_charge:.2f} Rs</b></h2>'
            '</div>',
            unsafe_allow_html=True)


if __name__ == '__main__':
    main()
```
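Because `predict_insurance_charges` reads the module-level `model_lr` and `label_encoders`, it can only run after both `.pkl` files have been loaded. One possible refactor, sketched here under the assumption that training used the same six columns in the same order as `app.py`, passes the model and encoders in explicitly so the prediction logic can be checked with stand-in objects. The names `predict_charge` and `FEATURES`, the stand-in encoders and model, and the toy rows are all hypothetical.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import LabelEncoder

# Column order used by app.py (assumed to match the order the model was trained on)
FEATURES = ['age', 'sex', 'bmi', 'children', 'smoker', 'region']
CATEGORICAL = {'sex', 'smoker', 'region'}


def predict_charge(model, encoders, **inputs):
    """Hypothetical helper: encode the categorical inputs, then predict one charge."""
    row = {}
    for name in FEATURES:
        value = inputs[name]
        if name in CATEGORICAL:
            # LabelEncoder.transform raises ValueError for a label it was not fitted on
            value = encoders[name].transform([value])[0]
        row[name] = [value]
    return float(model.predict(pd.DataFrame(row, columns=FEATURES))[0])


# Stand-in encoders and model fitted on made-up rows (NOT the project's .pkl files)
encoders = {
    'sex': LabelEncoder().fit(['female', 'male']),
    'smoker': LabelEncoder().fit(['no', 'yes']),
    'region': LabelEncoder().fit(['northeast', 'northwest', 'southeast', 'southwest']),
}
X = pd.DataFrame({
    'age': [20, 30, 40, 50],
    'sex': [0, 1, 0, 1],
    'bmi': [22.0, 25.0, 28.0, 31.0],
    'children': [0, 1, 2, 3],
    'smoker': [0, 1, 0, 1],
    'region': [0, 1, 2, 3],
}, columns=FEATURES)
y = [2000.0, 9000.0, 6000.0, 14000.0]
model = LinearRegression().fit(X, y)

charge = predict_charge(model, encoders, age=30, sex='male', bmi=25.0,
                        children=1, smoker='yes', region='northwest')
print(f'{charge:.2f}')
```

Inside `main()`, the button handler could then call `predict_charge(model_lr, label_encoders, age=age, sex=sex, ...)`. Catching the `ValueError` for an unseen label would also give a natural place to show `st.error` instead of a traceback.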
File: Finacial Domain/Health Insurance Price Prediction/readme.md (135 additions, 0 deletions)
# Health Insurance Price Prediction using Machine Learning

## Project Summary

### Data Exploration and Preprocessing

**Dataset:** https://www.kaggle.com/datasets/annetxu/health-insurance-cost-prediction

#### Exploratory Data Analysis (EDA)

Used Plotly and Seaborn for visualizations (pie charts, histograms, violin plots, and box plots) to understand data distributions, correlations, and outliers.
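The plots themselves are not reproduced here. As a plotting-free sketch of the same kinds of checks, the snippet below uses pandas on a handful of made-up rows that only borrow the dataset's column names (assuming the target column is named `charges`; with the real data you would load the CSV with `pd.read_csv` instead). It counts missing values, compares average charges for smokers and non-smokers, and flags outliers with the 1.5 × IQR rule that Seaborn's `boxplot` uses for its whiskers by default (`whis=1.5`).

```python
import pandas as pd

# Made-up rows (hypothetical, NOT from the Kaggle dataset) with the dataset's column names
df = pd.DataFrame({
    'age': [23, 35, 41, 29, 52, 47, 38, 60],
    'sex': ['male', 'female', 'male', 'female', 'male', 'female', 'male', 'female'],
    'bmi': [24.0, 27.5, 30.1, 22.3, 31.8, 26.4, 29.0, 33.2],
    'children': [0, 2, 1, 0, 3, 1, 2, 0],
    'smoker': ['no', 'no', 'no', 'no', 'no', 'no', 'yes', 'no'],
    'region': ['northeast', 'northwest', 'southeast', 'southwest',
               'northeast', 'northwest', 'southeast', 'southwest'],
    'charges': [1000.0, 1200.0, 1100.0, 1300.0, 1250.0, 1150.0, 9000.0, 1050.0],
})

# Missing values per column
missing = df.isna().sum()

# Average charge for smokers vs. non-smokers (the comparison a box or violin plot shows visually)
mean_by_smoker = df.groupby('smoker')['charges'].mean()

# Flag outliers with the 1.5 * IQR rule used for box-plot whiskers
q1, q3 = df['charges'].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df['charges'] < q1 - 1.5 * iqr) | (df['charges'] > q3 + 1.5 * iqr)

print(missing.sum())
print(mean_by_smoker)
print(df.loc[is_outlier, 'charges'].tolist())
```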

#### Data Preprocessing

- **Label Encoding:** Converted the categorical variables (`sex`, `smoker`, `region`) to integers with scikit-learn's `LabelEncoder`.
- **Handling Missing Values:** Checked every column for missing values and handled any that were present.
- **Feature Scaling:** Standardized features (zero mean, unit variance) with scikit-learn's `StandardScaler` where applicable.
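`app.py` indexes `label_encoders['sex']`, `['smoker']`, and `['region']`, which suggests the encoders were saved as a dict keyed by column name. A minimal sketch of fitting encoders in that shape, on made-up rows; the helper `fit_label_encoders` is hypothetical. `LabelEncoder` assigns codes in sorted order of the labels, so `female` becomes 0 and `male` becomes 1.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

CATEGORICAL = ('sex', 'smoker', 'region')


def fit_label_encoders(df, columns=CATEGORICAL):
    """Fit one LabelEncoder per column; return the encoded copy and a dict of encoders."""
    encoded = df.copy()
    encoders = {}
    for col in columns:
        enc = LabelEncoder()
        encoded[col] = enc.fit_transform(df[col])
        encoders[col] = enc
    return encoded, encoders


# Made-up rows for illustration only
raw = pd.DataFrame({
    'age': [25, 40, 33, 58],
    'sex': ['male', 'female', 'female', 'male'],
    'smoker': ['no', 'yes', 'no', 'no'],
    'region': ['southwest', 'northeast', 'southeast', 'northwest'],
})
encoded, encoders = fit_label_encoders(raw)

# region codes: northeast -> 0, northwest -> 1, southeast -> 2, southwest -> 3
print(encoded.to_string(index=False))
print(list(encoders['region'].classes_))
```

Saving the dict with `joblib.dump(encoders, 'label_encoders.pkl')` would give a file in the shape `app.py` expects. The same fitted encoders have to be reused at prediction time, since refitting on different data could assign different codes.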


### Machine Learning Models

#### Linear Regression (LR)

- Trained a Linear Regression model to predict insurance charges from features such as age and BMI.
- Evaluated it with R-squared (coefficient of determination), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE).
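These four metrics can be computed with scikit-learn roughly as follows; `regression_report` is a hypothetical helper and the toy values are made up. RMSE is taken as the square root of MSE because the pinned `scikit-learn==1.2.2` predates `root_mean_squared_error`, which was added in version 1.4.

```python
import numpy as np
from sklearn.metrics import mean_absolute_percentage_error, mean_squared_error, r2_score


def regression_report(y_true, y_pred):
    """Return R-squared, MSE, RMSE and MAPE for one set of predictions."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        'r2': r2_score(y_true, y_pred),
        'mse': mse,
        # RMSE computed by hand for compatibility with scikit-learn < 1.4
        'rmse': float(np.sqrt(mse)),
        # scikit-learn returns MAPE as a fraction (0.05 means 5%), not a percentage
        'mape': mean_absolute_percentage_error(y_true, y_pred),
    }


# Toy values: every prediction is off by exactly 10
report = regression_report([100, 200, 300, 400], [110, 190, 310, 390])
for name, value in report.items():
    print(f'{name}: {value:.4f}')
# r2: 0.9920, mse: 100.0000, rmse: 10.0000, mape: 0.0521
```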


#### Random Forest (RF)

- Applied a Random Forest Regressor to the same prediction task.
- Evaluated it with the same metrics as LR.


#### XGBoost (XGB) and Gradient Boosting Machine (GBM)

- Implemented XGBoost and GBM models for comparison.
- Evaluated them with the same metrics and compared the results against LR and RF.


### Model Evaluation and Comparison

- Compared LR, RF, XGB, and GBM on R-squared, MSE, RMSE, and MAPE.
- Plotted actual vs. predicted values as line plots to see how closely each model tracks the true charges.
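A sketch of how such a side-by-side comparison might be set up: every model is fitted on the same train/test split and scored on the same held-out rows. It uses synthetic data from `make_regression` as a stand-in for the encoded insurance features and leaves XGBoost out so that it runs with scikit-learn alone; the hyperparameters are illustrative, not the project's, and the ranking it prints says nothing about the real dataset.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded insurance features (6 columns, like the app's input)
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    'LR': LinearRegression(),
    'RF': RandomForestRegressor(n_estimators=100, random_state=0),
    'GBM': GradientBoostingRegressor(random_state=0),
    # An xgboost.XGBRegressor could be added here; it follows the same fit/predict interface
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        'r2': r2_score(y_test, pred),
        'rmse': float(np.sqrt(mean_squared_error(y_test, pred))),
    }

# Rank models by test-set R-squared, best first
for name, scores in sorted(results.items(), key=lambda kv: kv[1]['r2'], reverse=True):
    print(f"{name}: R2={scores['r2']:.3f}  RMSE={scores['rmse']:.2f}")
```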


### Deployment with Streamlit

- Developed a Streamlit web app that predicts insurance charges from user inputs (age, sex, BMI, number of children, smoker status, and region).
- Integrated the trained LR model and the label encoders into the app.
- Provided a simple interface where users enter their details and see the predicted charge in bold text inside a green container.


### Future Directions

- **Model Improvement:** Fine-tune the models for better accuracy and explore ensemble techniques or deep learning approaches if needed.
- **Feature Engineering:** Explore feature interactions and transformations to improve model performance.
- **User Experience:** Improve the Streamlit app's UI/UX and add features such as data visualization options and model selection.


### Tools and Technologies Used

- **Programming Language:** Python
- **Libraries and Frameworks:** pandas, NumPy, scikit-learn, XGBoost, Plotly, Seaborn, Streamlit
- **Data Visualization:** Plotly (interactive) and Seaborn (static) for exploratory visualizations
- **Machine Learning:** Regression models (Linear Regression, Random Forest, XGBoost, GBM) for predictive analysis
- **Web Application Development:** Streamlit for building the interactive web app

### Conclusion

The project applies machine learning to predict insurance charges from customer attributes. It covers data exploration, preprocessing, model building, evaluation, and deployment, and serves the trained Linear Regression model through an interactive Streamlit app.

## How to Use

1. **Clone the Repository**:
```sh
git clone url_to_this_repository
```

2. **Install Dependencies**:
```sh
pip install -r requirements.txt
```

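3. **Run the App** (from the `Finacial Domain/Health Insurance Price Prediction` directory, so that `linear_regression_model.pkl` and `label_encoders.pkl` can be found):
```sh
streamlit run app.py
```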

4. **View Results**: Streamlit serves the app locally (by default at http://localhost:8501), where you can enter a person's details to predict the estimated cost of their health insurance.

Dependency list (5 additions, 0 deletions):

```text
scikit-learn==1.2.2
joblib==1.4.2
pandas==2.0.3
numpy==1.25.2
streamlit
```