Skip to content

Application of Linear, AdaBoost, RandomForest Regressors and Ordinary Least Square method for projecting customer insurance charges.

Notifications You must be signed in to change notification settings

Shubha23/Insurance-Forecast-A-Regression-Problem

Repository files navigation

Exploratory Data Analysis and application of Regression techniques.

NOTE - Please see the Jupyter notebook(.ipynb file) for complete explanation and Exploratory Data Analysis (EDA)


Project goal - Prediction of insurance charges for customers


Problem type - Regression


Project description - This work predicts insurance charges for customers based on their several physical characteristics and some other features. Exploratory Data analysis (EDA) is performed for data overview, visualization and study of statistical details. Linear regression, ADABoost Regressor and Random Forest Regressor are implemented using Python's Scikit-learn package. All three algorithms are appied on both train and test data. Ordinary Least Square(OLS) from Python's Statsmodel is used for prediction and to generate full statistical summary including p-values, correlation coefficients and F-statistics for the model. Two approaches are used for creating train and test data from the dataset.

  1. Cross-vaidation - Using entire dataset and perform cross-validation.
  2. Train-test splitting - Splitting the dataset, 80% for training and rest for testing. Performance metrics - R-squared values for each model on both training and testing data.

About dataset-


Source - Kaggle (Medical Insurance Personal dataset) Dependent variable - charges Independent variables - Age - Age of customer (numeric) Sex - Gender of customer (categorical, String type) Region - Areas they live in (categorical, String type) Children - Number of children they have (numeric) BMI - Body-mass index (continous, float) Smoker - Whether smokes or not (categorical, String type)


To compile and execute - python Insurance.py


Output type - Continous, float values Output parameter - Cost charges projected for each customer by predictive models.


*Note - Update input file path with local file pathname.

***************************************************** End of file ***************************************************

About

Application of Linear, AdaBoost, RandomForest Regressors and Ordinary Least Square method for projecting customer insurance charges.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published