seandhan/Insurance-Business-Statistics

Insurance Business Statistics 🛡️☔📋🏥💲

The primary objective of this project is to analyse medical insurance data to uncover relationships between policyholder demographics, risk factors, and incurred medical costs.


🤓 Description

This dataset provides a snapshot of medical insurance charges and relevant policyholder information from an insurance provider in the U.S. It contains records for 1,338 beneficiaries, with key variables including age, gender, body mass index (BMI), number of dependent children covered, smoking status, geographic region of residence, and individual insurance charges.

The purpose of analysing this data is to uncover relationships between policyholder demographics, risk factors, and incurred medical costs. The charges variable represents the total dollar amount billed to insurance for each beneficiary, which serves as a proxy for their healthcare expenditure. Exploring this variable in relation to factors like age, BMI, tobacco usage, dependents, and location can provide insights into how these attributes affect insurance claims. This can help inform actuarial practices around customised pricing and risk management. Additionally, understanding the drivers of medical costs can aid in the development of targeted wellness initiatives. Overall, careful examination of this multi-faceted dataset can assist the insurance company with financial forecasting, pricing optimisation, risk mitigation, and population health management.

Objectives

  1. Do smokers incur higher insurance charges than non-smokers?
  2. Does the BMI of females differ from the BMI of males?
  3. Does the proportion of smokers differ significantly across regions?
  4. Is the mean BMI of women with no children, one child, and two children the same?

💻 Dataset Overview

The dataset source file can be found through the following link:

Click to view 👇:

Data_link

The insurance dataset contains 7 variables. The data dictionary and key observations are shown below:

Data Dictionary
  1. Age - This is an integer indicating the age of the primary beneficiary (excluding those above 64 years, since they are generally covered by the government).
  2. Sex - This is the policyholder's gender, either male or female.
  3. BMI - This is the body mass index (BMI), which provides a sense of how over or under-weight a person is relative to their height.
  4. Children - This is an integer indicating the number of children / dependents covered by the insurance plan.
  5. Smoker - This is yes or no depending on whether the insured regularly smokes tobacco.
  6. Region - This is the beneficiary's place of residence in the U.S., divided into four geographic regions - northeast, southeast, southwest, or northwest.
  7. Charges - These are the individual medical costs billed by health insurance.
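
The data dictionary above can be turned into a small validation step. The following is a minimal sketch using only the Python standard library, not the notebook's actual code. It assumes the CSV column names are the lowercase forms of the dictionary entries (`age`, `sex`, `bmi`, `children`, `smoker`, `region`, `charges`); the sample rows and the `parse_row` helper are hypothetical and exist only for illustration.

```python
import csv
import io

# Hypothetical sample rows, made up for illustration (not taken from the dataset).
# Column names are an assumption: lowercase forms of the data dictionary entries.
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
30,female,25.0,0,yes,southwest,15000.00
45,male,31.5,2,no,northeast,8000.00
"""

# Allowed values for the categorical variables, as listed in the data dictionary.
ALLOWED = {
    "sex": {"male", "female"},
    "smoker": {"yes", "no"},
    "region": {"northeast", "southeast", "southwest", "northwest"},
}


def parse_row(row):
    """Convert one CSV row to typed values; raise ValueError on a violation."""
    record = {
        "age": int(row["age"]),
        "sex": row["sex"],
        "bmi": float(row["bmi"]),
        "children": int(row["children"]),
        "smoker": row["smoker"],
        "region": row["region"],
        "charges": float(row["charges"]),
    }
    for column, allowed in ALLOWED.items():
        if record[column] not in allowed:
            raise ValueError(f"unexpected {column}: {record[column]!r}")
    # The dictionary states that beneficiaries above 64 are excluded.
    if record["age"] > 64:
        raise ValueError("age above 64 is outside the dataset's stated scope")
    if record["children"] < 0 or record["bmi"] <= 0 or record["charges"] < 0:
        raise ValueError("numeric field out of range")
    return record


records = [parse_row(row) for row in csv.DictReader(io.StringIO(SAMPLE_CSV))]
print(f"{len(records)} rows passed validation")
```

To run this against the real file, the `io.StringIO(SAMPLE_CSV)` argument would be replaced with an open file handle, provided the header names match the assumption above.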

📊 Exploratory Data Analysis

Key Observations
  • The dataset is well-structured with 1338 rows and 7 columns, including categorical and numerical variables.
  • All columns have complete data with no missing values.
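
Both observations can be checked with a few lines of code. The sketch below is an illustration under assumptions rather than the notebook's method: it uses the standard library `csv` module, a hypothetical `summarise` helper, and made-up sample rows (one with a deliberately blank `bmi`) to show how a gap would be reported.

```python
import csv
import io

# Hypothetical sample with one blank bmi value, made up for illustration.
SAMPLE_CSV = """age,sex,bmi,children,smoker,region,charges
30,female,25.0,0,yes,southwest,15000.00
45,male,,2,no,northeast,8000.00
"""


def summarise(csv_text):
    """Return (row count, column count, missing-value count per column)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    missing = {name: 0 for name in columns}
    n_rows = 0
    for row in reader:
        n_rows += 1
        for name in columns:
            value = row[name]
            # DictReader fills short rows with None; blank cells are empty strings.
            if value is None or value.strip() == "":
                missing[name] += 1
    return n_rows, len(columns), missing


n_rows, n_cols, missing = summarise(SAMPLE_CSV)
print(f"rows={n_rows}, columns={n_cols}, missing={missing}")
```

If the observations above hold, running the same function on the full file should report 1338 rows, 7 columns, and a count of zero for every column.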

Click to view 👇:

Exploratory Data Analysis


🚀 Business Questions

The following business questions will be addressed in the analysis:

  1. Do smokers incur higher insurance charges than non-smokers?
  2. Does the BMI of females differ from the BMI of males?
  3. Does the proportion of smokers differ significantly across regions?
  4. Is the mean BMI of women with no children, one child, and two children the same?
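
As an illustration of how the first question could be framed as a hypothesis test, here is a minimal sketch of a one-sided permutation test on the difference in mean charges. This is a swapped-in technique chosen because it needs only the standard library; the linked notebook may well use a different test, such as a two-sample t-test. The function name `permutation_test_greater` and the charge values are hypothetical and made up for illustration.

```python
import random
from statistics import mean


def permutation_test_greater(group_a, group_b, n_resamples=5000, seed=0):
    """One-sided permutation test: is mean(group_a) greater than mean(group_b)?

    Returns the observed difference in means and an approximate p-value.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the group labels are exchangeable,
        # so shuffle the pooled values and split them at the same size.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if diff >= observed:
            at_least_as_large += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    p_value = (at_least_as_large + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical charges, made up for illustration only.
smoker_charges = [32000.0, 41000.0, 28500.0, 36000.0, 39500.0, 30250.0]
non_smoker_charges = [4200.0, 8100.0, 6500.0, 12000.0, 3900.0, 7400.0]

diff, p = permutation_test_greater(smoker_charges, non_smoker_charges)
print(f"difference in means: {diff:.2f}, approximate p-value: {p:.4f}")
```

A small p-value would support the claim that smokers incur higher charges. The remaining questions call for other tests (for example, a test of independence for smoking status by region, and a comparison of means across three groups for question 4), which this sketch does not cover.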

Click to view 👇:

Solution-Business Questions


📗 Notebooks

The Notebook for the "Exploratory Data Analysis" can be accessed below:

Click to view 👇:

Exploratory Data Analysis

The Notebook for the "Business Questions" can be accessed below:

Business Questions


📧 Contact Information