Linear Regression

Ishani Kathuria edited this page May 24, 2023 · 2 revisions

Overview

Linear regression is a method to find the straight line that best fits a set of data points. It helps us understand how one variable (dependent variable) changes as another variable (independent variable) changes. By estimating the slope and intercept of the line, we can make predictions and analyze the relationship between the variables.

Formula

$$\hat{y} = b_0 + b_1 x$$

$\hat{y}$ represents the predicted value of the dependent variable, $x$ represents the independent variable, and $b_0$ and $b_1$ represent the estimated intercept and slope coefficients, respectively.

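$$b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

$$b_0 = \bar{y} - b_1 \bar{x}$$

Here $\bar{x}$ and $\bar{y}$ are the sample means of the independent and dependent variables, and $n$ is the number of data points.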
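The slope and intercept formulas translate almost directly into code. The following is a minimal sketch in plain Python, not the implementation from the linked notebook; the function names `fit_simple_linear_regression` and `predict` are hypothetical, and the toy data is made up so that the points lie exactly on the line y = 1 + 2x.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")

    x_mean = sum(xs) / n
    y_mean = sum(ys) / n

    # Numerator of b1: sum of products of deviations from the means
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    # Denominator of b1: sum of squared deviations of x
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")

    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


def predict(x, b0, b1):
    """Evaluate the fitted line at x."""
    return b0 + b1 * x


# Toy data (made up for illustration) lying exactly on y = 1 + 2x
b0, b1 = fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # prints: 1.0 2.0
```

With real, noisy data the points will not fall exactly on the line, and `b0` and `b1` are the values that minimize the sum of squared vertical distances to it.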

Step-by-Step Implementation

A medical insurance costs dataset (Kaggle) was used, with the following columns:

  • age
  • sex
  • bmi
  • children
  • smoker
  • region
  • charges (y – dependent variable)

The dataset can be used to predict the medical costs billed by health insurance for a particular person. There are no classes here: this is a regression problem, so the output is a continuous value. (Learn more about regression)
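As a rough illustration of how a single numeric column could be related to `charges`, the sketch below fits a one-variable line from rows of a CSV file. It is not the notebook's implementation: the helper names `fit_charges_on` and `load_rows` and the file name `insurance.csv` are assumptions, and `bmi` is chosen as the predictor only as an example.

```python
import csv
from statistics import mean


def fit_charges_on(rows, x_col="bmi", y_col="charges"):
    """Fit y_col = b0 + b1 * x_col from an iterable of dict-like rows."""
    # CSV values arrive as strings, so convert them to floats first
    pairs = [(float(r[x_col]), float(r[y_col])) for r in rows]
    xs, ys = zip(*pairs)
    x_mean, y_mean = mean(xs), mean(ys)

    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in pairs)

    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


def load_rows(path):
    """Read a CSV with a header row into a list of dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


# Hypothetical usage; the file name is an assumption:
# b0, b1 = fit_charges_on(load_rows("insurance.csv"), x_col="bmi")
```

Categorical columns such as `sex`, `smoker`, and `region` would need to be encoded as numbers before they could be used this way.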

See implementation in Jupyter Notebook