
# ft-linear-regression

Your first implementation of a machine learning algorithm.


## Description

This project is about implementing a simple linear regression algorithm. The goal is to predict the price of a car based on its mileage.

To do this, we will use the following formula:

$$ y = \theta_0 + \theta_1 \cdot x$$

Where:

- $y$ is the price of the car (dependent variable)
- $x$ is the mileage of the car (independent variable)
- $\theta_0$ is the intercept
- $\theta_1$ is the slope

To figure out the values of $\theta_0$ and $\theta_1$, we will use the gradient descent algorithm.
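For intuition, the prediction itself is a one-line function. Here is a minimal sketch in Python (the name `estimate_price` is illustrative, not necessarily what `src/main.py` uses):

```python
def estimate_price(mileage: float, theta0: float, theta1: float) -> float:
    """Linear model: price = θ0 + θ1 · mileage."""
    return theta0 + theta1 * mileage
```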

## Clone and run project

```bash
git clone https://github.com/magnitopic/ft-linear-regression.git
cd ft-linear-regression
pip install -r requirements.txt
python src/main.py
```

## Math

$$ y = \theta_0 + \theta_1 \cdot x$$

The prediction error is measured with the mean squared error:

$$ERROR = \frac{1}{m} \sum_{i=1}^{m} (\widehat{y}_i - y_i)^2$$

Cost function to minimize:

$$J = \frac{1}{2m} \sum_{i=1}^{m} (\widehat{y}_i - y_i)^2$$

Substituting $\widehat{y}_i = \theta_0 + \theta_1 \cdot x_i$:

$$J = \frac{1}{2m} \sum_{i=1}^{m} ((\theta_0 + \theta_1 \cdot x_i) - y_i)^2$$
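As a reference point, this cost function is straightforward to write in plain Python (a sketch with illustrative names; `xs` and `ys` are assumed to be equal-length lists of mileages and prices):

```python
def cost(theta0: float, theta1: float, xs: list[float], ys: list[float]) -> float:
    """J = 1/(2m) · Σ((θ0 + θ1·x_i) - y_i)², the halved mean squared error."""
    m = len(xs)
    return sum(((theta0 + theta1 * x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)
```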

To minimize the cost, we need the partial derivative of the cost function.

The gradient is: $\nabla J = \left( \frac{\partial J}{\partial \theta_0} , \frac{\partial J}{\partial \theta_1} \right) $

$$\frac{\partial J}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^{m} (\widehat{y}_i - y_i)$$

$$\frac{\partial J}{\partial \theta_1} = \frac{1}{m} \sum_{i=1}^{m} (\widehat{y}_i - y_i) \cdot x_i$$

The parameters are then updated by stepping against the gradient, scaled by the learning rate $\alpha$:

$$ \theta_0 := \theta_0 - \alpha \cdot \frac{\partial J}{\partial \theta_0} = \theta_0 - \alpha \left( \frac{1}{m} \sum_{i=1}^{m} (\widehat{y}_i - y_i) \right) $$

$$ \theta_1 := \theta_1 - \alpha \cdot \frac{\partial J}{\partial \theta_1} = \theta_1 - \alpha \left( \frac{1}{m} \sum_{i=1}^{m} (\widehat{y}_i - y_i) \cdot x_i \right) $$
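Combining the two update rules gives the training loop. The sketch below is one possible implementation, not necessarily the project's exact code; `alpha` and `iterations` are arbitrary example values, and in practice the mileages are typically normalized first so the updates stay stable:

```python
def gradient_descent(xs: list[float], ys: list[float],
                     alpha: float = 0.01, iterations: int = 1000) -> tuple[float, float]:
    """Fit θ0 and θ1 by repeatedly stepping against the gradient of J."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(iterations):
        # Prediction errors (ŷ_i - y_i) under the current parameters
        errors = [(theta0 + theta1 * x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                             # ∂J/∂θ0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m  # ∂J/∂θ1
        # Simultaneous update: θ := θ - α · ∇J
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1
```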

## Evaluation of the Model: Coefficient of Determination ($R^2$)

$R^2$ is a metric that varies between 0 and 1, where:

- $R^2 = 0$: there is no linear relationship between the variables
- $R^2 = 1$: there is a perfect linear relationship between the variables (all points are on the line)
- $R^2 \approx 0.8$: a good fit in many contexts
- $R^2 < 0.2$: a poor fit, suggesting a non-linear relationship or no relationship at all
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (Y_i - \widehat{Y}_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}$$

Where:

- $\sum_{i=1}^{n} (Y_i - \widehat{Y}_i)^2$ is the sum of squared residuals (the difference between observed and predicted values).
- $\sum_{i=1}^{n} (Y_i - \bar{Y})^2$ is the total sum of squares (the total variation in the data).
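As a sketch, $R^2$ can be computed directly from this definition once the model is trained (names are illustrative):

```python
def r_squared(xs: list[float], ys: list[float], theta0: float, theta1: float) -> float:
    """R² = 1 - SS_res / SS_tot: model error relative to predicting the mean."""
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - (theta0 + theta1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot
```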

## Correlation vs Causation

When interpreting statistics like these, it is important to remember that:

- Correlation does not imply causation
- Spurious correlations exist, as shown by this website
- The value of $R^2$ should be interpreted with the specific context in mind:
  - In the social sciences, an $R^2$ of 0.7 is considered high
  - In physics, an $R^2$ of 0.7 is considered low
