Your first implementation of a machine learning algorithm.
This project is about implementing a simple linear regression algorithm. The goal is to predict the price of a car based on its mileage.
To do this we will use the following formula:
Where:
-
$y$ is the price of the car (dependent variable) -
$x$ is the mileage of the car (independent variable) -
$\theta_0$ is the intercept -
$\theta_1$ is the slope
To figure out the values of
git clone https://github.com/magnitopic/ft-linear-regression.git
cd ft-linear-regression
pip install -r requirements.txt
python src/main.py
Cost function to minimize:
To minimize the cost, we need the partial derivative of the cost function.
The gradient is:
$\nabla J = \left( \frac{\partial J}{\partial \theta_0} , \frac{\partial J}{\partial \theta_1} \right) $
-
$R^2 = 0$ : There is no linear relationship between the variables -
$R^2 = 1$ : There is a perfect linear relationship between the variables (all points are on the line) -
$R^2 \approx 0.8$ : Good fit in many contexts -
$R^2 < 0.2$ : Poor fit, suggests non-linear relationship or no relationship at all
-
$\sum_{i=1}^{n} (Y_i - \widehat{Y}_i)^2$ calculates the sum of squared residuals (difference between observed and predicted values). -
$\sum_{i=1}^{n} (Y_i - \bar{Y})^2$ calculates the total sum of squares (total variation in the data).
It's important in this statistics example to remember that:
- Correlation does not imply causation
- There may exist false correlations
- The value of
$R^2$ should be interpreted with the specific context in mind- In social sciences, a value of
$R^2$ of 0.7 is considered high - In physics, a value of
$R^2$ of 0.7 is considered low
- In social sciences, a value of
