Description
The current machine learning script at backend/scripts/MachineLearning/prediction.py successfully trains a linear regression model and saves it. However, it does not evaluate the performance of the model. A critical step in any machine learning workflow is to understand how well the model is performing on unseen data.
This task involves modifying the script to include model evaluation.
Proposed Solution:
We need to update the script to split the dataset into training and testing sets. The model will be trained on the training set and then evaluated on the testing set using standard regression metrics.
Tasks:
Import necessary functions: From sklearn.model_selection, import train_test_split. From sklearn.metrics, import mean_squared_error and r2_score.
Split the data: Use the train_test_split function to divide the existing X and y data into X_train, X_test, y_train, and y_test. A common split is 80% for training and 20% for testing.
Train the model: Fit the LinearRegression model using only the training data (X_train, y_train).
Make predictions: Use the trained model to make predictions on the test data (X_test).
Calculate and print evaluation metrics:
Calculate the Mean Squared Error (MSE) between the predictions and the actual test values (y_test).
Calculate the R-squared (R2) score.
Print these scores to the console in a clear and understandable format.
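The tasks above can be sketched as follows. This is a minimal, self-contained example of the proposed flow, not the actual `prediction.py`: the random placeholder data stands in for whatever dataset the real script loads, and the surrounding code there will differ.

```python
import joblib
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Placeholder data; the real script builds X and y from its own dataset.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

# 1. Split the data: 80% for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 2. Fit the model on the training set only.
model = LinearRegression()
model.fit(X_train, y_train)

# 3. Predict on the held-out test set.
y_pred = model.predict(X_test)

# 4. Calculate and print the evaluation metrics.
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean Squared Error (MSE): {mse:.4f}")
print(f"R-squared (R2): {r2:.4f}")

# 5. Still save the trained model, as the script already does.
joblib.dump(model, "linear_regression_model.joblib")
```

Fixing `random_state` in `train_test_split` keeps the split reproducible, so the printed metrics are stable across runs.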
Acceptance Criteria:
The prediction.py script uses train_test_split to create training and testing datasets.
The model is trained exclusively on the training data.
After training, the script calculates and prints at least two evaluation metrics (MSE and R2 score) computed on the test data.
The script still successfully saves the trained model as linear_regression_model.joblib.
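To verify the last criterion, the saved artifact can be sanity-checked by loading it back with joblib and running a prediction. The tiny fitted model below is only a stand-in for what prediction.py would save; the point is the dump/load round trip.

```python
import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Placeholder for the model prediction.py saves: fit exactly on y = 2x.
placeholder = LinearRegression().fit(
    np.array([[0.0], [1.0], [2.0]]), np.array([0.0, 2.0, 4.0])
)
joblib.dump(placeholder, "linear_regression_model.joblib")

# Reload and confirm the model still predicts after the round trip.
model = joblib.load("linear_regression_model.joblib")
print(model.predict(np.array([[3.0]])))  # ~6.0, since the fit is y = 2x
```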