This repository contains my completed coursework for Machine Learning Assignment 1. The focus is on building and evaluating supervised learning models from scratch using real-world datasets.
From fitting regression curves to tuning classification algorithms, this project demonstrates my ability to handle data preprocessing, apply core ML algorithms, and assess model performance with industry-standard metrics, all implemented in Python using scikit-learn.
- ✅ Apply linear and polynomial regression to numerical data
- ✅ Use GridSearchCV for hyperparameter optimization
- ✅ Preprocess datasets: handle missing values, normalize, and encode features
- ✅ Train and evaluate classifiers: Logistic Regression, KNN, Naive Bayes
- ✅ Compare models using metrics like Accuracy, F1-score, Recall, Precision
- ✅ Use pipelines to ensure clean, modular, and reproducible ML workflows
The solution is presented in the notebook.
- Load and split the dataset from `task1_data.csv`
- Train and evaluate a linear regression model using:
  - MSE, RMSE, MAE, R²
- Perform polynomial regression and select the optimal degree using GridSearchCV
- Visualize and compare model performance
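
The regression steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the columns of `task1_data.csv` are not listed here, so the example uses synthetic data as a stand-in, and the helper name `regression_report`, the degree range 1 to 6, and the 80/20 split are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic stand-in for task1_data.csv (assumption: one feature, one target).
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 3 - X[:, 0] + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)


def regression_report(model, X_eval, y_eval):
    """Return MSE, RMSE, MAE and R2 for an already fitted regressor."""
    pred = model.predict(X_eval)
    mse = mean_squared_error(y_eval, pred)
    return {
        "MSE": mse,
        "RMSE": float(np.sqrt(mse)),
        "MAE": mean_absolute_error(y_eval, pred),
        "R2": r2_score(y_eval, pred),
    }


# Baseline: plain linear regression.
linear = LinearRegression().fit(X_train, y_train)
linear_scores = regression_report(linear, X_test, y_test)

# Polynomial regression: the degree is chosen by 5-fold cross-validation.
poly_pipeline = Pipeline([
    ("poly", PolynomialFeatures(include_bias=False)),
    ("reg", LinearRegression()),
])
search = GridSearchCV(
    poly_pipeline,
    param_grid={"poly__degree": list(range(1, 7))},
    scoring="neg_mean_squared_error",
    cv=5,
)
search.fit(X_train, y_train)
best_degree = search.best_params_["poly__degree"]
poly_scores = regression_report(search.best_estimator_, X_test, y_test)

print("linear:", linear_scores)
print(f"polynomial (degree={best_degree}):", poly_scores)
```

Wrapping `PolynomialFeatures` and `LinearRegression` in one `Pipeline` lets `GridSearchCV` treat the degree as an ordinary hyperparameter, so the feature expansion is refit inside every cross-validation fold.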
- Load `pokemon_modified.csv` and perform:
  - Missing value imputation
  - One-hot encoding for categorical features
  - Feature scaling
- Train and tune:
  - Logistic Regression
  - K-Nearest Neighbors (KNN)
  - Naive Bayes
- Evaluate models using:
  - Accuracy, Precision, Recall, F1-score
- Select best-performing model
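
One way the classification workflow above could be wired together is shown below. It is a sketch under stated assumptions rather than the notebook's implementation: the schema of `pokemon_modified.csv` is not given here, so the column names (`attack`, `defense`, `type`, `target`) and the data are synthetic placeholders, `GaussianNB` is assumed as the Naive Bayes variant, and the parameter grids and the choice of test-set F1 as the selection criterion are illustrative only.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for pokemon_modified.csv (hypothetical columns).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "attack": rng.normal(70, 20, n),
    "defense": rng.normal(65, 20, n),
    "type": rng.choice(["fire", "water", "grass"], n),
})
df["target"] = (df["attack"] + df["defense"] > 135).astype(int)
# Inject missing values so the imputers have something to do.
df.loc[rng.choice(n, 20, replace=False), "attack"] = np.nan
df.loc[rng.choice(n, 20, replace=False), "type"] = np.nan

X, y = df.drop(columns="target"), df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Imputation + scaling for numeric columns, imputation + one-hot for categorical.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer(
    [("num", numeric, ["attack", "defense"]), ("cat", categorical, ["type"])],
    sparse_threshold=0,  # always return a dense array (GaussianNB needs one)
)

# Each candidate: (estimator, illustrative hyperparameter grid).
candidates = {
    "logreg": (LogisticRegression(max_iter=1000), {"clf__C": [0.1, 1.0, 10.0]}),
    "knn": (KNeighborsClassifier(), {"clf__n_neighbors": [3, 5, 7]}),
    "nb": (GaussianNB(), {"clf__var_smoothing": [1e-9, 1e-7]}),
}

results = {}
for name, (clf, grid) in candidates.items():
    pipe = Pipeline([("prep", preprocess), ("clf", clf)])
    search = GridSearchCV(pipe, grid, scoring="f1", cv=5)
    search.fit(X_train, y_train)
    pred = search.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
    }

# Pick the model with the highest held-out F1 (an assumed selection rule).
best_name = max(results, key=lambda k: results[k]["f1"])
print(pd.DataFrame(results).T.round(3))
print("best model:", best_name)
```

Keeping the preprocessing inside the pipeline means the imputers, scaler, and encoder are fit only on the training folds during the grid search, which avoids leaking information from validation data.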
- Python (Jupyter Notebook)
- `numpy`, `pandas`, `matplotlib`, `scikit-learn`
- Data preprocessing pipelines
- Cross-validation with `GridSearchCV`
- Classification and regression metrics
Valeria Neganova
Focus: Practical supervised learning, evaluation & model selection
📫 Valerochka.neganova@mail.ru