pedroMVicente/machine_learning_hws

Machine Learning Homeworks - Aprendizagem 2025/26

A collection of machine learning assignments covering fundamental and advanced ML concepts, implemented as part of the Aprendizagem (Learning) course at IST.

Course Overview

This repository contains solutions to the course's four homework assignments. Each assignment combines pen-and-paper exercises with a Python implementation in a Jupyter notebook.

Repository Structure

machine_learning_hws/
├── homework_01/
│   ├── G{xxx}_report.pdf
│   └── G{xxx}_notebook.ipynb
├── homework_02/
│   ├── G{xxx}_report.pdf
│   └── G{xxx}_notebook.ipynb
├── homework_03/
│   ├── G{xxx}_report.pdf
│   └── G{xxx}_notebook.ipynb
├── homework_04/
│   ├── G{xxx}_report.pdf
│   └── G{xxx}_notebook.ipynb
└── README.md

Homework Topics

Homework 1: Decision Trees & Classification Fundamentals

Deadline: October 1, 2025

Pen-and-Paper (11 points):

  • Building decision trees using information gain (Shannon entropy)
  • Training confusion matrices and F1 scores
  • Class-conditional histograms and discriminant rules
  • Outlier detection
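
As an illustration of the information-gain criterion listed above, here is a minimal sketch in plain Python. The function names and the toy data are hypothetical and are not taken from the homework; the sketch assumes categorical features and entropy measured in bits.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy reduction obtained by splitting on a categorical feature."""
    total = len(labels)
    # Group the class labels by the value the feature takes.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the groups after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data (not the homework dataset).
outlook = ["sunny", "sunny", "rain", "rain"]
windy = ["yes", "no", "yes", "no"]
play = ["no", "no", "yes", "yes"]

gain_outlook = information_gain(outlook, play)  # feature separates the classes
gain_windy = information_gain(windy, play)      # feature is uninformative
```

A tree builder would compute this gain for every candidate feature at a node and split on the one with the largest value.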

Programming (9 points):

  • Decision tree implementation with various hyperparameters
  • Model generalization analysis
  • Hyperparameter tuning for healthcare applications
  • Feature importance and conditional associations

Dataset: Hungarian Heart Diseases (284 patients, 9 biological features)


Homework 2: Bayesian Classifiers & k-Nearest Neighbors

Deadline: October 8, 2025

Pen-and-Paper (13 points):

  • Bayesian classification under the maximum a posteriori (MAP) decision rule
  • Feature independence assumptions and normally distributed variables
  • k-NN with Hamming distance and leave-one-out evaluation
  • Theoretical bounds on 1-NN classifier error rates
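
The k-NN item above (Hamming distance with leave-one-out evaluation) could be sketched as follows. This is an illustrative plain-Python version with hypothetical function names and made-up binary data, not the homework's solution; ties are broken by input order, which is an assumption.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length tuples differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_x, train_y, query, k):
    """Majority vote among the k training points closest to `query`."""
    order = sorted(range(len(train_x)), key=lambda i: hamming(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(xs, ys, k):
    """Hold out each observation in turn and classify it with the rest."""
    hits = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        hits += knn_predict(rest_x, rest_y, xs[i], k) == ys[i]
    return hits / len(xs)


# Hypothetical toy data: two well-separated groups of binary feature vectors.
xs = [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 0),
      (1, 1, 1, 1), (1, 1, 1, 0), (1, 1, 0, 1)]
ys = ["pos", "pos", "pos", "neg", "neg", "neg"]

loo_accuracy = leave_one_out_accuracy(xs, ys, k=1)
```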

Programming (7 points):

  • k-NN vs Naïve Bayes comparison with 5-fold cross-validation
  • Impact of data preprocessing (Min-Max scaling)
  • Statistical significance testing
  • Hyperparameter optimization (number of neighbors, weighting schemes)
  • Model deployment considerations for clinical settings
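
A comparison along the lines listed above might look like the following scikit-learn sketch. It runs on synthetic data from `make_classification` as a stand-in, since the homework dataset is not reproduced here. The choice of `GaussianNB`, 5 neighbors, uniform weights, and a paired t-test are assumptions for illustration, not necessarily what the notebook uses.

```python
from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data (assumption: the real notebook loads its own dataset).
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# Fixed seed so both models are scored on exactly the same 5 folds.
folds = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Scaling lives inside the pipeline, so it is fitted on the training folds only.
knn = make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=5, weights="uniform"))
nb = GaussianNB()

knn_scores = cross_val_score(knn, X, y, cv=folds, scoring="accuracy")
nb_scores = cross_val_score(nb, X, y, cv=folds, scoring="accuracy")

# Paired test on the per-fold accuracies of the two models.
p_value = ttest_rel(knn_scores, nb_scores).pvalue
```

Putting `MinMaxScaler` inside the pipeline avoids leaking test-fold statistics into the scaling step, which matters when judging the effect of preprocessing.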

Dataset: Breast Cancer Dataset


Homework 3: Regression & Neural Networks

Deadline: October 15, 2025

Pen-and-Paper (12 points):

  • Ordinary Least Squares (OLS) regression with polynomial basis functions
  • Ridge regression with regularization (λ = 1)
  • Training and test MAE comparison
  • Backpropagation through Multi-Layer Perceptron (MLP)
  • Stochastic gradient descent updates
  • Activation function analysis (sigmoid vs no activation)
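
The OLS and ridge items above can be illustrated with the closed-form solution on a polynomial basis. This NumPy sketch uses made-up data and hypothetical helper names. It also assumes the penalty applies to every weight, including the bias term, which the homework may handle differently.

```python
import numpy as np


def polynomial_design(x, degree):
    """Design matrix with columns 1, x, x^2, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)


def fit_ridge(phi, y, lam):
    """Solve (Phi^T Phi + lam * I) w = Phi^T y; lam = 0 reduces to OLS."""
    n_features = phi.shape[1]
    return np.linalg.solve(phi.T @ phi + lam * np.eye(n_features), phi.T @ y)


def mae(phi, y, w):
    """Mean absolute error of the predictions phi @ w."""
    return float(np.mean(np.abs(phi @ w - y)))


# Hypothetical toy data generated by y = 1 + x + x^2 (not the homework data).
x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.array([1.0, 3.0, 7.0, 13.0])

phi = polynomial_design(x_train, degree=2)
w_ols = fit_ridge(phi, y_train, lam=0.0)    # recovers the generating weights
w_ridge = fit_ridge(phi, y_train, lam=1.0)  # shrinks the weights toward zero

train_mae_ols = mae(phi, y_train, w_ols)
train_mae_ridge = mae(phi, y_train, w_ridge)
```

On noiseless data like this, OLS fits the training set exactly while ridge trades some training error for smaller weights; the comparison becomes more interesting on a held-out test set.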

Programming (8 points):

  • Linear regression baseline
  • MLP regressors with varying architectures
  • Impact of activation functions (ReLU) on model performance
  • Overfitting vs underfitting analysis
  • 5-fold cross-validation

Dataset: Rent prediction dataset


Homework 4: Clustering & Dimensionality Reduction

Deadline: October 23, 2025

Pen-and-Paper (9 points):

  • K-means clustering algorithm implementation
  • Centroid initialization impact analysis
  • Principal Component Analysis (PCA) covariance computation
  • Projection plane determination
  • Class discrimination analysis
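
To illustrate how centroid initialization can change the outcome of K-means, here is a minimal plain-Python sketch of Lloyd's algorithm started from explicit centroids. The function names and the four-point dataset are hypothetical and chosen only to make the effect visible.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(points) for coord in zip(*points))


def k_means(points, centroids, max_iter=100):
    """Lloyd's algorithm from explicit initial centroids."""
    centroids = list(centroids)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda j: squared_distance(p, centroids[j]))
            clusters[nearest].append(p)
        # Update step: an empty cluster keeps its previous centroid.
        updated = [mean_point(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    # Sum of squared errors of the last assignment against the final centroids.
    sse = sum(squared_distance(p, centroids[j])
              for j, c in enumerate(clusters) for p in c)
    return centroids, clusters, sse


# Hypothetical toy data: corners of a wide rectangle.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

_, _, sse_good = k_means(points, [(0, 0), (4, 0)])  # splits left / right
_, _, sse_bad = k_means(points, [(2, 0), (2, 1)])   # stuck splitting bottom / top
```

Both runs converge, but the second initialization settles in a worse local optimum with a much larger SSE.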

Programming (11 points):

  • K-means with elbow method (SSE analysis)
  • Clustering for classification tasks
  • Confusion matrices and performance metrics
  • PCA for variance explanation
  • Linear Discriminant Analysis (LDA)
  • PCA vs LDA comparison for discriminant rules
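
The "PCA for variance explanation" item above can be sketched directly from the covariance matrix. This is an illustrative NumPy version with hypothetical names and made-up data; the notebook itself may rely on scikit-learn's implementation instead.

```python
import numpy as np


def pca(X, n_components):
    """PCA via eigendecomposition of the sample covariance matrix."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)  # divides by n - 1
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]       # reorder to descending variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained_ratio = eigvals / eigvals.sum()
    components = eigvecs[:, :n_components]
    projected = X_centered @ components
    return projected, components, explained_ratio


# Hypothetical toy data lying exactly on the line y = x.
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])

projected, components, explained_ratio = pca(X, n_components=1)
```

Because the toy points lie on a single line, the first component captures essentially all of the variance. The sign of each component is arbitrary.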

Dataset: Diabetes prediction dataset

Technologies & Libraries

  • Python 3.x
  • NumPy - Numerical computations
  • Pandas - Data manipulation
  • scikit-learn - Machine learning models and preprocessing
  • Matplotlib/Seaborn - Data visualization
  • SciPy - Statistical testing
  • Jupyter Notebook - Interactive development

Key Concepts Covered

Supervised Learning

  • Decision Trees
  • Bayesian Classifiers (Naïve Bayes)
  • k-Nearest Neighbors (k-NN)
  • Linear Regression
  • Ridge Regression
  • Multi-Layer Perceptrons (MLPs)

Unsupervised Learning

  • K-means Clustering
  • Principal Component Analysis (PCA)
  • Linear Discriminant Analysis (LDA), a supervised method listed here for its comparison with PCA

Model Evaluation

  • Cross-validation (k-fold, leave-one-out, stratified)
  • Confusion matrices
  • Performance metrics (Accuracy, Precision, Recall, F1-score, MAE)
  • Statistical significance testing
  • Overfitting/underfitting analysis

Advanced Topics

  • Regularization techniques
  • Backpropagation
  • Activation functions
  • Feature engineering
  • Hyperparameter tuning
  • Model interpretation

Getting Started

Prerequisites

pip install numpy pandas scikit-learn matplotlib seaborn scipy jupyter

Running the Notebooks

jupyter notebook homework_0X/G{xxx}_notebook.ipynb
