# Machine Learning Homeworks

A collection of machine learning assignments covering fundamental and advanced ML concepts, implemented as part of the Aprendizagem (Machine Learning) course at IST.
This repository contains solutions to four comprehensive homework assignments that progressively build expertise in machine learning theory and practice. Each assignment combines theoretical pen-and-paper exercises with practical Python implementations.
## Repository Structure

```
machine_learning_hws/
├── homework_01/
│   ├── G{xxx}_report.pdf
│   └── G{xxx}_notebook.ipynb
├── homework_02/
│   ├── G{xxx}_report.pdf
│   └── G{xxx}_notebook.ipynb
├── homework_03/
│   ├── G{xxx}_report.pdf
│   └── G{xxx}_notebook.ipynb
├── homework_04/
│   ├── G{xxx}_report.pdf
│   └── G{xxx}_notebook.ipynb
└── README.md
```
## Homework 1

**Deadline:** October 1, 2025

**Pen-and-Paper (11 points):**
- Building decision trees using information gain (Shannon entropy)
- Training confusion matrices and F1 scores
- Class-conditional histograms and discriminant rules
- Outlier detection
**Programming (9 points):**
- Decision tree implementation with various hyperparameters
- Model generalization analysis
- Hyperparameter tuning for healthcare applications
- Feature importance and conditional associations
**Dataset:** Hungarian Heart Disease dataset (284 patients, 9 biological features)
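As a hedged sketch of the kind of setup used in this programming part (using a synthetic stand-in generated with scikit-learn, since the actual heart-disease data is not bundled in this repository):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score

# Synthetic stand-in for the 284-patient, 9-feature heart-disease data
X, y = make_classification(n_samples=284, n_features=9, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# criterion="entropy" makes splits maximize information gain (Shannon entropy);
# max_depth is one of the hyperparameters varied in the assignment
tree = DecisionTreeClassifier(criterion="entropy", max_depth=4, random_state=0)
tree.fit(X_tr, y_tr)
print(f1_score(y_te, tree.predict(X_te)))
```

Comparing training vs. test scores of such trees at different depths is one way to study the generalization behavior the assignment asks about.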
## Homework 2

**Deadline:** October 8, 2025

**Pen-and-Paper (13 points):**
- Bayesian classifier with the maximum a posteriori (MAP) decision rule
- Independent variable assumptions and normal distributions
- k-NN with Hamming distance and leave-one-out evaluation
- Theoretical bounds on 1-NN classifier error rates
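For reference, the classical bound in question is the Cover–Hart result (stated here as a reminder; the exercise's exact formulation may differ): asymptotically, for $c$ classes with Bayes error $E^{*}$, the 1-NN error rate satisfies

$$
E^{*} \;\le\; E_{1\text{-NN}} \;\le\; E^{*}\left(2 - \frac{c}{c-1}\,E^{*}\right) \;\le\; 2E^{*},
$$

i.e. the 1-NN classifier is asymptotically at most twice as bad as the optimal (Bayes) classifier.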
**Programming (7 points):**
- k-NN vs Naïve Bayes comparison with 5-fold cross-validation
- Impact of data preprocessing (Min-Max scaling)
- Statistical significance testing
- Hyperparameter optimization (number of neighbors, weighting schemes)
- Model deployment considerations for clinical settings
**Dataset:** Breast Cancer dataset
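A minimal sketch of the k-NN vs. Naïve Bayes comparison with Min-Max scaling and 5-fold cross-validation (using scikit-learn's bundled breast-cancer data as a stand-in for the assignment's dataset):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

# Stand-in dataset; the assignment uses its own breast-cancer data
X, y = load_breast_cancer(return_X_y=True)

# Pipeline ensures Min-Max scaling is fit only on each training fold
knn = make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=5))
nb = GaussianNB()

knn_scores = cross_val_score(knn, X, y, cv=5)  # 5-fold cross-validation
nb_scores = cross_val_score(nb, X, y, cv=5)
print(knn_scores.mean(), nb_scores.mean())
```

Dropping the `MinMaxScaler` from the pipeline is a quick way to observe the preprocessing impact the assignment investigates; the per-fold scores also feed directly into significance testing.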
## Homework 3

**Deadline:** October 15, 2025

**Pen-and-Paper (12 points):**
- Ordinary Least Squares (OLS) regression with polynomial basis functions
- Ridge regression with regularization (λ = 1)
- Training and test MAE comparison
- Backpropagation through Multi-Layer Perceptron (MLP)
- Stochastic gradient descent updates
- Activation function analysis (sigmoid vs no activation)
**Programming (8 points):**
- Linear regression baseline
- MLP regressors with varying architectures
- Impact of activation functions (ReLU) on model performance
- Overfitting vs underfitting analysis
- 5-fold cross-validation
**Dataset:** Rent prediction dataset
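A hedged sketch of the regression comparison (linear baseline, ridge with λ = 1, and an MLP with ReLU activations), scored by MAE under 5-fold cross-validation on synthetic data standing in for the rent dataset:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for the rent-prediction data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = X @ rng.normal(size=5) + rng.normal(scale=0.1, size=200)

models = (
    LinearRegression(),                      # baseline
    Ridge(alpha=1.0),                        # λ = 1 as in the pen-and-paper part
    MLPRegressor(hidden_layer_sizes=(10,), activation="relu",
                 max_iter=2000, random_state=0),
)
for model in models:
    mae = -cross_val_score(model, X, y, cv=5,
                           scoring="neg_mean_absolute_error").mean()
    print(type(model).__name__, round(mae, 3))
```

Varying `hidden_layer_sizes` and switching `activation` between `"relu"` and `"identity"` mirrors the architecture and activation-function questions in the assignment.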
## Homework 4

**Deadline:** October 23, 2025

**Pen-and-Paper (9 points):**
- K-means clustering algorithm implementation
- Centroid initialization impact analysis
- Principal Component Analysis (PCA) covariance computation
- Projection plane determination
- Class discrimination analysis
**Programming (11 points):**
- K-means with elbow method (SSE analysis)
- Clustering for classification tasks
- Confusion matrices and performance metrics
- PCA for variance explanation
- Linear Discriminant Analysis (LDA)
- PCA vs LDA comparison for discriminant rules
**Dataset:** Diabetes prediction dataset
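A minimal sketch of the unsupervised part (K-means SSE for the elbow method, plus PCA explained variance), again on synthetic data standing in for the diabetes features:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Synthetic stand-in for the diabetes features
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))

# Elbow method: SSE (inertia_) for increasing numbers of clusters
sse = {k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
       for k in range(1, 7)}
print(sse)

# PCA: fraction of variance explained by the first two components
pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_.sum())
```

Plotting the SSE values against `k` and looking for the "elbow" is the standard way to pick a cluster count; the explained-variance ratio plays the analogous role when choosing how many principal components to keep.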
## Technologies

- Python 3.x
- NumPy - Numerical computations
- Pandas - Data manipulation
- scikit-learn - Machine learning models and preprocessing
- Matplotlib/Seaborn - Data visualization
- SciPy - Statistical testing
- Jupyter Notebook - Interactive development
## Concepts Covered

- Decision Trees
- Bayesian Classifiers (Naïve Bayes)
- k-Nearest Neighbors (k-NN)
- Linear Regression
- Ridge Regression
- Multi-Layer Perceptrons (MLPs)
- K-means Clustering
- Principal Component Analysis (PCA)
- Linear Discriminant Analysis (LDA)
- Cross-validation (k-fold, leave-one-out, stratified)
- Confusion matrices
- Performance metrics (Accuracy, Precision, Recall, F1-score, MAE)
- Statistical significance testing
- Overfitting/underfitting analysis
- Regularization techniques
- Backpropagation
- Activation functions
- Feature engineering
- Hyperparameter tuning
- Model interpretation
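As one illustration of the statistical significance testing listed above, a paired t-test over matched fold scores can compare two classifiers (a hedged sketch; the assignments may prescribe a different test or dataset):

```python
from scipy.stats import ttest_rel
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)  # stand-in dataset for illustration

# Same folds for both models, so the per-fold scores are paired
a = cross_val_score(KNeighborsClassifier(), X, y, cv=5)
b = cross_val_score(GaussianNB(), X, y, cv=5)

t, p = ttest_rel(a, b)  # paired t-test over the matched fold scores
print(round(p, 3))
```

A small p-value suggests the accuracy difference is unlikely to be due to fold-to-fold variation alone; with only 5 folds, though, the test has limited power.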
## Getting Started

Install the dependencies, then open any homework notebook:

```bash
pip install numpy pandas scikit-learn matplotlib seaborn scipy jupyter
jupyter notebook homework_0X/G{xxx}_notebook.ipynb
```