Introduction to ML basics: set of 3 projects (evolutionary algorithm, clustering and basic neural network design)

TL;DR

I completed 6 hands-on implementations across 3 projects during an Introduction to Machine Learning course.

  1. An evolutionary algorithm that evolved programs to collect treasures on a grid in as few steps as possible.
  2. Clustering
    • k-means clustering using centroids to group 40,000+ spatially biased points into tight clusters
    • k-means clustering using medoids (more robust to outliers),
    • divisive hierarchical clustering, also using centroids.
  3. Neural networks
    • Built a multilayer perceptron in PyTorch to classify handwritten digits with over 97% accuracy, testing different optimizers (SGD, momentum, Adam).
    • Implemented backpropagation from scratch: fully functional backprop in NumPy, used to train a network to solve XOR with modular layers and manual gradient updates.

📝 Project description

🧬 Project 1: Evolutionary Algorithm - Treasure Hunt

This project was the first part of a three-part school course introducing machine learning principles. It focused on evolutionary algorithms applied to a simple, gamified problem: finding the best set of instructions for a player to collect all treasures on a 7x7 grid map using as few steps as possible. The map contained a starting point, five hidden treasures, and empty spaces. The player could move in four directions (up, down, left, right), and every time they landed on a treasure, it was added to their score. The challenge was to evolve a set of "programs" (sequences of moves) that would maximize collected treasures while minimizing steps.

The solution used a classic genetic algorithm, including: random population initialization, fitness evaluation based on number of treasures and steps, selection of top individuals, tournament-style crossover, and basic mutation.
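As a rough illustration, a minimal sketch of that genetic-algorithm loop might look like the following. The map layout, the 'U'/'D'/'L'/'R' move labels, the fitness weighting, and all hyperparameters below are assumptions made for the example, not the repository's actual values.

import random

# Hypothetical 7x7 map: treasure positions chosen purely for illustration
GRID = 7
TREASURES = {(1, 4), (2, 2), (3, 6), (4, 1), (6, 3)}
START = (0, 0)
MOVES = {'U': (-1, 0), 'D': (1, 0), 'L': (0, -1), 'R': (0, 1)}

def fitness(program):
    """Reward collected treasures, lightly penalise the number of steps taken."""
    r, c = START
    found, steps = set(), 0
    for move in program:
        dr, dc = MOVES[move]
        nr, nc = r + dr, c + dc
        if not (0 <= nr < GRID and 0 <= nc < GRID):
            continue  # ignore moves that would leave the map
        r, c, steps = nr, nc, steps + 1
        if (r, c) in TREASURES:
            found.add((r, c))
        if len(found) == len(TREASURES):
            break
    return len(found) + 1.0 / (steps + 1), len(found), steps

def evolve(pop_size=100, prog_len=30, generations=500, mutation_rate=0.05):
    # random population initialization
    population = [[random.choice('UDLR') for _ in range(prog_len)] for _ in range(pop_size)]
    # (the actual project stopped once fitness plateaued; a fixed count keeps the sketch short)
    for _ in range(generations):
        scored = sorted(population, key=lambda p: fitness(p)[0], reverse=True)
        parents = scored[:pop_size // 2]              # selection of top individuals
        children = []
        while len(children) < pop_size:
            a, b = random.sample(parents, 2)          # pairing of selected parents
            cut = random.randrange(prog_len)
            child = a[:cut] + b[cut:]                 # single-point crossover
            child = [random.choice('UDLR') if random.random() < mutation_rate else g
                     for g in child]                  # basic mutation
            children.append(child)
        population = children
    return max(population, key=lambda p: fitness(p)[0])

best = evolve()
print(fitness(best))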

Once no improvement was observed after a fixed number of generations, the algorithm stopped and printed the best-performing solution:

===========================
The best program ended with fitness value of 5.064
It took 18 steps
And found 5 treasures
The map:
['o', 'o', 'o', 'o', 'o', 'o', 'o']
['o', 'o', 'o', 'o', 'x', 'o', 'o']
['o', 'o', 'x', 'x', 'x', 'x', 'x']
['o', 'o', 'x', 'o', 'o', 'o', 'x']
['o', 'x', 'x', 'o', 'o', 'o', 'o']
['o', 'x', 'x', 'x', 'x', 'o', 'o']
['o', 'o', 'x', 'x', 'o', 'o', 'o']
The moves:
['L', 'H', 'P', 'P', 'L', 'L', 'L', 'H', 'P', 'H', 'H', 'P', 'P', 'H', 'D', 'P', 'P', 'D', 'D', 'L', 'D', 'P']
===========================

📊 Project 2: Clustering

This was the second part of our school course on machine learning, focusing on unsupervised learning - specifically, clustering algorithms in a simulated 2D space. We started by generating a 2D plane of size [-5000, 5000] in both axes, populated with 20 initial seed points, placed at random but unique coordinates, and 40,000 additional points that were generated with a bias. The core task was to analyze this large 2D space and implement multiple clustering algorithms to detect the hidden groupings:

  • k-means clustering using centroids,
  • k-means clustering using medoids (more robust to outliers),
  • divisive hierarchical clustering, also using centroids.

The main challenge was balancing algorithm efficiency with clustering quality, especially on a dataset this large. Each method was evaluated based on the average intra-cluster distance, and only clusters with average distance under 500 were considered successful. I implemented all three clustering methods from scratch and added visualizations to plot the clustered data points, color-coded and optionally labeled. This helped verify not just correctness, but also provided intuition for how each algorithm behaves.
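For intuition, a minimal sketch of the centroid-based k-means variant and the intra-cluster distance check could look roughly like this; the biased point generation (offset range), k, and iteration count are illustrative assumptions rather than the project's actual parameters.

import numpy as np

rng = np.random.default_rng(0)

# Illustrative data generation: 20 random seed points on a [-5000, 5000] plane,
# plus 40,000 points biased to lie near already existing points
points = list(rng.uniform(-5000, 5000, size=(20, 2)))
for _ in range(40_000):
    base = points[rng.integers(len(points))]
    points.append(base + rng.uniform(-100, 100, size=2))
X = np.array(points)

def k_means(X, k=20, iterations=50):
    """Plain centroid-based k-means (Lloyd's algorithm)."""
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iterations):
        # assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # move each centroid to the mean of its assigned points
        for i in range(k):
            members = X[labels == i]
            if len(members):
                centroids[i] = members.mean(axis=0)
    return centroids, labels

def avg_intra_cluster_distances(X, centroids, labels, k):
    """Evaluation criterion: average distance to the cluster centre, per cluster."""
    return [np.linalg.norm(X[labels == i] - centroids[i], axis=1).mean()
            for i in range(k) if np.any(labels == i)]

centroids, labels = k_means(X)
print(all(d < 500 for d in avg_intra_cluster_distances(X, centroids, labels, 20)))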

(Visualizations of the clustering results for each of the three methods)

🧠 Project 3: Neural networks

MNIST digit recognition

The final stage of the course brought everything together with a hands-on classification task: recognizing handwritten digits from the MNIST dataset using a feedforward neural network (MLP). The network was trained on 60,000 grayscale 28×28 pixel images and evaluated on 10,000 test samples. The architecture was built in PyTorch, and I explored multiple optimization strategies: SGD (Stochastic Gradient Descent), SGD with momentum, and the Adam optimizer. The project emphasized:

  • dataset preprocessing and normalization,
  • tuning hyperparameters like learning rate, batch size, and hidden layer sizes,
  • visualizing training progress and confusion matrices,
  • evaluating generalization performance across optimizers.

This was my first real classification task using a deep learning framework, and it helped me understand both the practical training process and model evaluation in a controlled setting.
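A condensed sketch of how such a PyTorch MLP pipeline can be wired up is shown below; the hidden-layer sizes, batch size, learning rate, and epoch count are illustrative assumptions, not the project's exact settings.

import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Normalisation constants are the commonly used MNIST mean/std
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.1307,), (0.3081,))])
train_set = datasets.MNIST("data", train=True, download=True, transform=transform)
test_set = datasets.MNIST("data", train=False, download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
test_loader = DataLoader(test_set, batch_size=1000)

model = nn.Sequential(                     # simple MLP: 784 -> 128 -> 64 -> 10
    nn.Flatten(),
    nn.Linear(28 * 28, 128), nn.ReLU(),
    nn.Linear(128, 64), nn.ReLU(),
    nn.Linear(64, 10),
)
loss_fn = nn.CrossEntropyLoss()
# swap in torch.optim.SGD(model.parameters(), lr=0.01) or SGD with momentum=0.9
# to compare the optimizers mentioned above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(5):
    model.train()
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()

model.eval()
correct = 0
with torch.no_grad():
    for images, labels in test_loader:
        correct += (model(images).argmax(dim=1) == labels).sum().item()
print(f"test accuracy: {correct / len(test_set):.4f}")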

XOR problem

As a side challenge to reinforce theoretical understanding, we were asked to manually implement backpropagation using only NumPy, with no automatic differentiation libraries allowed. I created a modular architecture with components such as a linear (fully connected) layer, activation functions (ReLU, sigmoid, tanh), and an MSE loss. Each module supported both forward() and backward() passes, and parameter updates were performed either with vanilla SGD or with momentum. This project helped demystify how gradients actually flow through a network, how layers interact, and how parameter updates gradually reduce error, all without relying on high-level libraries. It gave me a deeper appreciation of what libraries like PyTorch do under the hood.

================================================
Epoch 100, loss: 0.16732414845108892 # not very confident
0 0 | [0.12886275] → ([0.])
0 1 | [0.55949593] → ([1.])
1 0 | [0.65761626] → ([1.])
1 1 | [0.58431201] → ([1.])
================================================
Epoch 500, loss: 0.0017811014684536426 # absolutely confident
0 0 | [0.00359476] → ([0.])
0 1 | [0.94166895] → ([1.])
1 0 | [0.93956583] → ([1.])
1 1 | [0.00752886] → ([0.])
================================================
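To illustrate the modular forward/backward pattern, a minimal NumPy sketch along these lines could look as follows; the hidden size, weight initialization, learning rate, and epoch count are assumptions made for the example.

import numpy as np

rng = np.random.default_rng(0)

class Linear:
    """Fully connected layer with manual forward/backward and plain SGD update."""
    def __init__(self, n_in, n_out):
        self.W = rng.normal(0, 1, (n_in, n_out))
        self.b = np.zeros(n_out)
    def forward(self, x):
        self.x = x
        return x @ self.W + self.b
    def backward(self, grad, lr):
        dW = self.x.T @ grad            # gradient w.r.t. weights
        db = grad.sum(axis=0)           # gradient w.r.t. bias
        dx = grad @ self.W.T            # gradient passed to the previous layer
        self.W -= lr * dW
        self.b -= lr * db
        return dx

class Sigmoid:
    def forward(self, x):
        self.out = 1 / (1 + np.exp(-x))
        return self.out
    def backward(self, grad, lr):
        return grad * self.out * (1 - self.out)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)
layers = [Linear(2, 4), Sigmoid(), Linear(4, 1), Sigmoid()]

for epoch in range(5000):
    out = X
    for layer in layers:                # forward pass through the whole stack
        out = layer.forward(out)
    loss = ((out - y) ** 2).mean()      # MSE loss
    grad = 2 * (out - y) / len(y)       # dLoss/dOutput
    for layer in reversed(layers):      # backward pass, updating as we go
        grad = layer.backward(grad, lr=0.5)

for inputs, pred in zip(X, out):
    print(inputs, pred)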

🌱 Skills gained & problems overcome

  • Core Machine Learning Concepts
  • Manual Implementation of ML Algorithms
  • Optimization Techniques
  • Neural Network Fundamentals
  • Data Preprocessing and Representation
  • Visualization and Evaluation

⚙️ How to run

1. Genetic algorithm

cd evolutionary-algorithm/
python main.py

2. Clustering

  1. Centroid k-means clustering
cd centroid-k-means/
python main.py
  2. Divisive clustering
cd divisive-clustering/
python main.py
  3. Medoid k-means clustering
cd medoid-k-means/
python main.py

3a. MNIST

cd NN-basics/number-recognition/
python main.py

3b. XOR problem

cd NN-basics/xor-problem/
python main.py
