Repository Purpose

This repository is a comprehensive guide to Machine Learning, designed to bridge theoretical concepts with practical, hands-on implementations. It serves as a learning lab for anyone—from beginners to practitioners—looking to deepen their understanding of core ML foundations and algorithms.

Goals

- Demystify Machine Learning through structured explanations and illustrative examples
- Organize ML algorithms into key paradigms: Supervised, Unsupervised, Semi-Supervised, and Reinforcement Learning
- Enable experimentation with interactive Jupyter Notebooks for real-world learning
- Support understanding of mathematical concepts and simplify complex topics like optimization, statistics, and linear algebra

Repository Structure

machine-learning/
│
├── README.md                             # High-level introduction to Machine Learning
│
├── supervised/
│   ├── 00.concepts.md                       # Core concepts: labeled data, overfitting, etc.
│   ├── 01.linear_regression.md
│   ├── 02.logistic_regression.md
│   ├── 03.k_nearest_neighbors.md
│   ├── 04.naive_bayes.md
│   ├── 05.svm.md
│   ├── 06.decision_trees.md
│   ├── 07.random_forest.md
│   ├── 08.gradient_boosting.md
│   ├── 09.neural_networks.md
│   ├── algorithms/
│   └── notebooks/
│
├── unsupervised/
│   ├── 00.concepts.md                       # Key ideas: clustering, dimensionality reduction, etc.
│   ├── 01.k_means.md
│   ├── 02.dbscan.md
│   ├── 03.hierarchical_clustering.md
│   ├── 04.pca.md
│   ├── 05.tsne.md
│   ├── algorithms/
│   └── notebooks/
│
├── reinforcement_learning/
│   ├── 00.concepts.md                       # Basics of agents, environments, rewards, etc.
│   ├── 01.q_learning.md
│   ├── 02.sarsa.md
│   ├── 03.deep_q_network.md
│   ├── 04.policy_gradient.md
│   ├── algorithms/
│   └── notebooks/
│
├── semi_supervised_learning/
│   ├── 00.concepts.md                       # Hybrid between supervised and unsupervised
│   ├── 01.self_training.md
│   ├── 02.label_propagation.md
│   ├── algorithms/
│   └── notebooks/
│
└── shared_resources/
    ├── datasets/                         # Sample datasets used across topics
    ├── utils/                            # Reusable utility functions
    └── references.md                     # Useful academic references and links

What is Machine Learning?

Machine Learning (ML) is a subset of Artificial Intelligence that allows systems to learn from experience (data) and improve their performance on a task without being explicitly programmed with rules. Instead of following hardcoded instructions, the system identifies patterns in data and uses those patterns to make predictions or decisions.

Analogy:

Think of a baby learning to recognize animals. At first, the baby is shown pictures of cats and dogs. Over time, the baby begins to notice patterns — cats have pointy ears, dogs often have longer snouts. Eventually, the baby can identify a new picture as a "dog" or "cat" based on what they’ve seen before — even without being told the rules. Machine Learning works in a similar way: it learns from examples instead of being told exactly what to do.

Example:

A machine learning model learns to recommend movies based on a user's viewing history and preferences — just like how a friend might suggest a movie based on what you’ve enjoyed before.

Types of Machine Learning

Supervised Learning

Supervised learning is generally regarded as the most widely used type of ML in real-world applications.

What it is: You train a model on labeled data (i.e., the input and expected output are both known).
Use Cases:
- Email spam detection
- Credit scoring
- Medical diagnosis
- House price prediction

✅ Popular Algorithms

Linear Regression

Concept: Predicts a continuous value (e.g., student test score) based on one or more input features.
Essential Math:

$y = w_1x_1 + w_2x_2 + \cdots + w_nx_n + b$
The weights $w_1, \dots, w_n$ and the bias $b$ are chosen to minimize the Mean Squared Error (MSE) between predicted and actual values.
Use Case: Predicting prices, trends, or scores.
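
A minimal sketch of the idea above, assuming a single input feature so that the least-squares solution has a simple closed form. The function names and the hours/scores data are hypothetical, chosen only for illustration.

```python
def fit_line(xs, ys):
    """Fit y = w*x + b by ordinary least squares (the MSE minimizer) for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed form: w = cov(x, y) / var(x), b = mean_y - w * mean_x
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


def mse(xs, ys, w, b):
    """Mean Squared Error between the predictions w*x + b and the targets."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: hours studied vs. test score, lying exactly on y = 5x + 50.
hours = [1.0, 2.0, 3.0, 4.0]
scores = [55.0, 60.0, 65.0, 70.0]

w, b = fit_line(hours, scores)
print(w, b, mse(hours, scores, w, b))
```

With several features the same objective is usually solved with matrix methods or gradient descent instead of this one-dimensional formula.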

Logistic Regression

Concept: Used for binary classification (e.g., pass/fail, spam/ham).
Essential Math:

$P(y = 1 \mid x) = \sigma(w_1x_1 + w_2x_2 + \cdots + w_nx_n + b)$

Where the sigmoid function is:

$\sigma(z) = \frac{1}{1 + e^{-z}}$
Use Case: Disease prediction, marketing response, fraud detection.
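
The two formulas above can be sketched as follows. This is an illustrative example only: the weights, bias, and feature values are made up rather than learned, and the 0.5 decision threshold is a common default, not something the formulas require.

```python
import math


def sigmoid(z):
    """Sigmoid 1 / (1 + e^(-z)), written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, features):
    """P(y = 1 | x) = sigmoid(w . x + b)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


# Hypothetical, hand-picked parameters (a real model would learn these from data).
weights = [0.8, -0.5]
bias = -0.1

p = predict_proba(weights, bias, [2.0, 1.0])  # z = 1.6 - 0.5 - 0.1 = 1.0
label = 1 if p >= 0.5 else 0                  # assumed threshold of 0.5
print(round(p, 4), label)
```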

Decision Trees

Concept: A flowchart-like structure where each internal node splits the data based on a feature.
Essential Math:
- Gini Impurity:
  
  $G = 1 - \sum_{i=1}^{C} p_i^2$
- Entropy (for Information Gain):
  
  $H = - \sum_{i=1}^{C} p_i \log_2(p_i)$
Use Case: Customer segmentation, credit risk modeling.
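
As a small illustration, both impurity measures can be computed directly from the class labels that reach a node. The helper names and the example labels below are hypothetical.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Fraction p_i of samples in each class present at the node."""
    counts = Counter(labels)
    total = len(labels)
    return [count / total for count in counts.values()]


def gini(labels):
    """Gini impurity: G = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels):
    """Entropy in bits: H = -sum(p_i * log2(p_i))."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))


mixed_node = ["spam", "spam", "ham", "ham"]   # evenly mixed: maximally impure for 2 classes
pure_node = ["spam", "spam", "spam", "spam"]  # a single class: no impurity

print(gini(mixed_node), entropy(mixed_node))
print(gini(pure_node), entropy(pure_node))
```

A tree-building algorithm would evaluate candidate splits and prefer the one whose child nodes have the lowest weighted impurity.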

Random Forest

Concept: An ensemble of decision trees trained on random subsets of data and features.
Essential Math:
- For Regression (average over the $T$ trees):
  
  $\hat{y} = \frac{1}{T} \sum_{t=1}^{T} y_t$
- For Classification:
  
  $\hat{y} = \text{majority vote of } (y_1, y_2, \ldots, y_T)$
Use Case: Robust classification and regression tasks, e.g., loan approval, stock prediction.
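
The aggregation step can be sketched on its own, assuming the individual trees have already produced their predictions. The prediction lists below are invented for illustration, and training the trees themselves is out of scope here.

```python
from collections import Counter


def aggregate_regression(tree_predictions):
    """Average the numeric predictions of all T trees."""
    return sum(tree_predictions) / len(tree_predictions)


def aggregate_classification(tree_predictions):
    """Return the class predicted by the most trees.

    On a tie, Counter.most_common keeps the class that was seen first.
    """
    return Counter(tree_predictions).most_common(1)[0][0]


# Hypothetical outputs of three trees for a single input.
price_estimates = [2.0, 4.0, 6.0]
loan_votes = ["approve", "reject", "approve"]

print(aggregate_regression(price_estimates))
print(aggregate_classification(loan_votes))
```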

Support Vector Machines (SVM)

Concept: Finds the hyperplane that separates the classes with the largest possible margin.
Essential Math:
- Decision boundary:
  $w \cdot x + b = 0$
- Optimization constraint (hard margin, with labels $y_i \in \{-1, +1\}$):
  $y_i(w \cdot x_i + b) \geq 1$
- Margin to maximize:
  $\frac{2}{\lVert w \rVert}$
The kernel trick (e.g., an RBF kernel) lets an SVM learn non-linear decision boundaries.
Use Case: Text classification, face recognition, bioinformatics.
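
The sketch below only checks the quantities listed above for a given hyperplane; it does not solve the optimization problem that finds $w$ and $b$. The hyperplane and the four points are hypothetical, and labels are assumed to be encoded as -1 and +1.

```python
import math


def decision_value(w, b, x):
    """Signed score w . x + b; it is zero exactly on the decision boundary."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def satisfies_margin(w, b, points, labels):
    """Check the hard-margin constraint y_i * (w . x_i + b) >= 1 for every sample."""
    return all(y * decision_value(w, b, x) >= 1 for x, y in zip(points, labels))


def margin_width(w):
    """Width of the margin, 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical separating hyperplane x_1 = 0 and four labeled points.
w, b = [1.0, 0.0], 0.0
points = [[1.0, 2.0], [2.0, -1.0], [-1.0, 0.5], [-3.0, 1.0]]
labels = [1, 1, -1, -1]

print(satisfies_margin(w, b, points, labels), margin_width(w))
```

Scaling $w$ up shrinks $2 / \lVert w \rVert$, which is why training looks for the smallest $\lVert w \rVert$ that still satisfies every constraint.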

k-Nearest Neighbors (kNN)

Concept: Classifies a sample based on the majority vote (classification) or average (regression) of its k closest neighbors.
Essential Math:
- Euclidean Distance:
  $d(x, x') = \sqrt{ \sum_{i=1}^{n} (x_i - x'_i)^2 }$
Other distance metrics can be used, such as Manhattan, Cosine, or Minkowski, depending on the data.
Use Case: Recommender systems, image classification, anomaly detection.
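
A minimal kNN classifier along these lines, assuming Euclidean distance and a plain majority vote. The training points, labels, and default of k = 3 are illustrative choices only.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_classify(train_points, train_labels, query, k=3):
    """Label the query by majority vote among its k nearest training points."""
    by_distance = sorted(
        zip(train_points, train_labels),
        key=lambda pair: euclidean(pair[0], query),
    )
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical two-cluster training set.
train_points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_classify(train_points, train_labels, (1.5, 1.5)))
print(knn_classify(train_points, train_labels, (6.5, 6.5)))
```

For regression, the vote would be replaced by the average of the neighbors' target values.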

Unsupervised Learning

What it is: The model tries to find patterns and groupings in the data without labeled outputs.
Use Cases:
- Customer segmentation
- Market basket analysis
- Anomaly detection
Popular Algorithms:
- k-Means Clustering
- DBSCAN
- PCA (Principal Component Analysis)
Python Libraries: scikit-learn, scipy, matplotlib
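
To make the idea of finding groups without labels concrete, here is a small pure-Python sketch of k-means (Lloyd's algorithm). The points and the starting centroids are hypothetical; in practice the starting centroids are usually chosen randomly or with a seeding scheme, and a library implementation such as the one in scikit-learn would normally be used.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to centroids and moving the centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centroids.append(old)
        if new_centroids == centroids:
            break  # centroids stopped moving, so assignments cannot change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical 2-D data with two visible groups, and hand-picked starting centroids.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.5, 9.5)]
centroids, clusters = kmeans(points, [(1.0, 1.0), (9.0, 9.0)])
print(centroids)
print(clusters)
```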

Reinforcement Learning

What it is: An agent learns to make decisions by interacting with an environment and getting feedback (rewards or penalties).
Use Cases:
- Robotics
- Game playing (e.g., AlphaGo)
- Self-driving cars
Popular Libraries: OpenAI Gym, Stable-Baselines, TensorFlow, PyTorch
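
The agent, environment, and reward loop can be sketched with tabular Q-learning on a toy problem. Everything here is an assumption made for illustration: the five-state corridor, the reward of 1 at the right end, the hyperparameters, and the uniformly random behavior during training (Q-learning is off-policy, so it can still learn the greedy action values this way).

```python
import random

N_STATES = 5        # states 0..4; state 4 is the goal and ends the episode
ACTIONS = (0, 1)    # 0 = move left, 1 = move right


def step(state, action):
    """Toy environment: move along the corridor, reward 1.0 on reaching the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    """Learn a Q-table from interaction, exploring with uniformly random actions."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best value of the next state.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy policy for the non-terminal states: pick the action with the higher Q-value.
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_policy)
```

Libraries such as those listed above provide standard environments and tested agents, so a hand-written loop like this is mainly useful for seeing the update rule in isolation.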

Semi-Supervised Learning

What it is: Combines a small amount of labeled data with a large amount of unlabeled data to improve learning when labeling is expensive.
Use Cases:
- Web page classification
- Medical imaging
- Speech recognition
- Fraud detection
Popular Algorithms:
- Self-training
- Label propagation
- Semi-supervised Support Vector Machines (S3VM)
- Graph-based methods
Python Libraries: scikit-learn (the sklearn.semi_supervised module), TensorFlow, PyTorch
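
Self-training, the first algorithm listed above, can be sketched in a few lines. This is a simplified illustration, not a library implementation: it assumes one-dimensional data, at least two classes, a nearest-centroid base classifier, and a hypothetical confidence rule (`min_gap`) for accepting pseudo-labels.

```python
def class_centroids(xs, ys):
    """Mean feature value per class, used here as a very simple base classifier."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def self_train(labeled_x, labeled_y, unlabeled_x, min_gap=1.0, max_rounds=10):
    """Repeatedly pseudo-label the unlabeled points the current model is confident about."""
    xs, ys, pool = list(labeled_x), list(labeled_y), list(unlabeled_x)
    for _ in range(max_rounds):
        cents = class_centroids(xs, ys)
        confident, remaining = [], []
        for x in pool:
            ranked = sorted(cents, key=lambda label: abs(x - cents[label]))
            # Confidence: how much closer the nearest centroid is than the runner-up.
            gap = abs(x - cents[ranked[1]]) - abs(x - cents[ranked[0]])
            if gap >= min_gap:
                confident.append((x, ranked[0]))
            else:
                remaining.append(x)
        if not confident:
            break  # nothing new to learn from
        for x, label in confident:
            xs.append(x)
            ys.append(label)
        pool = remaining
    return xs, ys, pool


# Hypothetical data: two labeled points and five unlabeled ones.
xs, ys, still_unlabeled = self_train([1.0, 9.0], ["low", "high"], [2.0, 3.0, 8.0, 7.0, 5.0])
print(ys.count("low"), ys.count("high"), still_unlabeled)
```

The ambiguous point halfway between the two classes is left unlabeled, which is the intended behavior of a confidence threshold.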

Machine Learning Techniques

Classification
A supervised learning task where the model learns to categorize data into predefined classes or labels.
Example: Predicting if an email is spam or not spam.
Regression
A supervised learning task where the goal is to predict a continuous value.
Example: Predicting the price of a house based on size, location, etc.
Clustering
An unsupervised learning method where the algorithm groups data into clusters based on similarity—without predefined labels.
Example: Segmenting customers into groups based on their behavior or purchases.
Anomaly Detection
Identifying data points that are unusual or deviate significantly from the majority.
Example: Detecting fraudulent credit card transactions.
Sequence Mining
Analyzing and identifying patterns in ordered data (sequences), especially over time.
Example: Finding common sequences in customer purchases or website navigation.
Dimensionality Reduction
Reducing the number of features (dimensions) in a dataset while keeping important information—used to simplify models and visualize high-dimensional data.
Example: Using PCA (Principal Component Analysis) to reduce image data with thousands of pixels into just a few features.
Recommendation System
A system that suggests items (movies, products, etc.) to users based on their preferences or behaviors.
Example: Netflix recommending movies or shows based on your watch history.
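
As one concrete instance of the techniques above, anomaly detection can be sketched with a z-score rule: flag values that lie far from the mean, measured in standard deviations. The transaction amounts and the threshold of 2.0 are hypothetical, and a z-score rule is only one of many possible approaches.

```python
import statistics


def zscore_outliers(values, threshold=2.0):
    """Return the values whose distance from the mean exceeds `threshold` standard deviations."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]


# Hypothetical transaction amounts with one unusually large payment.
amounts = [20, 22, 19, 21, 23, 20, 500]
print(zscore_outliers(amounts))
```

Because a large outlier also inflates the mean and the standard deviation, this rule works best when anomalies are rare; more robust statistics, such as the median, are a common alternative.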

Machine Learning Model Lifecycle

Problem Definition
Clearly define the objective of the machine learning task.
Example: Predict customer churn or classify product reviews as positive or negative.
Data Collection
Gather relevant and sufficient raw data from various sources like databases, APIs, sensors, or manual input.
Example: Collecting user behavior logs or survey results.
Data Preparation
Clean, transform, and structure the data for training. This includes handling missing values, encoding categories, and normalizing values.
Example: Converting text into numeric form or removing outliers.
Model Development and Evaluation
Choose a model type, train it using prepared data, and evaluate its accuracy, precision, recall, or other relevant metrics.
Example: Training a decision tree and evaluating it using cross-validation.
Model Deployment
Integrate the trained model into a production environment where it can receive real input and make predictions.
Example: Deploying a fraud detection model via an API to monitor real-time transactions.
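
The evaluation metrics named in the Model Development and Evaluation step can be computed from a list of true labels and a list of predictions. The labels below are invented for illustration, and class 1 is assumed to be the positive class.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, and recall for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        # Guard against division by zero when a class is never predicted or never present.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical held-out labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Which metric matters most depends on the problem definition: a fraud detector, for example, may accept lower precision in exchange for higher recall.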
