AyanQuadri/ML-Lab

Machine Learning Classification and Clustering Project

A comprehensive collection of machine learning algorithms implemented using scikit-learn, covering both supervised and unsupervised learning techniques.

Overview

This repository demonstrates machine learning algorithms for classification, regression, and clustering tasks. Each script is a self-contained example that includes error handling and prints a detailed analysis of its results.

Project Structure

ML-Lab/
├── supervised_learning/
│   ├── classification/
│   │   ├── svm_comparison.py          # SVM vs Random Forest comparison
│   │   ├── logistic_regression.py     # Logistic regression on Iris dataset
│   │   └── classification_metrics.py  # Comprehensive metrics evaluation
│   └── regression/
│       └── linear_regression.py       # Linear regression for classification
├── unsupervised_learning/
│   └── clustering/
│       ├── k_means.py                 # K-means clustering analysis
│       └── agglomerativ.py            # Agglomerative clustering methods
├── pyproject.toml                     # Project dependencies
├── README.md                          # This file
└── uv.lock                            # Dependency lock file

Installation

This project uses uv for dependency management. Follow these steps to set up:

# Clone the repository
git clone https://github.com/AyanQuadri/ML-Lab.git
cd ML-Lab

# Create virtual environment
uv venv

# Activate virtual environment (optional, uv run handles this)
source .venv/bin/activate  # Linux/Mac
# or
.venv\Scripts\activate     # Windows

# Install dependencies
uv sync

Algorithms Implemented

Supervised Learning

Classification Algorithms

SVM vs Random Forest (svm_comparison.py)

  • Purpose: Compare Support Vector Machine and Random Forest performance
  • Dataset: Iris dataset
  • Features:
    • Linear SVM implementation
    • Random Forest with 100 estimators
    • Accuracy comparison
  • Run: uv run python supervised_learning/classification/svm_comparison.py

Logistic Regression (logistic_regression.py)

  • Purpose: Multi-class classification using logistic regression
  • Dataset: Iris dataset (3 classes)
  • Features:
    • Complete classification report
    • Dataset information display
    • Convergence optimization
  • Run: uv run python supervised_learning/classification/logistic_regression.py

Classification Metrics (classification_metrics.py)

  • Purpose: Comprehensive evaluation of classification algorithms
  • Algorithms: Logistic Regression, Decision Tree
  • Metrics Calculated:
    • Accuracy, Precision, Recall, F1-Score
    • Confusion Matrix
    • Per-class TP, FP, TN, FN values
  • Run: uv run python supervised_learning/classification/classification_metrics.py
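
The per-class TP, FP, TN, and FN values listed above follow from a one-vs-rest view of each class. The sketch below is only an illustration of that bookkeeping in plain Python: the helper name `per_class_counts` and the toy labels are assumptions made for this example and are not taken from `classification_metrics.py`.

```python
def per_class_counts(y_true, y_pred, labels):
    """Return {label: (tp, fp, tn, fn)}, treating each class as one-vs-rest."""
    counts = {}
    total = len(y_true)
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        tn = total - tp - fp - fn  # everything not involving this class
        counts[label] = (tp, fp, tn, fn)
    return counts


# Hypothetical labels for illustration only (not output from the repository).
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

counts = per_class_counts(y_true, y_pred, labels=[0, 1, 2])
for label, (tp, fp, tn, fn) in counts.items():
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    print(f"class {label}: TP={tp} FP={fp} TN={tn} FN={fn} "
          f"precision={precision:.2f} recall={recall:.2f}")
```

Precision and recall per class then come directly from these four counts, which is why the confusion matrix is enough to derive every metric in the list.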

Regression Algorithms

Linear Regression (linear_regression.py)

  • Purpose: Demonstrate linear regression for binary classification
  • Features:
    • Simple binary classification example
    • Model parameter extraction
    • Threshold-based classification
  • Run: uv run python supervised_learning/regression/linear_regression.py
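
Threshold-based classification with a linear model can be sketched as follows. This is a minimal illustration, not the code in `linear_regression.py`: the toy data, the `classify` helper, and the 0.5 cutoff are all assumptions chosen for this example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical 1-D toy data: small values belong to class 0, large values to class 1.
X = np.array([[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LinearRegression().fit(X, y)

# Model parameter extraction: slope and intercept of the fitted line.
print("slope:", model.coef_[0])
print("intercept:", model.intercept_)


def classify(x, threshold=0.5):
    """Return (continuous score, class label) for a single input value."""
    score = model.predict(np.array([[x]]))[0]
    return score, int(score >= threshold)


for value in (2.5, 7.0):
    score, label = classify(value)
    print(f"x={value}: score={score:.3f} -> class {label}")
```

Note that the score is an unbounded continuous value rather than a true probability; it can fall below 0 or above 1 for inputs far from the training data.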

Unsupervised Learning

Clustering Algorithms

K-Means (k_means.py)

  • Purpose: Cluster analysis using K-means algorithm
  • Dataset: Iris dataset
  • Features:
    • 3-cluster analysis
    • Inertia calculation
    • Species distribution per cluster
    • Clustering accuracy assessment
  • Run: uv run python unsupervised_learning/clustering/k_means.py

Agglomerative Clustering (agglomerativ.py)

  • Purpose: Hierarchical clustering analysis
  • Datasets: Synthetic blob data + Iris dataset
  • Features:
    • Multiple linkage methods (ward, complete, average, single)
    • Synthetic and real data comparison
    • Adjusted Rand Index calculation
  • Run: uv run python unsupervised_learning/clustering/agglomerativ.py
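
A comparison of the four linkage methods on Iris might look like the sketch below. It assumes three clusters and default Euclidean distances; the variable names are illustrative and the actual script may be organized differently.

```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score

X, y = load_iris(return_X_y=True)

ari_by_linkage = {}
for linkage in ("ward", "complete", "average", "single"):
    # n_clusters=3 is an assumption matching the three Iris species.
    clusterer = AgglomerativeClustering(n_clusters=3, linkage=linkage)
    labels = clusterer.fit_predict(X)
    # The Adjusted Rand Index compares cluster assignments with the true species
    # and is invariant to how the cluster ids happen to be numbered.
    ari_by_linkage[linkage] = adjusted_rand_score(y, labels)

for linkage, ari in ari_by_linkage.items():
    print(f"{linkage:>8}: ARI = {ari:.3f}")
```

An ARI of 1.0 means the clusters match the species exactly, while values near 0 indicate agreement no better than chance.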

Usage

Run Individual Algorithms

# Classification algorithms
uv run python supervised_learning/classification/svm_comparison.py
uv run python supervised_learning/classification/logistic_regression.py
uv run python supervised_learning/classification/classification_metrics.py

# Regression algorithms
uv run python supervised_learning/regression/linear_regression.py

# Clustering algorithms
uv run python unsupervised_learning/clustering/k_means.py
uv run python unsupervised_learning/clustering/agglomerativ.py

Expected Outputs

Classification Algorithms

  • SVM vs Random Forest: Side-by-side accuracy scores (both models typically reach 1.0 on Iris)
  • Logistic Regression: Detailed classification report with precision/recall for each species
  • Classification Metrics: Comprehensive confusion matrix and per-class metrics
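
The accuracy figures for SVM and Random Forest depend on how the data is split. The sketch below shows one way such a comparison could be set up; the 80/20 stratified split and `random_state=42` are assumptions for reproducibility, not settings confirmed from `svm_comparison.py`, so the exact scores may differ from the script's output.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Assumed split: 80% train, 20% test, stratified by species.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

models = {
    "Linear SVM": SVC(kernel="linear"),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: accuracy = {scores[name]:.3f}")
```

With only 30 test samples in this setup, a single misclassification shifts accuracy by about 3 percentage points, so small differences between the two models should not be over-interpreted.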

Regression Algorithms

  • Linear Regression: Predicted continuous score (used like a probability), the thresholded class assignment, and model parameters

Clustering Algorithms

  • K-Means: Cluster centers, species distribution, clustering accuracy (~89%)
  • Agglomerative: Multiple linkage method results, cluster purity analysis
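
K-means cluster ids are arbitrary, so a "clustering accuracy" only makes sense after each cluster is mapped to a species. One common approach, sketched below, assigns every cluster its majority species. Whether `k_means.py` uses this exact mapping is an assumption, as are `n_init=10` and `random_state=42`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)
print("inertia:", round(kmeans.inertia_, 2))

# Map each cluster to the species that occurs most often inside it.
mapped = np.empty_like(y)
for cluster in range(3):
    members = kmeans.labels_ == cluster
    species_counts = np.bincount(y[members], minlength=3)
    print(f"cluster {cluster}: species distribution = {species_counts.tolist()}")
    mapped[members] = species_counts.argmax()

accuracy = (mapped == y).mean()
print(f"clustering accuracy: {accuracy:.3f}")
```

The species distribution printed per cluster also shows where the errors come from: on Iris, setosa usually separates cleanly while versicolor and virginica overlap.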

Note: The scripts use the Iris dataset wherever possible so that results can be compared across algorithms. Synthetic data is used only where it demonstrates a specific algorithmic property more clearly.