
Yin-Yang Classification

A fun project exploring different machine learning models' ability to classify data in a Yin-Yang shape pattern. This project implements and compares various classification models to understand their performance on this geometrically interesting dataset.

Dataset Overview

The Yin-Yang dataset presents a visually complex, non-linear classification challenge with intertwined class regions. Its intricate structure forces models to learn nested, curved decision boundaries, making it an excellent benchmark for comparing model expressiveness.
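The exact data-generation code lives in this repo; purely as an illustration, here is a minimal sketch that labels points in the unit disk with a simplified two-class yin-yang rule. The radii and class geometry here are assumptions for the sketch, not the repo's actual construction:

```python
import numpy as np

def make_yin_yang(n=1000, r_lobe=0.5, r_dot=0.15, seed=0):
    """Rejection-sample 2D points in the unit disk and label them with a
    simplified yin-yang rule (hypothetical geometry, for illustration only)."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    while len(X) < n:
        x0, y0 = rng.uniform(-1, 1, size=2)
        if x0**2 + y0**2 >= 1:
            continue  # outside the big circle
        if x0**2 + (y0 - 0.5)**2 < r_dot**2:
            lab = 1  # dot inside the upper lobe
        elif x0**2 + (y0 + 0.5)**2 < r_dot**2:
            lab = 0  # dot inside the lower lobe
        elif x0**2 + (y0 - 0.5)**2 < r_lobe**2:
            lab = 0  # upper lobe
        elif x0**2 + (y0 + 0.5)**2 < r_lobe**2:
            lab = 1  # lower lobe
        else:
            lab = 0 if x0 < 0 else 1  # remaining halves of the disk
        X.append((x0, y0))
        y.append(lab)
    return np.array(X), np.array(y)
```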

Yin-Yang Data

I also came across an interesting paper titled "The Yin-Yang Dataset", which introduces a compact and balanced dataset designed to support research in biologically plausible error backpropagation and deep learning within spiking neural networks.


Classification Models and Results Visualization

Random Forest Classifier

it ain't much but it's honest work

  • Configuration: 50 trees, varying max_depth from 1 to 9
  • Performance: Captures most of the major-class points at low depths, but learns the complete decision boundaries only at depth 9
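A minimal sketch of this sweep, using `make_moons` as a stand-in for the yin-yang data (an assumption; the repo uses its own dataset):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier

# make_moons stands in for the yin-yang data (assumption for this sketch)
X, y = make_moons(n_samples=500, noise=0.1, random_state=0)

for depth in (1, 5, 9):
    clf = RandomForestClassifier(n_estimators=50, max_depth=depth, random_state=0)
    clf.fit(X, y)
    print(depth, round(clf.score(X, y), 3))  # training accuracy rises with depth
```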

Random Forest


XGBoost

This bad boy can fit so many f**king classes in it

  • Configuration: 50 estimators, varying max_depth from 1 to 3
  • Performance: Shows solid performance even at low depths due to gradient boosting’s ability to combine weak learners.
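A hedged sketch of the same idea; it uses scikit-learn's `GradientBoostingClassifier` as a stand-in for XGBoost (so it runs without the `xgboost` package) and `make_moons` as stand-in data:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import GradientBoostingClassifier

# Stand-ins: GradientBoostingClassifier for XGBoost, make_moons for yin-yang data
X, y = make_moons(n_samples=500, noise=0.1, random_state=0)

for depth in (1, 2, 3):
    clf = GradientBoostingClassifier(n_estimators=50, max_depth=depth, random_state=0)
    clf.fit(X, y)
    print(depth, round(clf.score(X, y), 3))  # boosting does well even with shallow trees
```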

XGBoost


Multi-Layer Perceptron (MLP)

MLP with Single Hidden Layer

  • Hidden Units: 3 to 18
  • Performance: Starts at baseline performance with few hidden units and improves as neurons are added. Even so, single-hidden-layer MLPs struggle to perfectly model the Yin-Yang’s nested, twisting structure.

MLP1

MLP with Two Hidden Layers

  • Hidden Layers: (2,2) to (12,12)
  • Performance: Learns the boundaries with fewer units per layer than the single-hidden-layer MLP. The second hidden layer lets the network approximate more complex decision boundaries.
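Both MLP variants can be sketched with scikit-learn's `MLPClassifier`; again `make_moons` is a stand-in for the yin-yang data:

```python
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

# make_moons stands in for the yin-yang data (assumption for this sketch)
X, y = make_moons(n_samples=500, noise=0.1, random_state=0)

# One hidden layer of 18 units vs. two hidden layers of 6 units each
for hidden in [(18,), (6, 6)]:
    clf = MLPClassifier(hidden_layer_sizes=hidden, max_iter=2000, random_state=0)
    clf.fit(X, y)
    print(hidden, round(clf.score(X, y), 3))
```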

MLP2


Support Vector Machine (SVM)

  • Configuration: SVMs with RBF, linear, polynomial, and sigmoid kernels, varying the inverse regularization parameter C over 0.1, 1, and 10.
  • Performance:
    • RBF kernel captures the curved boundaries best.
    • Linear and poly kernels underperform due to their limited flexibility.
    • Sigmoid kernel gives unstable results in this context.

      Ew ... brother ew...what's that brother

    • Very high training time for kernels like RBF.
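The kernel/C grid above can be sketched as follows, with `make_moons` standing in for the yin-yang data:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

# make_moons stands in for the yin-yang data (assumption for this sketch)
X, y = make_moons(n_samples=500, noise=0.1, random_state=0)

for kernel in ("rbf", "linear", "poly", "sigmoid"):
    for C in (0.1, 1, 10):
        clf = SVC(kernel=kernel, C=C).fit(X, y)
        print(kernel, C, round(clf.score(X, y), 3))  # RBF tracks the curves best
```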

SVM


K-Nearest Neighbors (KNN)

  • Neighbors: 1 to 3
  • Performance: Despite being simple, KNN performs surprisingly well on this dataset due to its instance-based nature. It handles the swirls of the Yin-Yang reasonably well.
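A minimal sketch of the neighbor sweep, again with `make_moons` as stand-in data:

```python
from sklearn.datasets import make_moons
from sklearn.neighbors import KNeighborsClassifier

# make_moons stands in for the yin-yang data (assumption for this sketch)
X, y = make_moons(n_samples=500, noise=0.1, random_state=0)

for k in (1, 2, 3):
    clf = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    print(k, round(clf.score(X, y), 3))  # instance-based, so swirls are easy
```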

KNN


Gaussian Naive Bayes

  • Performance: The assumption that x and y contribute independently to each class’s probability breaks down on this dataset.
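The independence assumption can be seen failing even on stand-in data with curved class regions (`make_moons` here, as an assumption for the sketch):

```python
from sklearn.datasets import make_moons
from sklearn.naive_bayes import GaussianNB

# make_moons stands in for the yin-yang data (assumption for this sketch)
X, y = make_moons(n_samples=500, noise=0.1, random_state=0)

# GaussianNB models p(x|class) and p(y|class) independently, so it cannot
# represent the curved, interlocking class regions.
nb = GaussianNB().fit(X, y)
print("GaussianNB training accuracy:", round(nb.score(X, y), 3))
```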

Naive Bayes


Clustering Models and Results Visualization

Clustering algorithms are not suited to this task, but we still visualize their behaviour below. Clusters are assigned to labels via the Hungarian algorithm.

K-Means

  • Configuration: Varying number of clusters from 3 to 7 with k-means++ initialization.
  • Performance: A clustering algorithm is not suitable for such a highly non-linear classification problem.
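The cluster-to-label assignment via the Hungarian algorithm can be sketched with `scipy.optimize.linear_sum_assignment` on the confusion matrix; `make_moons` stands in for the yin-yang data:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import confusion_matrix

# make_moons stands in for the yin-yang data (assumption for this sketch)
X, y = make_moons(n_samples=500, noise=0.1, random_state=0)

km = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0).fit(X)

# Hungarian algorithm: map cluster ids to true labels, maximizing overlap
cm = confusion_matrix(y, km.labels_)        # rows: true labels, cols: clusters
rows, cols = linear_sum_assignment(-cm)     # negate to maximize total overlap
mapping = dict(zip(cols, rows))
pred = np.array([mapping[c] for c in km.labels_])

acc = (pred == y).mean()
print("K-Means accuracy after label assignment:", round(acc, 3))
```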

K-Means


DBSCAN

Why am I even here?

  • Epsilon: 0.1 to 0.3
  • Performance: The density of points is similar throughout the dataset, which makes this algorithm highly unsuitable here.
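A minimal sketch of the epsilon sweep, with `make_moons` as stand-in data; because the density is roughly uniform, varying eps mostly shifts between fragmentation and one merged cluster:

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# make_moons stands in for the yin-yang data (assumption for this sketch)
X, _ = make_moons(n_samples=500, noise=0.1, random_state=0)

for eps in (0.1, 0.2, 0.3):
    labels = DBSCAN(eps=eps, min_samples=5).fit_predict(X)
    # label -1 marks noise points, so exclude it from the cluster count
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    print(eps, n_clusters)
```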

DBSCAN


Installation and Usage

See INSTALL.md for detailed installation and usage instructions.
