Skip to content

Latest commit

 

History

History
24 lines (17 loc) · 1.32 KB

README.md

File metadata and controls

24 lines (17 loc) · 1.32 KB

KMeans-Predictive-Modeling

K-Means Clustering is a Machine Learning Algorithm which creates a model based on the nearest-neighbour-choice.

The model in this repository has been implemented by k-means classification which predics the favourite sport in a state(in India).

Explanation

util_methods.py - Contains the linear algebra methods that are used in the k-means model.

fav_sport.py - Run this file to get output of the model. It contains the following methods:

  • major_cluster(labels) : Takes a list of input variables as parameter and returns a winner with most number of occurences.
  • knn_classify(k, labeled_points, new_point) : Driver algorithm which creates cluster based on the value of k.
  • plot_cities() : Plots the cities based on (latitude and longitude) with the appropriate markers of favourite sport.
  • classify_and_plot_grid() : Plots the graph of k-means clustering model!

Use

An attempt to make the code as readable as possible has been made.
You can modify the code to create your own k-means classification.
e.g. Instead of states(a list used in the fav_sport.py file), you can use languages which people speak throughout the world.
Feel free to reach out for any suggestions or help!

Inspiration

Joel Grus and his book: Data Science from Scratch