As part of an academic course, I implemented the K-means++ algorithm in C Python.
The K-means++ algorithm has many real-world uses. For example, in marketing, it helps companies group customers based on their buying habits for better targeting. In image processing, it compresses images by grouping similar pixels together, saving storage space without losing quality. Also, in social networks, it identifies communities by grouping users with similar interactions, useful for targeted content and network analysis.
- K - the number of required clusters
- inter - maximum iteration count
- eps - convergence value
Combine input files by inner join using the first column as a key and sort the data points in ascending order.
Import the C module import mykmeanssp
, call the fit()
method with initial centroids and data points, and retrieve the final centroids.
The implementation involves several mathematical concepts and algorithms, including: