This repository presents a tutorial on implementing K-means clustering in R. It is intended to complement the first half of UConn's Social-Ecological-Environmental (SEE) Lab's introduction to machine learning and K-means clustering, led by Shu Jiang (graduate student, Department of Psychological Sciences, University of Connecticut).
I'll also be using this as an opportunity to share a few R programming tips along the way. The tutorial may be especially helpful for those who are not familiar with programming with the Tidyverse, a useful cluster of packages in R.
Many, many thanks to Bradley Boehmke for the K-means Cluster Analysis tutorial in the University of Cincinnati's Business Analytics R Programming Guide, on which this tutorial is modeled.
For this tutorial, you will need:
- R
  - tidyverse library
  - ggplot2 library
  - viridis library
  - cluster library
  - factoextra library
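
If any of these packages are not installed yet, something along the following lines should get you set up. This is only a minimal sketch: the variable names `pkgs` and `missing_pkgs` are my own, and the CRAN mirror URL is an assumption you can swap for whichever mirror you prefer.

```r
# Packages listed above. ggplot2 is also attached by tidyverse,
# so listing it separately is harmless.
pkgs <- c("tidyverse", "ggplot2", "viridis", "cluster", "factoextra")

# Work out which of them are not installed yet
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))

# Install only the missing ones (needs internet access;
# the mirror URL below is an assumption, any CRAN mirror works)
if (length(missing_pkgs) > 0) {
  install.packages(missing_pkgs, repos = "https://cloud.r-project.org")
}

# Attach each package; character.only = TRUE lets library()
# accept the package name as a string
for (pkg in pkgs) {
  library(pkg, character.only = TRUE)
}
```

Installing only the missing packages keeps the script quick to re-run, since packages you already have are left untouched.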
If you haven't already, I would strongly recommend installing RStudio, a useful IDE (integrated development environment) for R. It has lots of helpful capabilities that can make your programming experience a bit smoother.
If you're looking for more datasets to explore K-means clustering in more depth, check out the datasets available from the Open-Source Psychometrics Project.