In this assignment, you will be implementing the K-Means algorithm from scratch. K-Means is a fundamental unsupervised learning algorithm that partitions data into K distinct clusters based on a distance metric.
You will be working with the classic Iris dataset to train your implementation and an extended Iris dataset to test it.
Additionally, you will be responsible for finding out how many clusters there actually are (no googling the answer of course ]: )!
For this assignment, we will be:
- Implementing KMeans clustering from scratch
- Using the algorithm to cluster the classic Iris dataset
- Creating visualizations to understand cluster performance
- Using the elbow method to determine optimal cluster numbers
The original Iris dataset contains 4 recorded features of the iris flower:
- Sepal length
- Sepal width
- Petal length
- Petal width
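If you want to take a quick look at these features yourself, one option (assuming scikit-learn is available; your starter files may load the data differently) is:

```python
from sklearn.datasets import load_iris

# Load the classic Iris dataset bundled with scikit-learn
iris = load_iris()
print(iris.feature_names)  # sepal length/width and petal length/width, in cm
print(iris.data.shape)     # (150, 4): 150 samples, 4 features
```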
KMeans clustering works by:
- Randomly initializing K centroids
- Assigning points to the nearest centroid
- Updating centroid positions based on the mean of assigned points
- Repeating the assignment and update steps until convergence or until a maximum number of iterations is reached
The algorithm uses distance metrics (typically Euclidean) to measure the similarity between points and centroids.
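As a rough illustration of a single assignment-and-update step (a minimal NumPy sketch, not the required structure of your class; the function and variable names are just for illustration):

```python
import numpy as np

def assign_and_update(X, centroids):
    """One KMeans iteration: assign every point to its nearest centroid
    (Euclidean distance), then move each centroid to the mean of its
    assigned points."""
    # Pairwise Euclidean distances between points and centroids: shape (n, k)
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)  # index of the nearest centroid per point
    new_centroids = np.array([
        X[labels == k].mean(axis=0) if np.any(labels == k) else centroids[k]
        for k in range(len(centroids))
    ])
    return labels, new_centroids
```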
Once you've successfully created your KMeans class, initialize it, fit and predict on the extended Iris dataset, choose a scoring method, and plot the results!
Note: Make sure you import the scoring method you chose.
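Your exact class interface comes from the starter code, but the overall flow might look roughly like this (the class name `KMeans`, its constructor arguments, the variable `X_extended`, and the choice of `silhouette_score` are assumptions for illustration, not requirements):

```python
from sklearn.metrics import silhouette_score  # one possible scoring method

kmeans = KMeans(k=3, max_iterations=100)  # your from-scratch class
kmeans.fit(X_extended)                    # extended Iris feature matrix
labels = kmeans.predict(X_extended)

score = silhouette_score(X_extended, labels)
print(f"Silhouette score: {score:.3f}")
# Then visualize the clusters, e.g. with the provided plot_3d_cluster helper
# (see visualization.py for its exact arguments).
```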
One method you may remember from class is the elbow technique, which helps you determine the optimal number of clusters (K) for KMeans clustering. As sketched below this list, it works by:
- Running KMeans with different values of K
- Calculating the inertia for each K
- Plotting K vs. inertia
- Finding the elbow point where increasing K yields diminishing returns
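A rough sketch of that loop (assuming a constructor argument `k`, a `fit` method, and a `get_error` helper that returns the inertia of the fitted model; adjust the names and calls to your actual interface):

```python
import matplotlib.pyplot as plt

ks = range(1, 11)
inertias = []
for k in ks:
    model = KMeans(k=k)                 # your from-scratch class
    model.fit(X_extended)
    inertias.append(model.get_error())  # total within-cluster sum of squares

plt.plot(list(ks), inertias, marker="o")
plt.xlabel("Number of clusters (K)")
plt.ylabel("Inertia")
plt.title("Elbow plot")
plt.show()
```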
Creating KMeans:
- KMeans Algorithm Explanation
- Randomizing my centroid position
- Assigning data to centroids
- What is cdist? (see the example after this list)
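For reference, `cdist` from `scipy.spatial.distance` computes all pairwise distances between two sets of points in a single call, which is convenient for the assignment step. A small example:

```python
import numpy as np
from scipy.spatial.distance import cdist

points = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
centroids = np.array([[0.0, 0.0], [4.0, 4.0]])

# Euclidean distance from every point to every centroid: shape (3, 2)
distances = cdist(points, centroids, metric="euclidean")
nearest = distances.argmin(axis=1)  # index of the closest centroid per point
print(nearest)  # [0 1 1]
```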
Plotting your data and Finding K
Note: You do not need to modify anything in visualization.py.
As a reminder, try not to use ChatGPT to generate code; instead, have it suggest tools that may be helpful.
- Init method (5 points):
  - Correctly initializes all parameters (5)
- Fit method (25 points):
  - Correct random centroid initialization (5)
  - Correct implementation of distance (5)
  - Correct cluster assignment (5)
  - Correct centroid update mechanism (5)
  - Correct convergence checking (5)
- Predict method (5 points):
  - Correct assignment of new points to clusters (5)
- Helper Methods (10 points):
  - get_error implementation (5)
  - get_centroid implementation (5)
- Evaluation (1 point):
  - Model predicts centroids on the new dataset (1)
- Visualization (20 points):
  - Picked the proper scoring method to evaluate the KMeans model (5)
  - Utilized plot_3d_cluster to view clusters (3)
  - Generated Elbow Plot (12)
- Analysis (4 points):
  - Correct K prediction (1)
  - Valid K prediction reasoning (3)