The implementation of Kth Nearest Neighbor Classifier to classify varieties of wheat seeds

AAWilcox/CSE-5160-KNN-Classifier-Implementation

KNN Classifier Implementation

CSE 5160 - Machine Learning
By Alyssa Wilcox


About

An implementation of a K-Nearest Neighbors (KNN) machine learning classifier. Given a labeled training set, this project classifies test instances as one of three varieties of wheat seeds: Kama, Rosa, or Canadian.

Course Information

  • Student: Alyssa Wilcox
  • Instructor: Dr. Yan Zhang
  • Course: CSE 5160 Machine Learning
  • Session: CSUSB Fall 2020

Training Set

This KNN classifier relies heavily on its training data: every prediction is made by comparing a test instance against the training instances.

Training Set Used:

  • The training set consists of 180 instances, each holding the measurements and classification of one wheat seed.
  • This data is provided by the UCI Machine Learning Repository.

Training Instance Features:

  • Each training instance has seven features. Each feature is a measurement taken from a seed belonging to one of the three wheat varieties.
  • The seven features include:
    • Area, A
    • Perimeter, P
    • Compactness, C
    • Length of Kernel, L
    • Width of Kernel, W
    • Asymmetry Coefficient, AC
    • Length of Kernel Groove, LG

Training Instance Classifications:

  • Each training instance is classified as one of three varieties of wheat seeds:
    1. Kama wheat seed, classified as 1
    2. Rosa wheat seed, classified as 2
    3. Canadian wheat seed, classified as 3
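
As an illustration of the layout described above, here is a minimal sketch of how one instance could be represented and parsed. It assumes each record is a single line of whitespace-separated values, with the seven features in the order listed and the class label last; that file layout, the names `Instance`, `parse_instance`, `FEATURE_NAMES`, and `VARIETIES`, and the sample values are all hypothetical and not taken from this project's code or data.

```python
from typing import List, NamedTuple

# Feature abbreviations in the order listed above.
FEATURE_NAMES = ["A", "P", "C", "L", "W", "AC", "LG"]

# Class labels as described above.
VARIETIES = {1: "Kama", 2: "Rosa", 3: "Canadian"}


class Instance(NamedTuple):
    features: List[float]  # seven measurements
    label: int             # 1, 2, or 3


def parse_instance(line: str) -> Instance:
    """Parse one whitespace-separated record: seven features, then the label."""
    fields = line.split()
    if len(fields) != len(FEATURE_NAMES) + 1:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    features = [float(value) for value in fields[:-1]]
    label = int(fields[-1])
    if label not in VARIETIES:
        raise ValueError(f"unknown class label: {label}")
    return Instance(features, label)


# Made-up values for illustration only.
example = parse_instance("15.0 14.5 0.88 5.6 3.3 2.5 5.1 1")
print(VARIETIES[example.label], dict(zip(FEATURE_NAMES, example.features)))
```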

How the Classifier Uses the Training Set:

  • The classifier calculates the Euclidean distance between the test instance and every training instance.
  • The classifications of the K nearest training instances are then used to classify the test instance.
  • No explicit model is built from the training set; all computation happens when a test instance is classified.
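
The procedure above can be sketched in a few lines. This is a minimal sketch, not the project's actual code: it assumes the K nearest classifications are combined by majority vote (the description does not say how they are combined), and the function names and the two-feature toy data are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

LabeledInstance = Tuple[Sequence[float], int]


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(training_set: List[LabeledInstance],
                 test_features: Sequence[float],
                 k: int) -> int:
    """Classify test_features from the labels of its k nearest training instances."""
    # Distance from the test instance to every training instance.
    distances = [(euclidean_distance(features, test_features), label)
                 for features, label in training_set]
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # Assumption: majority vote. Ties go to the label seen first among the nearest.
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data with two features per instance (the real instances have seven).
toy_training = [
    ([1.0, 1.0], 1),
    ([1.2, 0.9], 1),
    ([5.0, 5.0], 2),
    ([5.1, 4.8], 2),
    ([9.0, 9.0], 3),
]
print(knn_classify(toy_training, [1.1, 1.0], k=3))  # prints 1
```

Because nothing is precomputed, each call scans the whole training set, which is cheap for 180 instances.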

Applying the Classifier on Test Instances

Test instances were used to determine the accuracy of the KNN classifier.

Test Set Data:

  • The original data set consists of 210 instances. Thirty of them are used as the test set, while the remaining 180 form the training set.
  • The thirty test instances were chosen at random, with ten corresponding to Kama wheat seeds, ten corresponding to Rosa wheat seeds, and ten corresponding to Canadian wheat seeds.
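
A split like the one described above could be produced as follows. This is a sketch under assumptions: the function name `stratified_split`, the optional `seed` parameter, and the placeholder data (70 instances per variety) are hypothetical, and the project may have selected its test instances differently.

```python
import random
from collections import Counter, defaultdict
from typing import List, Optional, Sequence, Tuple

LabeledInstance = Tuple[Sequence[float], int]


def stratified_split(instances: List[LabeledInstance],
                     per_class: int,
                     seed: Optional[int] = None
                     ) -> Tuple[List[LabeledInstance], List[LabeledInstance]]:
    """Randomly hold out per_class instances of each label as the test set."""
    rng = random.Random(seed)
    # Group instance positions by class label.
    indices_by_label = defaultdict(list)
    for index, (_, label) in enumerate(instances):
        indices_by_label[label].append(index)
    # Draw the same number of test instances from every class.
    test_indices = set()
    for indices in indices_by_label.values():
        test_indices.update(rng.sample(indices, per_class))
    training = [inst for i, inst in enumerate(instances) if i not in test_indices]
    test = [inst for i, inst in enumerate(instances) if i in test_indices]
    return training, test


# Placeholder data: 210 instances, assumed to be 70 per variety.
placeholder = [([float(i)] * 7, label) for label in (1, 2, 3) for i in range(70)]
training_set, test_set = stratified_split(placeholder, per_class=10, seed=0)
print(len(training_set), len(test_set))  # prints 180 30
```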

Test Results - Test Error:

  • To find the test error, the KNN classifier was run on the thirty-instance test set.
  • Four different values of K were used: 5, 10, 15, and 20 nearest neighbors.
  • Results:
    • K value 5: Accuracy: 90%, Test Error: 10%
    • K value 10: Accuracy: 87%, Test Error: 13%
    • K value 15: Accuracy: 90%, Test Error: 10%
    • K value 20: Accuracy: 87%, Test Error: 13%
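
Accuracy and test error are complementary fractions of the test set. With thirty test instances, the reported figures are consistent with 27 correct predictions (90%) and 26 correct predictions (86.7%, rounded to 87%); that reading is an inference from the test-set size, not something the results state. The sketch below shows the arithmetic with a hypothetical helper and made-up predictions.

```python
from typing import Sequence, Tuple


def accuracy_and_error(predicted: Sequence[int],
                       actual: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, error) as fractions of the evaluated instances."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty label sequences of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    accuracy = correct / len(actual)
    return accuracy, 1.0 - accuracy


# Made-up example: 30 test labels, 4 of them predicted incorrectly.
actual_labels = [1] * 10 + [2] * 10 + [3] * 10
predicted_labels = list(actual_labels)
for wrong_index in (0, 1, 20, 21):
    predicted_labels[wrong_index] = 2 if actual_labels[wrong_index] != 2 else 3

accuracy, error = accuracy_and_error(predicted_labels, actual_labels)
print(f"Accuracy: {accuracy:.0%}, Test Error: {error:.0%}")
```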

Test Results - Training Error:

  • To find the training error, the KNN classifier was run on 10 randomly selected training instances.
  • Four different values of K were used: 5, 10, 15, and 20 nearest neighbors.
  • Results:
    • K value 5: Accuracy: 100%, Training Error: 0%
    • K value 10: Accuracy: 100%, Training Error: 0%
    • K value 15: Accuracy: 100%, Training Error: 0%
    • K value 20: Accuracy: 100%, Training Error: 0%

Project status

Completed

Credits
