ใ€ฝ๏ธ Model Evaluation and Hyperparameter Tuning

This notebook covers initializing four common classification models, training them on the iris dataset, evaluating them, and tuning the hyperparameters of three of the models.

📚 Dataset - Iris

The iris dataset is a small classification dataset that is perfect for beginner-friendly classification tasks. It contains 150 samples of iris flowers belonging to three species:

  • Setosa
  • Versicolor
  • Virginica

Each sample has 4 features (all numerical and continuous):

| Feature           | Description         |
| ----------------- | ------------------- |
| sepal length (cm) | Length of the sepal |
| sepal width (cm)  | Width of the sepal  |
| petal length (cm) | Length of the petal |
| petal width (cm)  | Width of the petal  |

Each flower (row) is labeled with a target value: 0 = setosa, 1 = versicolor, 2 = virginica
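
For reference, here is a minimal sketch of loading the dataset with scikit-learn (the notebook may load or name things differently):

```python
from sklearn.datasets import load_iris

# Load the 150-sample iris dataset bundled with scikit-learn
iris = load_iris()
X, y = iris.data, iris.target  # X: (150, 4) feature matrix, y: labels 0/1/2

print(iris.feature_names)  # the four features in the table above
print(iris.target_names)   # ['setosa' 'versicolor' 'virginica']
```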

๐Ÿ›๏ธ Classification Models

  1. Random Forest: A random forest classifier is a machine learning algorithm that builds multiple decision trees during training and combines their predictions to classify data. It's an ensemble method, meaning it leverages the collective intelligence of multiple models (the decision trees) to make more accurate and reliable predictions than a single decision tree could achieve.

  2. Support Vector Machine (SVM): A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression. It works by finding the optimal hyperplane that separates data points into different classes, maximizing the margin between them.

  3. K-Nearest Neighbors (KNN): The k-nearest neighbors (KNN) algorithm is a non-parametric, supervised learning classifier that uses proximity to make classifications or predictions about the grouping of an individual data point. It is one of the simplest and most popular classifiers used in machine learning today, and it can also be used for regression.

  4. Naive Bayes: The Naïve Bayes classifier calculates the probability of a given instance belonging to a particular class based on the probabilities of its features. It assumes that the presence or absence of each feature is independent of the presence or absence of other features, which simplifies the calculations.
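
The sketch below shows one way to initialize all four models with scikit-learn defaults. The random_state value matches the tuned estimators shown later; GaussianNB is an assumption (the notebook could use a different Naive Bayes variant, though the Gaussian one suits the continuous iris features):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB

# Default-parameter baselines; random_state fixed for reproducibility
models = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "SVM": SVC(random_state=42),
    "KNN": KNeighborsClassifier(),
    "Naive Bayes": GaussianNB(),  # assumed variant: Gaussian, for continuous features
}
```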

📊 Model Evaluation

We saw in the notebook that all the models gave good results with the default parameters provided by scikit-learn, even without hyperparameter tuning. Since the iris dataset is small (150 records), all four models reached an accuracy of 97% on the test set of 30 samples.
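
Continuing the sketches above, an evaluation loop along these lines reproduces this setup; the 80/20 split (and the stratify option) is an assumption chosen to yield the 30-sample test set:

```python
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# 80/20 split of 150 samples -> 30 test samples, as mentioned above
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit each baseline model and report its test accuracy
for name, model in models.items():
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {acc:.2%}")
```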

✅ Hyperparameter Tuning

  • Hyperparameter tuning is the process of finding the optimal set of hyperparameters for a machine learning model. Hyperparameters are parameters that are set before the learning process begins and that influence how the model learns.

  • To tune the Random Forest, SVM, and KNN models, we used GridSearchCV and RandomizedSearchCV to select the optimal hyperparameters for this dataset.

  • We saw that for Random Forest, the best hyperparameters were:

        RandomForestClassifier(n_estimators=150, random_state=42)

  • For SVM, the optimal hyperparameters were:

        SVC(C=5.908361216819946, gamma='auto', kernel='linear', probability=True, random_state=42)

  • And for KNN, the best hyperparameters were:

        KNeighborsClassifier(metric='euclidean', n_neighbors=9, weights='distance')
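
As a sketch of how such estimators are found, the snippet below runs GridSearchCV on Random Forest and RandomizedSearchCV on SVM. The search spaces are illustrative, not the notebook's exact grids; sampling C from a continuous distribution is how a value like C=5.9083... arises:

```python
from scipy.stats import uniform
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

# Exhaustive grid search for Random Forest (illustrative grid)
rf_search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [50, 100, 150, 200], "max_depth": [None, 3, 5]},
    cv=5,
)
rf_search.fit(X_train, y_train)
print(rf_search.best_estimator_)

# Randomized search for SVM; C is drawn from a continuous uniform distribution
svm_search = RandomizedSearchCV(
    SVC(probability=True, random_state=42),
    param_distributions={
        "C": uniform(0.1, 10),          # samples C in [0.1, 10.1]
        "kernel": ["linear", "rbf"],
        "gamma": ["scale", "auto"],
    },
    n_iter=20,
    cv=5,
    random_state=42,
)
svm_search.fit(X_train, y_train)
print(svm_search.best_estimator_)
```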

🔮 Future Work

We could further examine the impact of hyperparameter tuning on a large, noisy dataset. Since the iris dataset is small and not noisy, the default parameters and the optimized parameters yielded the same results.

About

Using hyperparameter tuning to optimize the parameters of some classification machine learning models
