Naive Bayes Classifier from Scratch

This project implements a Naive Bayes classifier from scratch in Python and uses it to classify mushrooms as edible or poisonous from features such as cap shape, cap color, and gill color.

Getting Started

To use this classifier, you will need Python 3 installed on your computer. You will also need the following libraries:

numpy
pandas

To install these libraries, run the following command:

pip install numpy pandas

Usage

To use this classifier, you first need a dataset as a CSV file with one column per feature and one column for the class (edible or poisonous). This project includes sample training and test datasets (Mushroom_Train.csv and Mushroom_Test.csv) that can be used for testing.

Load the dataset using pandas:

import pandas as pd

df = pd.read_csv('./Mushroom_Train.csv')

Preprocess the dataset by encoding the categorical features:

from naive_bayes import encodeCol

obj_df = df.select_dtypes(include=['object']).copy()
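# Replace "?" entries in stalk-root with "b". Assign the result back instead of
# calling replace(..., inplace=True) on a column selection, which may not update
# the DataFrame in recent pandas versions.
obj_df["stalk-root"] = obj_df["stalk-root"].replace({"?": "b"})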

encoded_all = {}
for col in obj_df.columns:
    encoded_all[col] = encodeCol(obj_df[col])
    
df_train = obj_df.replace(encoded_all)
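
The encoding step above can be illustrated with a small stand-alone sketch. The helper `encode_col_sketch` and the toy data are hypothetical and only show one plausible way to turn categories into integer codes; the project's own `encodeCol` may work differently.

```python
import pandas as pd


def encode_col_sketch(col: pd.Series) -> dict:
    """Map each distinct category in `col` to an integer code (hypothetical helper)."""
    categories = sorted(col.dropna().unique())
    return {cat: code for code, cat in enumerate(categories)}


# Toy data standing in for the mushroom CSV (made up for illustration).
toy = pd.DataFrame({
    "cap-shape": ["x", "b", "x", "f"],
    "class": ["p", "e", "e", "p"],
})

# One mapping per column, then apply each mapping to its own column.
mapping = {col: encode_col_sketch(toy[col]) for col in toy.columns}
encoded = toy.apply(lambda s: s.map(mapping[s.name]))
```

Building the mapping once and keeping it around makes it possible to apply exactly the same codes to another file later.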

Split the dataset into features and class:

X_train = df_train.drop('class', axis=1)
y_train = df_train['class']

Train the classifier:

from naive_bayes import GNaiveBayesClassifier

model = GNaiveBayesClassifier()
model.train(X_train, y_train)

Load the test dataset and preprocess it the same way as the training dataset:

df_test = pd.read_csv('./Mushroom_Test.csv')
# Same "?" -> "b" replacement as for the training data, assigned back to the column.
df_test["stalk-root"] = df_test["stalk-root"].replace({"?": "b"})

encoded_all = {}
for col in df_test.columns:
    encoded_all[col] = encodeCol(df_test[col])
    
df_test = df_test.replace(encoded_all)

X_test = df_test.drop('class', axis=1)
y_test = df_test['class']

Make predictions on the test dataset:

predicted = model.predict(X_test)

Evaluate the performance of the classifier using accuracy and a confusion matrix:

accuracy = model.accuracy(y_test, predicted)
confusion_matrix = model.confusionMatrix(predicted, y_test)
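
For reference, both metrics can be computed by hand in a few lines of NumPy. This is a minimal sketch with hypothetical function names and made-up labels; the row/column layout chosen here (rows are true classes, columns are predicted classes) is an assumption and may not match what `model.confusionMatrix` returns.

```python
import numpy as np


def accuracy_sketch(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))


def confusion_matrix_sketch(y_true, y_pred, n_classes=2):
    """Count matrix with rows = true class, columns = predicted class."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    return matrix


# Made-up integer-encoded labels, e.g. 0 = edible, 1 = poisonous.
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]

acc = accuracy_sketch(y_true, y_pred)
cm = confusion_matrix_sketch(y_true, y_pred)
```

The diagonal of the matrix holds the correct predictions, so the accuracy equals the sum of the diagonal divided by the number of samples.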

Naive Bayes Classifier

Naive Bayes is a classification algorithm based on Bayes' theorem. It assumes that, given the class, the value of each feature is independent of the value of every other feature. This conditional-independence assumption is what makes the method "naive", hence the name Naive Bayes.

Naive Bayes calculates the probability of each class given the input features and selects the class with the highest probability as the output.
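
In symbols, the rule described above can be written as follows. The notation is introduced here for illustration: c is a class, x_1, ..., x_n are the feature values of one sample, and the denominator of Bayes' theorem is dropped because it is the same for every class.

```latex
P(c \mid x_1, \dots, x_n) \;\propto\; P(c) \prod_{i=1}^{n} P(x_i \mid c),
\qquad
\hat{c} \;=\; \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

In practice the product is usually evaluated as a sum of logarithms to avoid numerical underflow when there are many features.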

The algorithm has two steps: training and prediction. During training, the model estimates the prior probability of each class and the probability distribution of each feature given each class. During prediction, it combines these estimates through Bayes' theorem to score every class for the input features and returns the class with the highest score.
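
The two steps can be sketched in NumPy as below. This is a minimal Gaussian variant written for illustration, under the assumption that each feature is modeled with a per-class normal distribution. The class name `GaussianNBSketch` and the toy data are hypothetical, and the project's `GNaiveBayesClassifier` may be implemented differently.

```python
import numpy as np


class GaussianNBSketch:
    """Illustrative Gaussian Naive Bayes, not the project's implementation."""

    def train(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        # Prior P(c): share of training samples in each class.
        self.priors_ = np.array([np.mean(y == c) for c in self.classes_])
        # Per-class mean and variance of every feature.
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # A small constant keeps the variance strictly positive.
        self.vars_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # diff has shape (n_samples, n_classes, n_features).
        diff = X[:, None, :] - self.means_[None, :, :]
        # Sum of per-feature Gaussian log-densities (the naive assumption).
        log_lik = -0.5 * np.sum(
            np.log(2.0 * np.pi * self.vars_)[None] + diff ** 2 / self.vars_[None],
            axis=2,
        )
        log_post = np.log(self.priors_)[None, :] + log_lik
        # Pick the class with the highest log-posterior for each sample.
        return self.classes_[np.argmax(log_post, axis=1)]


# Made-up, well-separated toy data with two classes.
X = np.array([[1.0, 2.0], [1.2, 1.8], [0.8, 2.2],
              [4.0, 5.0], [4.2, 4.8], [3.8, 5.2]])
y = np.array([0, 0, 0, 1, 1, 1])

model = GaussianNBSketch().train(X, y)
pred = model.predict(np.array([[1.1, 2.1], [4.1, 4.9]]))
```

Working with log-probabilities turns the product over features into a sum, which is both simpler to vectorize and numerically safer.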
