This is part of my Introduction to Data Science assignment at university. In this part, I tried to write my own implementation of a Naive Bayes classifier from scratch!
To use this module, your system needs to have:
- numpy

```bash
pip install numpy
```

You can install this module by cloning this repository into your current working directory:

```bash
git clone https://github.com/theEmperorofDaiViet/naive_bayes.git
```
The Naive_Bayes module implements Naive Bayes algorithms. These are supervised learning methods based on applying Bayes’ theorem with strong (naive) feature independence assumptions.
The Gaussian_Naive_Bayes class implements the Gaussian variant of the algorithm. This model is mainly used when dealing with continuous data: the likelihood of each feature within a class is assumed to follow a normal (Gaussian) distribution.
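Concretely, for a sample $x = (x_1, \dots, x_n)$, the classifier picks the class $\hat{y}$ with the highest posterior, where each per-feature likelihood is a Gaussian whose per-class mean $\mu_{y,i}$ and variance $\sigma_{y,i}^2$ are estimated from the training data:

$$\hat{y} = \arg\max_{y}\; P(y)\prod_{i=1}^{n} P(x_i \mid y), \qquad P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_{y,i}^{2}}}\exp\!\left(-\frac{(x_i-\mu_{y,i})^{2}}{2\sigma_{y,i}^{2}}\right)$$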
fit(X, y)
Fit Gaussian Naive Bayes according to X, y.
| Parameters | X: np.array of shape (n_samples, n_features)<br>Training vectors, where n_samples is the number of samples and n_features is the number of features.<br>y: np.array of shape (n_samples,)<br>Target values. |
|---|---|
| Returns | None |
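Fitting a Gaussian Naive Bayes model boils down to estimating, for each class, the per-feature means and variances plus the class prior. Here is a minimal standalone sketch of that computation (the helper name fit_gaussian_nb is hypothetical, for illustration only; the module's actual fit may differ in detail):

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate per-class feature means, variances, and priors from training data."""
    classes = np.unique(y)
    means  = {c: X[y == c].mean(axis=0) for c in classes}  # per-feature mean of class c
    vars_  = {c: X[y == c].var(axis=0)  for c in classes}  # per-feature variance of class c
    priors = {c: np.mean(y == c)        for c in classes}  # relative frequency of class c
    return classes, means, vars_, priors
```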
gaussian_density(x, mean, var)
Calculate the probability density function of the Gaussian distribution for a given sample, knowing the mean(s) and the variance(s).
| Parameters | x: float or np.array(dtype=float) of shape (n_features,)<br>Value(s) of a feature or of each feature of a certain sample.<br>mean: float or np.array(dtype=float) of shape (n_features,)<br>Mean(s) of a feature or of each feature.<br>var: float or np.array(dtype=float) of shape (n_features,)<br>Variance(s) of a feature or of each feature. |
|---|---|
| Returns | C: float or np.array(dtype=float) of shape (n_features,)<br>The probability density value(s) of a feature or of each feature of the sample. |
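The formula behind this method is the standard normal probability density function. A minimal numpy sketch of it (the module's actual implementation may differ in detail):

```python
import numpy as np

def gaussian_density(x, mean, var):
    """Normal pdf, evaluated element-wise; works for scalars and 1-D arrays alike."""
    return np.exp(-((x - mean) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)
```

For example, `gaussian_density(0.0, 0.0, 1.0)` returns roughly 0.3989, the peak of the standard normal curve.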
class_probability(x)
Calculate the probability that a given sample belongs to each class, then choose the class with the maximum probability.
| Parameters | x: np.array(dtype=float) of shape (n_features,)<br>A certain sample. |
|---|---|
| Returns | C: str or int<br>The class to which the input sample most probably belongs. |
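Multiplying many small densities together underflows quickly, so a common trick is to sum log-densities instead; the argmax is unaffected because the logarithm is monotonic. A sketch of the idea, reusing the hypothetical helpers from the snippets above (the module's actual method may instead work on stored attributes):

```python
import numpy as np

def class_probability(x, classes, means, vars_, priors):
    """Pick the class with the highest log-posterior for a single sample x."""
    log_posteriors = [
        np.log(priors[c]) + np.sum(np.log(gaussian_density(x, means[c], vars_[c])))
        for c in classes
    ]
    return classes[int(np.argmax(log_posteriors))]
```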
predict(X)
Perform classification on an array of test vectors X.
| Parameters | X: np.array of shape (n_samples, n_features)<br>The input samples. |
|---|---|
| Returns | C: np.array of shape (n_samples,)<br>Predicted target values for X. |
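Prediction is then just the per-sample rule applied row by row. A short sketch under the same assumptions as the snippets above:

```python
import numpy as np

def predict(X, classes, means, vars_, priors):
    """Classify every row of X independently."""
    return np.array([class_probability(x, classes, means, vars_, priors) for x in X])
```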
Here is an example of how this module can be used to perform data classification.
In this example, I use the dry bean dataset from Kaggle.
```python
>>> from Naive_Bayes import Gaussian_Naive_Bayes
>>> import correctness
>>> import pandas as pd
>>> import numpy as np
>>> from sklearn.model_selection import train_test_split

>>> df = pd.read_excel('Dry_Bean_Dataset.xlsx')
>>> df.shape
(13611, 17)
```
The `correctness` module I import is another module I built from scratch, used for evaluating the performance of classification models. You'll see its effect below, or you can take a look at it here.
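For context, overall accuracy falls straight out of a confusion matrix: correct predictions lie on the diagonal. A minimal sketch of that one metric (`correctness` itself computes precision, recall, and more, and its internals may differ):

```python
import numpy as np

def accuracy(cm):
    """Fraction of correctly classified samples: diagonal sum over total."""
    cm = np.asarray(cm)
    return np.trace(cm) / cm.sum()
```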
```python
>>> data = df.drop(['ConvexArea','EquivDiameter','AspectRation','Eccentricity','Class','Area','Perimeter','ShapeFactor2','ShapeFactor3','ShapeFactor1','ShapeFactor4'], axis=1)
>>> target = df['Class']
>>> X = np.array(data)
>>> y = np.array(target)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

>>> nb = Gaussian_Naive_Bayes()
>>> nb.fit(X_train, y_train)
>>> y_pred = nb.predict(X_test)

>>> cm = correctness.confusion_matrix(y_test, y_pred)
>>> scratch = correctness.accuracy(cm)
>>> print(correctness.report(cm))
CLASSIFICATION REPORT:
            precision    recall  f1-score  support

         0   0.814394  0.911017  0.860000      264
         1   1.000000  1.000000  1.000000      106
         2   0.920245  0.874636  0.896861      326
         3   0.879501  0.940741  0.909091      722
         4   0.964770  0.922280  0.943046      369
         5   0.957286  0.929268  0.943069      398
         6   0.866171  0.821869  0.843439      538

            precision    recall  f1-score  support

     macro   0.914624  0.914259  0.913644     2723
     micro   0.903048  0.903048  0.903048     2723
  weighted   0.904705  0.903048  0.903094     2723

  accuracy   0.903048
```
```python
>>> from sklearn.naive_bayes import GaussianNB
>>> sknb = GaussianNB()
>>> sknb.fit(X_train, y_train)
>>> y_sk = sknb.predict(X_test)

>>> skcm = correctness.confusion_matrix(y_test, y_sk)
>>> sklearn = correctness.accuracy(skcm)
>>> print(correctness.report(skcm))
CLASSIFICATION REPORT:
            precision    recall  f1-score  support

         0   0.814394  0.907173  0.858283      264
         1   1.000000  1.000000  1.000000      106
         2   0.920245  0.879765  0.899550      326
         3   0.876731  0.939169  0.906877      722
         4   0.964770  0.924675  0.944297      369
         5   0.957286  0.927007  0.941904      398
         6   0.862454  0.815466  0.838302      538

            precision    recall  f1-score  support

     macro   0.913697  0.913322  0.912745     2723
     micro   0.901579  0.901579  0.901579     2723
  weighted   0.903176  0.901579  0.901603     2723

  accuracy   0.901579
```
```python
>>> Naive_Bayes_report = pd.DataFrame([[sklearn, scratch]])
>>> Naive_Bayes_report.columns = ['sklearn NB', 'scratch NB']
>>> Naive_Bayes_report
   sklearn NB  scratch NB
0    0.901579    0.903048
```
As you can see, the accuracies of the two models, my from-scratch Gaussian_Naive_Bayes and sklearn's GaussianNB, are approximately the same. And with a little luck, my module's accuracy is slightly higher.
You can contact me via:
GitHub's markdown processor cannot render `<style>` sheets, so you may see one lying here. For the best reading experience, open this file in another editor, e.g. Visual Studio Code's Open Preview mode (Ctrl+Shift+V).