Comparison of Naive Bayes Algorithm and Multinomial Logistic Regression on Dry Beans Dataset

This is a solution notebook for an assignment question given in a graduate Data Mining course. Each code block is accompanied by relevant analysis where required.
Dataset link: https://archive.ics.uci.edu/ml/datasets/Dry+Bean+Dataset
Broadly, the following steps have been performed in this solution notebook:

  • Plotted and analyzed the class distribution of the dataset.
  • Performed EDA (histograms, box plots, etc.) and provided insights on the data.
  • Used the t-SNE algorithm to reduce the data to 2 dimensions and plotted the result as a scatter plot.
    • This helps in observing the separability of the data.
  • Ran the sklearn implementation of Gaussian Naive Bayes and Multinomial Naive Bayes.
    • Reported Accuracy, Recall, and Precision, and analyzed the differences between the two Naive Bayes implementations using an 80:20 train-test split.
  • Used Principal Component Analysis (PCA) to reduce the number of features and used the reduced dataset for model training.
    • Retained different fractions of the variance, ranging from 0.9 to 1 in steps of 0.01.
    • Compared the results using Accuracy, Precision, Recall and F1-score.
  • Plotted ROC-AUC curves.
  • Further trained the model using Multinomial Logistic Regression and compared the results with Naive Bayes.
The assumptions above and the flow of work follow the questions asked in the assignment.
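
The modelling steps listed above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code. It assumes synthetic data from `make_classification` as a stand-in for the Dry Bean features, macro-averaged metrics, a stratified split, min-max scaling so that `MultinomialNB` receives non-negative inputs, and only three of the PCA variance levels. All variable names and the `evaluate` helper are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_recall_fscore_support,
                             roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# ASSUMPTION: synthetic multiclass data stands in for the Dry Bean dataset.
# Replace X, y with the real features and labels to reproduce the comparison.
X, y = make_classification(n_samples=2000, n_features=16, n_informative=10,
                           n_redundant=4, n_classes=7, n_clusters_per_class=1,
                           random_state=0)

# 80:20 train-test split (stratification is an assumption of this sketch).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)


def evaluate(model):
    """Fit on the training split and return macro-averaged test metrics."""
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="macro", zero_division=0)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        # One-vs-rest AUC computed from predicted class probabilities.
        "roc_auc_ovr": roc_auc_score(y_test, model.predict_proba(X_test),
                                     multi_class="ovr"),
    }


models = {
    "gaussian_nb": GaussianNB(),
    # MultinomialNB needs non-negative features, hence the min-max scaling.
    "multinomial_nb": make_pipeline(MinMaxScaler(clip=True), MultinomialNB()),
    # With the default lbfgs solver this fits a multinomial (softmax) model.
    "logistic_regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
}
results = {name: evaluate(model) for name, model in models.items()}

# PCA keeping a given fraction of the variance, followed by Gaussian NB.
pca_results = {}
for variance in (0.90, 0.95, 0.99):
    pipe = make_pipeline(StandardScaler(), PCA(n_components=variance),
                         GaussianNB())
    scores = evaluate(pipe)
    scores["n_components"] = pipe.named_steps["pca"].n_components_
    pca_results[variance] = scores

for name, scores in results.items():
    print(name, {k: round(float(v), 3) for k, v in scores.items()})
for variance, scores in pca_results.items():
    print(variance, {k: round(float(v), 3) for k, v in scores.items()})
```

Passing a float between 0 and 1 as `n_components` asks scikit-learn's `PCA` to keep the smallest number of components whose cumulative explained variance reaches that fraction, which is one way to express the "retained variance" sweep described in the list.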