NLP_Porject_Restaurant_review_sematic_classification- Project Overview

This is a learning project I built while taking the Machine Learning A-Z course on Udemy. The goal is to perform semantic analysis/classification on restaurant review texts using a bag-of-words model.

1. Problem definition

To determine whether a given restaurant review text is negative or positive.

2. Data

The data is provided by the course material.

3. Code and Resource Used

Python version: 3.7

Packages: pandas, numpy, matplotlib.pyplot, re, nltk

4. EDA

Check what the data looks like:

image
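A first look at the data could be sketched with pandas as below. The README does not name the data file or its columns, so the tab-separated layout, the column names `Review` and `Liked`, and the four rows are all made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for the course data file: the real file name,
# separator, and column names are not given in this README.
raw_text = (
    "Review\tLiked\n"
    "Wow... Loved this place.\t1\n"
    "Crust is not good.\t0\n"
    "The food was great.\t1\n"
    "Service was slow.\t0\n"
)

# Read the tab-separated text into a DataFrame.
df = pd.read_csv(io.StringIO(raw_text), sep="\t")

print(df.head())                   # first rows of the data
print(df.shape)                    # (number of reviews, number of columns)
print(df["Liked"].value_counts())  # how balanced the two classes are
```

With a real file, `io.StringIO(raw_text)` would be replaced by the file path.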

5. Data Cleaning

Clean the data with the following steps:

  • replace every character that is not a letter with a space
  • convert everything to lower case
  • customize the stop word list by removing the word "not" from it, so that negations are kept
  • drop the stop words and stem every remaining word
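The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function name `clean_review` is hypothetical, and the short `STOP_WORDS` set is a hand-written stand-in for a full English stop word list (for example the one in `nltk.corpus.stopwords`, which needs `nltk.download("stopwords")` first).

```python
import re

from nltk.stem.porter import PorterStemmer

# Assumption: a tiny hand-written stop word list used only for this sketch.
# The word "not" is deliberately left out so that negations survive cleaning.
STOP_WORDS = {"a", "an", "and", "is", "it", "of", "the", "this", "to", "was"}

stemmer = PorterStemmer()


def clean_review(text):
    """Apply the four cleaning steps to a single review string."""
    # 1. Replace every character that is not a letter with a space.
    text = re.sub("[^a-zA-Z]", " ", text)
    # 2. Convert everything to lower case.
    words = text.lower().split()
    # 3 and 4. Drop the stop words ("not" is kept) and stem the rest.
    stems = [stemmer.stem(w) for w in words if w not in STOP_WORDS]
    return " ".join(stems)


print(clean_review("Wow... Loved this place."))
print(clean_review("Crust is not good."))
```

Keeping "not" matters because "not good" and "good" would otherwise be reduced to the same token.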

6. Modelling

Split the data into a training set and a test set. The model used for this project is Naive Bayes.
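To show how a bag-of-words Naive Bayes classifier works, here is a small from-scratch sketch using only the standard library. It is an assumption-laden illustration: the README does not say which Naive Bayes variant or library the project uses, so this sketch picks the multinomial variant with add-one smoothing, and the cleaned reviews and labels are made up.

```python
import math
from collections import Counter


def train_naive_bayes(docs, labels):
    """Fit a multinomial Naive Bayes model on bag-of-words counts."""
    vocab = {w for doc in docs for w in doc.split()}
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.split())  # bag of words per class

    model = {"vocab": vocab, "log_prior": {}, "log_lik": {}}
    for c in class_counts:
        model["log_prior"][c] = math.log(class_counts[c] / len(labels))
        total = sum(word_counts[c].values())
        # Add-one (Laplace) smoothing so unseen words never get probability 0.
        model["log_lik"][c] = {
            w: math.log((word_counts[c][w] + 1) / (total + len(vocab)))
            for w in vocab
        }
    return model


def predict(model, doc):
    """Return the class with the highest log posterior for one review."""
    scores = {}
    for c, prior in model["log_prior"].items():
        known = [w for w in doc.split() if w in model["vocab"]]
        scores[c] = prior + sum(model["log_lik"][c][w] for w in known)
    return max(scores, key=scores.get)


# Made-up, already cleaned reviews (1 = positive, 0 = negative).
docs = ["wow love place", "crust not good", "food great", "not tasti",
        "love food", "servic not good", "love great food", "not good"]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

# Simple split: first six reviews for training, the rest for testing.
# A real run would shuffle the data before splitting.
train_docs, test_docs = docs[:6], docs[6:]
train_labels, test_labels = labels[:6], labels[6:]

model = train_naive_bayes(train_docs, train_labels)
predictions = [predict(model, d) for d in test_docs]
print(predictions, test_labels)
```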

7. Evaluation

confusion matrix:

image

accuracy: 0.67
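As a reminder of how these two metrics relate, the sketch below builds a 2x2 confusion matrix from true and predicted labels and derives the accuracy from it. The label lists are made up and do not reproduce the result reported above; the function names are hypothetical.

```python
def confusion_matrix(y_true, y_pred):
    """Rows are actual classes (0, 1), columns are predicted classes (0, 1)."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


def accuracy(matrix):
    """Share of correct predictions: the diagonal divided by the total."""
    correct = matrix[0][0] + matrix[1][1]
    total = sum(sum(row) for row in matrix)
    return correct / total


# Made-up labels for illustration only.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]

cm = confusion_matrix(y_true, y_pred)
acc = accuracy(cm)
print(cm)
print(acc)
```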