This study assesses a rudimentary Natural Language Processing (NLP) algorithm for rating product reviews. Each positive word in a review increases its score, and each negative word decreases it. While this method may not be well suited to predicting a review's numerical rating on a scale of 1 to 5, it categorizes reviews into positive and negative classes with acceptable accuracy.
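The word-counting idea described above can be sketched as follows. This is a minimal illustration, not the notebooks' actual code: the word lists, the function names `score_review` and `categorize`, and the choice to treat a score of 0 as neutral are all assumptions made for the example.

```python
import re

# Hypothetical word lists for illustration; the experiment takes its
# sentiment words from the NRC Emotion Lexicon instead.
POSITIVE_WORDS = {"good", "great", "love", "excellent"}
NEGATIVE_WORDS = {"bad", "poor", "hate", "terrible"}


def score_review(text: str) -> int:
    """Add 1 for each positive word and subtract 1 for each negative word."""
    score = 0
    for word in re.findall(r"[a-z']+", text.lower()):
        if word in POSITIVE_WORDS:
            score += 1
        elif word in NEGATIVE_WORDS:
            score -= 1
    return score


def categorize(score: int) -> str:
    """Map a score to a class; treating 0 as neutral is an assumption."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


review = "Great app, I love it, but the login screen is terrible."
print(score_review(review), categorize(score_review(review)))
```

The sample review contains two positive words and one negative word, so its score is 1 and it falls into the positive class.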
- `code`: contains the Jupyter notebooks to run the experiment
- `data/input`: contains the external datasets used as input files
- `data/output`: contains files generated while running the experiment
- `documentation`: contains the architecture of the pipeline and pipeline metadata
- "Amazon Customer Review Data for sentiment analysis"
- Author: Akash Shashikant Vaykar, Abhishek Kaushik (ORCID)
- Publication: November 21, 2019
- License: Creative Commons Attribution 4.0 International
- Mobile app stores such as Google and Apple offer a wide range of applications to meet every customer need on the digital platform. Customer feedback and ratings have always been among the major metrics used to review performance and to provide suitable recommendations for enhancing functionality. The given dataset contains customer feedback regarding the apps used in the app store.
- Author: Abhishek Kaushik (ORCID), Swathi Venkatakrishnan
- Publication: May 15, 2019
- License: Creative Commons Attribution 4.0 International
- The NRC Emotion Lexicon is a list of English words and their associations with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). The annotations were manually done by crowdsourcing.
- http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm
- Author: Saif M. Mohammad
- Publication: July 10, 2011
- License: No license specified; however, the dataset can be used freely for non-commercial research and educational purposes.
- You can either run this experiment on your host or in a Docker container. We recommend using Docker.
- The Amazon and Google Play reviews are already included in this repository.
- Due to licensing issues, we are not allowed to distribute the NRC dataset. Thus, it is necessary to download it manually:
  - Download the NRC dataset (http://saifmohammad.com/WebPages/NRC-Emotion-Lexicon.htm), extract it, and place the content of the extracted folder into `data/input/nrc`.
- Make sure that Python 3.0 or higher is installed on your system. If it is not, follow https://www.python.org/downloads/ to install it.
- Run `pip install -r requirements.txt` in the project root directory.
- Run `jupyter notebook`.
- Open http://localhost:8888.
- The `code` directory contains the Jupyter notebooks.
- Run the notebooks in the following order: `01_merge_preprocess.ipynb`, `02_score_reviews.ipynb`, `03_visualize.ipynb`.
- Make sure that Docker is installed on your device. If it is not, follow https://docs.docker.com/get-docker/ to install it.
- Run `docker build . -t simple-nlp` to build the Docker image.
- Run `docker run -p 8888:8888 simple-nlp`.
- Open http://localhost:8888.
- The `code` directory contains the Jupyter notebooks.
- Run the notebooks in the following order: `01_merge_preprocess.ipynb`, `02_score_reviews.ipynb`, `03_visualize.ipynb`.
The `01_merge_preprocess.ipynb` notebook merges the Google Play Store and Amazon review datasets. It also filters out stopwords (e.g. this) from the dataset. It produces the file `data/output/[ddmmyyy]_merged_preprocessed.csv`.
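The stopword filtering step can be pictured with the sketch below. It is only an illustration under stated assumptions: the stopword list and the helper name `remove_stopwords` are hypothetical, and the notebook may use a different list and tokenization.

```python
import re

# Hypothetical stopword list for illustration; the notebook's list may differ.
STOPWORDS = {"this", "the", "a", "an", "is", "it", "and", "to", "of"}


def remove_stopwords(text: str) -> str:
    """Lowercase the text, split it into words, and drop stopwords."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return " ".join(w for w in words if w not in STOPWORDS)


print(remove_stopwords("This is the best app of the year"))
```

Removing such words before scoring keeps the later word lookup focused on terms that can carry sentiment.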
The `02_score_reviews.ipynb` notebook calculates the predicted rating of each review using the simple NLP algorithm. It produces the output file `data/output/[ddmmyyy]_predicted_rating.csv`.
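Scoring requires knowing which words count as positive or negative. The sketch below shows one way such word sets might be derived from the lexicon. It assumes a word-level file with one tab-separated `word`, `category`, `flag` triple per line, where a flag of `1` marks an association; this layout, the function name `load_sentiment_words`, and the sample rows are assumptions, so check them against the files you downloaded.

```python
def load_sentiment_words(lines):
    """Collect words flagged as positive or negative from lexicon lines."""
    positive, negative = set(), set()
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        word, category, flag = parts
        if flag != "1":
            continue  # the word is not associated with this category
        if category == "positive":
            positive.add(word)
        elif category == "negative":
            negative.add(word)
    return positive, negative


# Made-up rows in the assumed format, for illustration only.
sample = [
    "happy\tpositive\t1",
    "happy\tnegative\t0",
    "happy\tjoy\t1",
    "awful\tnegative\t1",
    "awful\tpositive\t0",
]
positive, negative = load_sentiment_words(sample)
print(positive, negative)
```

Rows for the eight emotions are ignored here because only the two sentiment categories feed the positive and negative counts.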
The `03_visualize.ipynb` notebook visualizes the data. It creates confusion matrices for the predicted rating (`data/output/rating_confusion.pdf`) and the predicted category (`data/output/category_confusion.pdf`).
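A confusion matrix counts how often each actual label coincides with each predicted label. The sketch below shows that counting step with the standard library only; the function name `confusion_counts` and the example labels are made up for illustration, and the notebook's plotting code is not reproduced here.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) label pair occurs."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return Counter(zip(actual, predicted))


# Made-up labels for illustration only.
actual = ["positive", "positive", "negative", "negative", "positive"]
predicted = ["positive", "negative", "negative", "positive", "positive"]

counts = confusion_counts(actual, predicted)
labels = ["positive", "negative"]
for a in labels:
    # One row per actual label, one column per predicted label.
    print(a, [counts[(a, p)] for p in labels])
```

Each printed row corresponds to one actual class, so the diagonal entries are the correctly classified reviews.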
If you reuse the software, please cite it using the Zenodo DOI.
This project is MIT-licensed, as found in the LICENSE file.