TripAdvisor Review Analyzer

A Tripadvisor review analyzer app built with Python and Selenium. It scrapes the latest reviews for an attraction from the Tripadvisor URL the user enters on the landing page, then cleans, processes, and analyzes the review data with the Natural Language Toolkit (NLTK) and performs sentiment analysis on the review text.

About the Project


This app analyzes Tripadvisor reviews of tourist attractions. It uses Python and Selenium to scrape the latest reviews from the Tripadvisor URL that the user enters on the app's landing page. The scraped review data is then cleaned, processed, and analyzed with the Natural Language Toolkit (NLTK), and sentiment analysis is performed on the review text.

When the user enters the Tripadvisor URL of an attraction, Selenium scrapes the latest 100 reviews written for that attraction. (Fewer than 100 reviews are analyzed if the attraction is fairly new or little known and has fewer than 100 reviews on its Tripadvisor page.) NLTK's built-in VADER sentiment analyzer then classifies each review as positive, negative, or neutral using a lexicon of positive and negative words.
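The classification step can be sketched roughly as follows. This is a minimal illustration, not the app's actual code: the function names are hypothetical, and the ±0.05 cutoff on VADER's `compound` score is the commonly used convention, assumed here because the README does not state the thresholds the app uses.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label.

    The 0.05 threshold is an assumed convention, not taken from the app.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def classify_reviews(reviews, score_fn):
    """Group review texts by label, using score_fn(text) -> compound score."""
    groups = {"positive": [], "negative": [], "neutral": []}
    for text in reviews:
        groups[label_from_compound(score_fn(text))].append(text)
    return groups


def make_vader_score_fn():
    """Build a scoring function backed by NLTK's VADER analyzer.

    Requires the nltk package; the lexicon is downloaded on first use.
    """
    import nltk
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    nltk.download("vader_lexicon", quiet=True)
    analyzer = SentimentIntensityAnalyzer()
    return lambda text: analyzer.polarity_scores(text)["compound"]
```

With NLTK installed, `classify_reviews(reviews, make_vader_score_fn())` would return the reviews grouped into positive, negative, and neutral lists.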

Once the reviews are classified, the positive and negative reviews are processed separately. Tokenization breaks the review sentences down into meaningful elements (tokens), the text is lowercased, and punctuation is removed. Stopwords, meaning words such as "the", "is", and "what" that are irrelevant to sentiment and carry no valuable information, are then removed from the tokenized data.
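The preprocessing steps above might look something like this. The sketch uses only the standard library as a stand-in for NLTK's tokenizer and stopword list; the tiny `STOPWORDS` set and the `preprocess` function name are illustrative assumptions, and whitespace splitting is a simplification of real tokenization.

```python
import string

# Illustrative subset only; NLTK ships a much larger English stopword list.
STOPWORDS = {"the", "is", "what", "a", "an", "and", "of", "to", "it", "was", "in"}


def preprocess(review, stopwords=STOPWORDS):
    """Lowercase a review, split it into tokens, strip punctuation, drop stopwords."""
    tokens = review.lower().split()
    # Remove punctuation attached to the start or end of each token.
    tokens = [t.strip(string.punctuation) for t in tokens]
    # Drop empty strings left by punctuation-only tokens, then stopwords.
    return [t for t in tokens if t and t not in stopwords]
```

For example, `preprocess("The view is AMAZING, and what a sunset!")` keeps only `view`, `amazing`, and `sunset`.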

The next step, again with NLTK, is to find the most common words in the positive and negative review groups. The following data is then displayed on the results page:

number of reviews classified as positive
number of reviews classified as negative
a few sample reviews classified as positive
a few sample reviews classified as negative
most frequently used words and their frequencies in POSITIVE reviews
most frequently used words and their frequencies in NEGATIVE reviews
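Counting the most frequent words in one review group can be sketched as below. This uses `collections.Counter` from the standard library rather than NLTK (NLTK's `FreqDist` offers the same `most_common` method); the function name and the default of 10 words are assumptions, since the README does not say how many words the app shows.

```python
from collections import Counter


def most_common_words(token_lists, n=10):
    """Return the n most frequent tokens across all reviews as (word, count) pairs."""
    counts = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts.most_common(n)
```

The function would be called once with the tokenized positive reviews and once with the tokenized negative reviews.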

This is what the results page looks like:

Results Page