This project is part of the coursework for Text Analytics at the University of Texas at Austin. We scraped BeerAdvocate.com and applied text mining techniques to analyze reviews of craft beers and build a recommendation system.
BeerAdvocate.com is an online forum about beer, where people post reviews of the beers they have tried and rate them. The objective of the project was to create the building blocks of a crowdsourced recommendation system: the system accepts user input about the attributes they want in a beer and returns 3 recommendations.
We used Selenium to scrape around 6,000 reviews of craft beers from BeerAdvocate.com. The scraped data is in `data.csv`.
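Word frequency analysis with NLTK is one of the techniques listed for this project. The sketch below approximates that step with only the standard library, assuming the review texts are already loaded as a list of strings. `collections.Counter` stands in for NLTK's `FreqDist` (which subclasses `Counter`), and the regex tokenizer and tiny stopword set are illustrative assumptions, not the project's actual preprocessing. `word_frequencies` is a hypothetical helper name.

```python
import re
from collections import Counter

# Small illustrative stopword set (an assumption); NLTK ships a much
# fuller list in nltk.corpus.stopwords, which this sketch does not use.
STOPWORDS = {"a", "an", "and", "the", "of", "with", "is", "it",
             "this", "to", "in", "but", "very"}


def word_frequencies(reviews, top_n=10):
    """Hypothetical helper: return the top_n most common non-stopword
    tokens across all review texts, as (word, count) pairs."""
    counts = Counter()
    for text in reviews:
        # Lowercase, then treat runs of letters/apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)


if __name__ == "__main__":
    sample = [
        "Dark and roasty with a big chocolate finish.",
        "Chocolate malt up front, very dark pour.",
        "Hazy, heady, lots of hop flavor.",
    ]
    print(word_frequencies(sample, top_n=3))
```

Frequent descriptive words surfaced this way are one plausible input when choosing the attributes a customer might ask for.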
The project followed these steps:
- Write a scraper in Python using Selenium to fetch posts from BeerAdvocate.com.
- Choose 3 attributes, assuming that a customer using the recommendation system has specified 3 desired attributes in a beer.
- Compute the similarity between the 3-attribute set and each review using spaCy, and keep the 300 reviews with the highest similarity scores.
- Run VADER sentiment analysis on these 300 reviews and sort them by sentiment score.
- Based on the steps above, recommend 3 beers to the customer.
- Separately, identify the highest-rated beers by averaging the ratings for each beer, ignoring the similarity and sentiment scores.

Techniques used:

- Scraping with Selenium
- Word frequency analysis with NLTK
- Similarity analysis with spaCy
- Sentiment analysis with VADER
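The similarity filter, sentiment sort, and final recommendation from the steps above, plus the separate average-rating ranking, can be sketched as follows. This is a minimal sketch, not the project's code: it assumes each review's spaCy similarity score and VADER compound score have already been computed, so neither library is called here. The `Review` record, its field names, and the helpers `recommend` and `top_rated` are hypothetical. Because several reviews can describe the same beer, the sketch also assumes repeated beers should be skipped so that the 3 recommendations are distinct.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Review:
    # Hypothetical record; the real columns of data.csv may differ.
    beer: str
    rating: float
    similarity: float  # assumed precomputed, e.g. with spaCy Doc.similarity
    sentiment: float   # assumed precomputed, e.g. a VADER compound score


def recommend(reviews, n_similar=300, n_recs=3):
    """Keep the n_similar reviews most similar to the attributes, sort those
    by sentiment (highest first), and return the first n_recs distinct beers."""
    most_similar = sorted(reviews, key=lambda r: r.similarity, reverse=True)[:n_similar]
    by_sentiment = sorted(most_similar, key=lambda r: r.sentiment, reverse=True)
    picks = []
    for r in by_sentiment:
        if r.beer not in picks:  # assumption: skip repeat reviews of a beer
            picks.append(r.beer)
        if len(picks) == n_recs:
            break
    return picks


def top_rated(reviews, n=3):
    """Rank beers by their average rating, ignoring similarity and sentiment."""
    totals = defaultdict(lambda: [0.0, 0])  # beer -> [rating sum, review count]
    for r in reviews:
        totals[r.beer][0] += r.rating
        totals[r.beer][1] += 1
    averages = {beer: s / c for beer, (s, c) in totals.items()}
    return sorted(averages, key=averages.get, reverse=True)[:n]


if __name__ == "__main__":
    # Made-up reviews for illustration only.
    reviews = [
        Review("Stout A", 4.5, 0.91, 0.80),
        Review("Stout A", 4.0, 0.88, 0.95),
        Review("Porter B", 4.2, 0.85, 0.60),
        Review("Lager C", 3.5, 0.40, 0.99),
        Review("Ale D", 4.8, 0.70, 0.30),
    ]
    print(recommend(reviews, n_similar=4))  # ['Stout A', 'Porter B', 'Ale D']
    print(top_rated(reviews))               # ['Ale D', 'Stout A', 'Porter B']
```

Filtering by similarity first means a very positive review of an off-target beer (here "Lager C") cannot reach the recommendations. The average-rating ranking, by contrast, ignores the customer's attributes entirely.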
1. Attributes chosen
- Chocolate: darker, more aromatic malt, roasted or kilned
- Dark: longer roast, longer brewing process, or barrel-aged
- Heady: unpasteurized and unfiltered, with many flavors on the palate
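spaCy's default `Doc.similarity` is the cosine similarity of the two documents' vectors, where a document vector is by default the average of its token vectors. The sketch below reproduces that idea with the standard library so the scoring of reviews against these attributes is visible. The 3-dimensional `TOY_VECTORS` table is invented for illustration (real spaCy models use learned vectors with hundreds of dimensions), and `doc_vector` and `cosine` are hypothetical helper names, not spaCy functions.

```python
import math

# Invented 3-D "word vectors" for illustration only; a spaCy model with
# word vectors would supply learned, much higher-dimensional vectors.
TOY_VECTORS = {
    "chocolate": [0.9, 0.1, 0.0],
    "dark": [0.8, 0.2, 0.1],
    "heady": [0.2, 0.9, 0.1],
    "roasted": [0.85, 0.15, 0.05],
    "crisp": [0.0, 0.1, 0.9],
    "lager": [0.1, 0.0, 0.95],
}


def doc_vector(text):
    """Average the vectors of known words, mirroring how spaCy's default
    Doc.vector averages token vectors. Unknown words are ignored."""
    vecs = [TOY_VECTORS[w] for w in text.lower().split() if w in TOY_VECTORS]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    query = doc_vector("Chocolate Dark Heady")
    for review in ["dark roasted chocolate", "crisp lager"]:
        print(review, round(cosine(query, doc_vector(review)), 3))
```

A review built from words close to the attribute words scores near 1, while one about an unrelated style scores much lower. Ranking all reviews by this score and keeping the top 300 corresponds to the similarity step of the pipeline.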
2. Top beers with the chosen attributes
3. Top beers with the highest sentiment scores
See the notebook for further analysis.