The Food.com dataset contains more than 700K reviews of over 180K recipes, covering 18 years of user uploads. The dataset is available on Kaggle at the following URL: https://www.kaggle.com/datasets/shuyangli94/food-com-recipes-and-user-interactions.
In this project we analyse the reviews using two very different NLP tools: TextBlob, a library whose default sentiment analyser is lexicon-based, and DistilBERT, a pretrained transformer model distilled from the popular BERT.
This project is divided into three notebooks:
wordclouds.ipynb, where word clouds like the one in the picture below are computed from word frequencies.
preprocess_dataset.ipynb, where we perform a statistical exploration of the raw dataset and preprocess it for the subsequent analysis.
sentiment_analysis.ipynb, where sentiment analysis is performed on the preprocessed dataset using the two NLP models above. In addition, a classifier is trained to distinguish positive from negative reviews based on the most frequent words they contain.
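The idea of classifying reviews "based on the most frequent words" can be illustrated with a small, standard-library-only sketch. This README does not say which classifier the notebook uses, so the multinomial Naive Bayes below is an assumption chosen for illustration. The helper names (`tokenize`, `top_words`, `train_nb`, `predict`) and the six toy reviews are hypothetical and are not taken from the Food.com data. The `Counter` built in `top_words` holds the same kind of corpus-wide word frequencies that a word cloud is drawn from.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphabetic tokens (allowing one internal apostrophe, e.g. "don't")
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def top_words(reviews, k):
    # Corpus-wide word frequencies (the input a word cloud is built from);
    # return the k most frequent words as the feature vocabulary
    counts = Counter(w for r in reviews for w in tokenize(r))
    return [w for w, _ in counts.most_common(k)]


def train_nb(reviews, labels, vocab):
    # Multinomial Naive Bayes restricted to the given vocabulary (assumed classifier)
    vocab_set = set(vocab)
    class_counts = Counter(labels)
    word_counts = {c: Counter() for c in class_counts}
    for text, label in zip(reviews, labels):
        word_counts[label].update(w for w in tokenize(text) if w in vocab_set)
    n = len(labels)
    model = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        log_prior = math.log(class_counts[c] / n)
        # Laplace (add-one) smoothing so unseen vocabulary words get non-zero probability
        log_lik = {
            w: math.log((word_counts[c][w] + 1) / (total + len(vocab))) for w in vocab
        }
        model[c] = (log_prior, log_lik)
    return model


def predict(model, text):
    # Score each class by log prior + summed log likelihoods of in-vocabulary tokens
    tokens = tokenize(text)
    scores = {
        c: log_prior + sum(log_lik[w] for w in tokens if w in log_lik)
        for c, (log_prior, log_lik) in model.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy reviews, invented for illustration only
reviews = [
    "delicious and easy, will make again",
    "great recipe, my family loved it",
    "loved it, so delicious",
    "too salty and dry, would not make again",
    "bland and dry, not worth it",
    "terrible, too much salt",
]
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]

vocab = top_words(reviews, k=20)
model = train_nb(reviews, labels, vocab)
print(predict(model, "delicious, loved it"))
print(predict(model, "dry and too salty"))
```

In a real pipeline, stopwords such as "and" or "it" would typically be removed during preprocessing so that the most frequent words carry sentiment information; the sketch skips that step to stay short.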