Analyze whether reviews from Amazon's Vine program are trustworthy
Many Amazon shoppers rely on product reviews when deciding on a purchase. Amazon makes its review datasets publicly available, but they are large enough to exceed what a typical local machine can handle: a single dataset contains over 1.5 million rows, and there are more than 40 datasets in total.
The first goal for this assignment will be to perform the ETL process completely in the cloud (Google Colab) and upload a DataFrame to an RDS instance. The second goal will be to use PySpark or SQL to perform a statistical analysis of selected data.
The following files are attached:
- AWS_software.ipynb: Level 1 ETL with Software Reviews
- AWS_videogame.ipynb: Level 1 ETL with Video Game Reviews
- vine_review_analysis.ipynb: Big Data Analysis on Vine Reviews
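
As a rough illustration of the second goal, the kind of SQL comparison involved might look like the sketch below. It is not the query used in vine_review_analysis.ipynb. It assumes the review table has `star_rating` and `vine` ('Y'/'N') columns, uses an in-memory SQLite database as a stand-in for the RDS instance so that it runs anywhere, and works on a handful of made-up rows rather than real review data.

```python
import sqlite3

# Made-up toy rows for illustration only: (review_id, star_rating, vine)
rows = [
    ("R1", 5, "Y"),
    ("R2", 4, "Y"),
    ("R3", 5, "N"),
    ("R4", 2, "N"),
    ("R5", 5, "N"),
    ("R6", 1, "N"),
]

# In-memory SQLite stands in for the RDS instance (an assumption of this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE reviews ("
    "review_id TEXT PRIMARY KEY, star_rating INTEGER, vine TEXT)"
)
conn.executemany("INSERT INTO reviews VALUES (?, ?, ?)", rows)

# Compare Vine and non-Vine reviews: review count, five-star count, mean rating.
query = """
SELECT vine,
       COUNT(*) AS total_reviews,
       SUM(CASE WHEN star_rating = 5 THEN 1 ELSE 0 END) AS five_star_reviews,
       AVG(star_rating) AS avg_rating
FROM reviews
GROUP BY vine
ORDER BY vine
"""
summary = conn.execute(query).fetchall()
conn.close()

for vine, total, five_star, avg_rating in summary:
    pct_five_star = 100.0 * five_star / total
    print(f"vine={vine} reviews={total} five_star={five_star} "
          f"pct_five_star={pct_five_star:.1f} avg_rating={avg_rating:.2f}")
```

A noticeably higher five-star share among Vine reviews than among non-Vine reviews would be one signal worth investigating, though a real analysis would also need to account for sample size and product category.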