Analyzing Amazon reviews written by members of the paid Amazon Vine program. The Amazon Vine program is a service that allows manufacturers and publishers to receive reviews for their products: companies pay a small fee to Amazon and provide products to Amazon Vine members, who are then required to publish a review.
Use PySpark to perform the ETL process: extract the dataset, transform the data, connect to an AWS RDS instance, and load the transformed data into the PostgreSQL database on that instance (managed through pgAdmin). Next, use PySpark to determine whether there is any bias toward favorable reviews from Vine members in the selected dataset. Then, write a summary of the analysis for company stakeholders.
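The extract-and-transform step can be sketched in plain Python so it runs without a Spark cluster. This is a minimal stand-in, not the project's PySpark code: it assumes the dataset is a tab-separated file with a header row and the column names used in the public Amazon customer reviews files (`review_id`, `star_rating`, `helpful_votes`, `total_votes`, `vine`), and `extract_transform` is a hypothetical helper. In PySpark the same steps would typically be `spark.read.csv(path, sep="\t", header=True)` followed by `dropDuplicates(["review_id"])` and column casts, with the load step using `DataFrame.write.jdbc` against the RDS instance's PostgreSQL JDBC URL.

```python
import csv
import io

# Columns kept by this sketch; names are assumed to follow the public
# Amazon customer reviews TSV files.
KEEP = ("review_id", "star_rating", "helpful_votes", "total_votes", "vine")
NUMERIC = ("star_rating", "helpful_votes", "total_votes")


def extract_transform(tsv_text):
    """Parse tab-separated review rows, keep selected columns,
    cast numeric fields to int, and drop duplicate review_ids."""
    # QUOTE_NONE: split strictly on tabs so quotes inside review text are harmless.
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    seen = set()
    rows = []
    for raw in reader:
        rid = raw["review_id"]
        if rid in seen:
            continue  # keep only the first occurrence of each review_id
        seen.add(rid)
        row = {col: raw[col] for col in KEEP}
        for col in NUMERIC:
            row[col] = int(row[col])
        rows.append(row)
    return rows
```

Extra columns in the real file are simply ignored, because each row is looked up by column name.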
Tools: Python (PySpark, Pandas), AWS RDS and S3, SQL, pgAdmin
Vine reviews and percentage of 5-star reviews
Non-Vine reviews and percentage of 5-star reviews
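The two figures above come from counting total and 5-star reviews in each group. A minimal plain-Python sketch of that computation follows; it assumes rows shaped like dictionaries with a `vine` flag (`"Y"` for Vine, `"N"` otherwise, an assumed encoding) and an integer `star_rating`, and `five_star_share` is a hypothetical helper. In PySpark the equivalent would filter the DataFrame with `DataFrame.filter` on `vine` and `star_rating` and call `count()` on each result.

```python
from collections import Counter


def five_star_share(rows):
    """Return {vine_flag: (total_reviews, five_star_reviews, percent_five_star)}."""
    totals = Counter()
    five_stars = Counter()
    for row in rows:
        flag = row["vine"]  # assumed: "Y" = Vine review, "N" = non-Vine review
        totals[flag] += 1
        if row["star_rating"] == 5:
            five_stars[flag] += 1
    # Percentage of 5-star reviews within each group.
    return {
        flag: (totals[flag], five_stars[flag],
               100.0 * five_stars[flag] / totals[flag])
        for flag in totals
    }
```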
Our analysis showed that 57% of the 285 Vine reviews were 5-star reviews, compared with 46% of the 31,545 non-Vine reviews. This 11-percentage-point gap suggests a positivity bias in Vine reviews. To check whether the difference is statistically significant, a follow-up one-way ANOVA test on the 5-star ratings of Vine and non-Vine reviews could be run.
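One way to run that significance check is sketched below. Note that it swaps in a two-proportion z-test, a standard test for comparing two percentages, instead of the one-way ANOVA suggested above; for two groups and a yes/no "is 5-star" outcome, both ask whether the group difference is larger than chance would explain. This is a minimal sketch using only the standard library, and `two_proportion_z_test` is a hypothetical helper.

```python
import math


def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Two-sided two-proportion z-test using the pooled standard error.

    Returns (z, p_value)."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    # Pooled proportion under the null hypothesis that both groups share one rate.
    pooled = (success_a + success_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_a - p_b) / se
    # Two-sided p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for standard normal Z.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

To apply it, pass the actual 5-star and total counts for the Vine group (285 reviews) and the non-Vine group (31,545 reviews); a p-value below the chosen significance level (commonly 0.05) would indicate the gap is unlikely to be due to chance alone.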