
Performed ETL processes in the cloud to upload a dataframe to an RDS instance and used PySpark to perform a statistical analysis on Amazon datasets.


alexhyasui1/amazon-vine-analysis


Amazon Vine Review Analysis

An ETL Project with AWS RDS and Big Data Analysis in Google Colab with PySpark

Analyze whether reviews from Amazon's Vine program are trustworthy

Many Amazon shoppers rely on product reviews when deciding what to buy. Amazon makes its review datasets publicly available, but they are large enough to exceed what a local machine can comfortably handle: one dataset alone contains over 1.5 million rows, and there are more than 40 datasets.

The first goal of this assignment is to perform the ETL process entirely in the cloud (Google Colab) and upload a DataFrame to an RDS instance. The second goal is to use PySpark or SQL to perform a statistical analysis of selected data.
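The load step of the first goal could look roughly like the sketch below. It is not taken from the notebooks: it assumes the RDS instance runs PostgreSQL and that the PostgreSQL JDBC driver is available to Spark, and the helper names (`build_jdbc_url`, `write_to_rds`) as well as the endpoint, database, and table values are hypothetical placeholders.

```python
def build_jdbc_url(endpoint, database, port=5432):
    """Return a PostgreSQL JDBC URL for an RDS endpoint (PostgreSQL is an assumption)."""
    return f"jdbc:postgresql://{endpoint}:{port}/{database}"


def write_to_rds(df, table, endpoint, database, user, password, mode="append"):
    """Write a Spark DataFrame to a table on the RDS instance.

    `df` is expected to be a pyspark.sql.DataFrame; only its `write.jdbc`
    method is used here.
    """
    url = build_jdbc_url(endpoint, database)
    properties = {
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",  # assumes the PostgreSQL JDBC driver jar is loaded
    }
    # DataFrameWriter.jdbc performs the actual load into the target table
    df.write.jdbc(url=url, table=table, mode=mode, properties=properties)
    return url
```

In a Colab notebook this would be called once per transformed DataFrame, for example `write_to_rds(review_df, "review_id_table", "<your-endpoint>", "<your-database>", "<user>", "<password>")`, where every argument value is a placeholder.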

Analyzed Files

The following files are included:

  1. AWS_software.ipynb: Level 1 ETL with Software Reviews

  2. AWS_videogame.ipynb: Level 1 ETL with Video Game Reviews

  3. vine_review_analysis.ipynb: Big Data Analysis on Vine Reviews
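One way to approach the Vine analysis is to compare the share of 5-star ratings between Vine and non-Vine reviews. The sketch below is an assumption about the analysis, not a copy of vine_review_analysis.ipynb: it assumes each review has a `vine` flag (`"Y"` or `"N"`) and a `star_rating` column, and both function names are hypothetical. The first function is a plain-Python reference version; the second expresses the same aggregation in PySpark.

```python
from collections import defaultdict


def five_star_share(reviews):
    """Return {vine_flag: percentage of 5-star reviews} for an iterable of dicts."""
    counts = defaultdict(lambda: [0, 0])  # vine flag -> [total reviews, 5-star reviews]
    for review in reviews:
        flag = review["vine"]
        counts[flag][0] += 1
        if int(review["star_rating"]) == 5:
            counts[flag][1] += 1
    return {flag: round(100 * five / total, 2) for flag, (total, five) in counts.items()}


def five_star_share_spark(df):
    """Same aggregation on a Spark DataFrame with `vine` and `star_rating` columns."""
    from pyspark.sql import functions as F  # imported lazily so the module loads without PySpark

    is_five = F.when(F.col("star_rating") == 5, 1).otherwise(0)
    return (
        df.groupBy("vine")
        .agg(
            F.count("*").alias("total_reviews"),
            F.sum(is_five).alias("five_star_reviews"),
        )
        .withColumn(
            "five_star_pct",
            F.round(100 * F.col("five_star_reviews") / F.col("total_reviews"), 2),
        )
    )


sample = [
    {"vine": "Y", "star_rating": 5},
    {"vine": "Y", "star_rating": 5},
    {"vine": "Y", "star_rating": 4},
    {"vine": "N", "star_rating": 5},
    {"vine": "N", "star_rating": 3},
]
print(five_star_share(sample))  # {'Y': 66.67, 'N': 50.0}
```

A noticeably higher 5-star share among Vine reviews would hint at a positivity bias, though a difference in percentages alone does not establish that the reviews are untrustworthy.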
