Amazon Elastic MapReduce (EMR) Serverless Demonstration

This project uses Amazon EMR Serverless to run a sample Spark job that processes semi-structured review data. The goal is to demonstrate how Amazon EMR Serverless handles big data processing and analysis workloads.

Overview

Amazon EMR (Elastic MapReduce) Serverless is a serverless big data processing service that lets you run Apache Spark applications without managing clusters. In this demonstration, EMR Serverless processes semi-structured review data stored in JSON format, and the results are analyzed to derive insights.

Project Structure

1. Scripts:

reviews.py: Python script for processing the review data.
script_arguments: Additional script arguments used during the EMR Serverless application setup.

2. Sample Dataset:

dataset_en_dev.json: Semi-structured review data in JSON format.
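
As an illustration of the kind of processing a script like reviews.py could perform, here is a minimal PySpark sketch. It is not the project's actual script: the `--input` and `--output` flags, the function names, and the `product_category` and `stars` columns are assumptions made for the example, and the input is assumed to contain one JSON object per line.

```python
import argparse


def parse_args(argv=None):
    """Parse the input and output locations passed as script arguments.

    The --input/--output flag names are hypothetical; the project's own
    script_arguments file may use a different convention.
    """
    parser = argparse.ArgumentParser(description="Process review data with Spark")
    parser.add_argument("--input", required=True, help="Path or S3 URI of the JSON reviews")
    parser.add_argument("--output", required=True, help="Path or S3 URI for the Parquet output")
    return parser.parse_args(argv)


def summarize_reviews(spark, input_path, output_path):
    """Read JSON reviews, aggregate them per category, and write Parquet."""
    # Imported lazily so argument parsing can be exercised without Spark installed.
    from pyspark.sql import functions as F

    # spark.read.json expects one JSON object per line by default.
    reviews = spark.read.json(input_path)

    # "product_category" and "stars" are assumed column names.
    summary = reviews.groupBy("product_category").agg(
        F.count("*").alias("review_count"),
        F.avg("stars").alias("avg_stars"),
    )
    summary.write.mode("overwrite").parquet(output_path)


def main(argv=None):
    """Entry point: create a Spark session, run the job, then stop the session."""
    from pyspark.sql import SparkSession

    args = parse_args(argv)
    spark = SparkSession.builder.appName("reviews-sketch").getOrCreate()
    try:
        summarize_reviews(spark, args.input, args.output)
    finally:
        spark.stop()
```

To use a script like this as a Spark entry point, it would call `main()` at the bottom of the file and receive the two locations as script arguments.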

How to Use

1. Setup Amazon EMR Serverless:

Configure an S3 bucket to store output files and logs.
Create an IAM role with appropriate permissions for EMR Serverless.
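
The role setup above can be sketched as two policy documents built in Python. This is an illustrative sketch, not the project's actual policy files: the function names and the exact set of S3 actions are assumptions, and the bucket name is a placeholder. The trust policy uses the `emr-serverless.amazonaws.com` service principal so that EMR Serverless can assume the role.

```python
import json


def build_trust_policy():
    """Trust policy that lets EMR Serverless assume the job execution role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "emr-serverless.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def build_s3_access_policy(bucket):
    """Permissions policy granting access to one bucket (assumed minimal set)."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [bucket_arn],
            },
            {
                # Reading scripts and data, writing output and logs,
                # and replacing existing output apply to objects.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"{bucket_arn}/*"],
            },
        ],
    }
```

Calling `json.dumps(build_trust_policy(), indent=2)` produces JSON text that can be pasted into the IAM console when creating the role.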

2. Run Spark Job:

Submit the sample Spark job (reviews.py) to an Amazon EMR Serverless application.
Provide the necessary script arguments (see script_arguments) when configuring the job.
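
For readers who prefer to submit the job programmatically rather than through the console, the step above could look roughly like the following sketch, which uses the boto3 `emr-serverless` client and its `start_job_run` call. The `scripts/` and `logs/` prefixes, the function names, and every identifier passed in are assumptions for illustration; the application and the role must already exist.

```python
def build_job_run_request(application_id, execution_role_arn, bucket, script_args):
    """Build keyword arguments for the EMR Serverless start_job_run call.

    The scripts/ and logs/ prefixes are assumed locations inside the bucket.
    """
    return {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        "jobDriver": {
            "sparkSubmit": {
                # S3 location of the PySpark script to run.
                "entryPoint": f"s3://{bucket}/scripts/reviews.py",
                # Script arguments, passed to the script in order.
                "entryPointArguments": list(script_args),
            }
        },
        "configurationOverrides": {
            "monitoringConfiguration": {
                # Driver and executor logs are written under this prefix.
                "s3MonitoringConfiguration": {"logUri": f"s3://{bucket}/logs/"}
            }
        },
    }


def submit_job(request, region_name=None):
    """Submit the job run and return its ID (requires boto3 and AWS credentials)."""
    import boto3  # third-party; imported lazily so the builder works without it

    client = boto3.client("emr-serverless", region_name=region_name)
    response = client.start_job_run(**request)
    return response["jobRunId"]
```

Keeping the request builder separate from the submission call makes it possible to inspect or log the request before anything is sent to AWS.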

3. Analyze Data with Amazon Athena:

Point an Amazon Athena table at the output folder in the S3 bucket that contains the processed Parquet data.
Run SQL queries in Amazon Athena to analyze the processed data and derive insights.
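
The Athena step above can be sketched as a small helper that generates the table definition and a sample query. The database, table, and column names below are hypothetical, since they depend on what the Spark job actually writes; the statements follow Athena's `CREATE EXTERNAL TABLE ... STORED AS PARQUET LOCATION ...` form.

```python
def build_create_table_ddl(database, table, columns, location):
    """Return Athena DDL for an external table over Parquet files in S3.

    columns is a list of (name, type) pairs; location should be an S3
    folder URI ending with a slash.
    """
    column_defs = ",\n  ".join(f"`{name}` {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{database}`.`{table}` (\n"
        f"  {column_defs}\n"
        f")\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{location}'"
    )


def build_top_rows_query(database, table, order_column, limit=10):
    """Return a query listing the rows with the highest value in one column."""
    return (
        f'SELECT * FROM "{database}"."{table}" '
        f'ORDER BY "{order_column}" DESC LIMIT {int(limit)}'
    )


# Hypothetical schema and output location, for illustration only.
EXAMPLE_DDL = build_create_table_ddl(
    "demo_db",
    "review_summary",
    [("product_category", "string"), ("review_count", "bigint"), ("avg_stars", "double")],
    "s3://my-bucket/output/",
)
```

The generated statements can be pasted into the Athena query editor: run the DDL once to register the table, then run queries against it.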

Additional Resources

For detailed documentation and insights, refer to this project's documentation (document link).
To replicate the project or explore the code, refer to the code section of this GitHub repository.

Repository: kevinndungu-source/Amazon_EMR_Serverless_Demonstration
