Spark Mini Project

Automobile post-sales report

NOTE: This mini project is a continuation of the Hadoop mini project.

Consider an automobile tracking platform that tracks the history of incidents after a new vehicle is sold by the dealer. Such incidents include further private sales, repairs and accident reports. This history gives second-hand buyers a good reference for understanding the vehicles they are interested in.

The same dataset as in the Hadoop mini project, a history report of various vehicles, is provided. Your goal is to write a Spark job that reports the total number of accidents per make and year of car.
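The goal above can be sketched in PySpark roughly as follows. This is a minimal sketch under stated assumptions, not necessarily what AutoSaleExtractSpark.py does. It assumes each CSV line has the columns incident_id, incident_type, vin_number, make, model, year, incident_date, description, that incident_type is "I" for the initial sale and "A" for an accident, and that only the initial-sale record carries make and year. The function names and the column positions are hypothetical.

```python
from operator import add


def extract_vin_key_value(line):
    """Parse one CSV line into (vin, (incident_type, make, year))."""
    fields = line.split(",")
    # Assumed column positions: 1 = incident type, 2 = VIN, 3 = make, 5 = year.
    return fields[2], (fields[1], fields[3], fields[5])


def populate_make(records):
    """Copy make and year from a VIN's initial-sale record onto its accident records."""
    records = list(records)
    make, year = None, None
    for incident_type, rec_make, rec_year in records:
        if incident_type == "I":  # assumed code for the initial sale
            make, year = rec_make, rec_year
    # Keep only accident records ("A" is an assumed code), now carrying make and year.
    return [(t, make, year) for t, _, _ in records if t == "A"]


def extract_make_key_value(record):
    """Turn an accident record into ("make-year", 1) so the ones can be summed."""
    _, make, year = record
    return f"{make}-{year}", 1


def accidents_per_make_year(sc, path):
    """Build the RDD pipeline; sc is an existing SparkContext."""
    return (
        sc.textFile(path)
        .map(extract_vin_key_value)                # (vin, record)
        .groupByKey()                              # all records of one vehicle
        .flatMap(lambda kv: populate_make(kv[1]))  # accident records with make/year
        .map(extract_make_key_value)               # ("make-year", 1)
        .reduceByKey(add)                          # accident count per make-year
    )
```

A driver script would create a SparkContext, call accidents_per_make_year(sc, path) with the path of the input file, and write the collected pairs to the output file. Grouping by VIN is needed here only because of the assumption that accident records do not carry make and year themselves.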

Objectives

With this mini project, you will practice using Spark transformations to solve traditional MapReduce data problems. It demonstrates that Spark has a significant advantage over the Hadoop MapReduce framework in both code simplicity and in-memory processing performance, which makes it well suited to chains of MapReduce jobs.

Step 1:

From your local terminal, run upload_spark_files.sh to upload the Spark files to the root directory in the VirtualBox sandbox.

Step 2:

From the SandBox's Web Shell Client, run the command:

$ spark-submit AutoSaleExtractSpark.py

Step 3:

Double-check the final_output.txt file in the SandBox's Web Shell Client:

$ cat final_output.txt

Step 4:

Download the final_output.txt file from the SandBox's root directory. From your local terminal, run the command below:

$ scp -P 2222 root@127.0.0.1:/root/final_output.txt [local directory]

Reference:

Difference between groupByKey() and reduceByKey() in Spark Link
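
The difference can be illustrated with a small plain-Python model. This is only a sketch of the idea, not Spark's implementation: each inner list stands for one partition, and the hypothetical functions below count how many records would cross the shuffle. groupByKey() sends every pair across the shuffle, while reduceByKey() first combines values inside each partition.

```python
from collections import defaultdict


def group_by_key_then_sum(partitions):
    """Model of groupByKey() followed by a sum: every pair is shuffled."""
    shuffled = [pair for part in partitions for pair in part]
    groups = defaultdict(list)
    for key, value in shuffled:
        groups[key].append(value)
    totals = {key: sum(values) for key, values in groups.items()}
    return totals, len(shuffled)


def reduce_by_key(partitions, func):
    """Model of reduceByKey(func): combine per partition, then shuffle."""
    shuffled = []
    for part in partitions:
        local = {}
        for key, value in part:
            # Map-side combine: at most one record per key leaves a partition.
            local[key] = func(local[key], value) if key in local else value
        shuffled.extend(local.items())
    merged = {}
    for key, value in shuffled:
        merged[key] = func(merged[key], value) if key in merged else value
    return merged, len(shuffled)
```

Both functions return the same totals, but the second count of shuffled records is never larger than the first, which is why reduceByKey() is usually preferred for aggregations such as counting accidents per make and year.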


Andy-Pham-72/spark-mini-project
