Movie Rating Analysis using Apache Spark (pyspark)
Updated Nov 8, 2023 - Jupyter Notebook
Data is fetched from StackExchange, transformed using Pig, then stored and queried in Hive. Additionally, the TF-IDF of the top 10 users is calculated using Hive.
Car Insurance Cold Calls Data Analysis using Apache Hive
Project for Scalable and Cloud Programming Course - 2018/19 UNIBO.
Marketing Campaign Data Analysis Using Apache Spark (PySpark)
First project for the Big Data course held at Roma Tre University
Hadoop Google DataProc DIO study
Processes large amounts of data and implements complex data analyses using Spark. The dataset, made available by Google, describes a cluster of 12,500 machines and the activity on that cluster over 29 days.
Project for Cloud Computing course (A.Y. 2018/2019)
Apache Spark sandbox on GCP and Amazon EMR.
Implements a work queue for Dataproc Workflow Template executions
Monte Carlo stock simulation using Apache Spark.
E-commerce GCP streaming pipeline: Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery and Tableau. GCP batch pipeline: Cloud Storage, Dataproc, PySpark, Cloud Spanner and Tableau.