This repository contains the examples and exercises for the course Distributed Architectures for Big Data Processing and Analytics, part of the Data Science and Engineering degree program at Politecnico di Torino. The course focuses mainly on Hadoop MapReduce techniques and provides an introduction to the Apache Spark framework.
The exercises and examples cover the following topics:
- Introduction to Apache Spark;
- RDD-based programs;
- Spark SQL and DataFrames;
- Data mining and machine learning algorithms with Spark MLlib;
- GraphX/GraphFrames;
- Streaming data analytics.
Academic year: 2021/22