Distributed Architectures for Big Data Processing and Analytics

This repository contains the examples and exercises for the course Distributed Architectures for Big Data Processing and Analytics, part of the Data Science and Engineering program at Politecnico di Torino. The course focuses mainly on Hadoop MapReduce techniques and introduces the Apache Spark framework.

The exercises and examples cover the following topics:

  • Introduction to Apache Spark;
  • RDD-based programs;
  • Spark SQL and DataFrames;
  • Data mining and Machine learning algorithms with Spark MLlib;
  • GraphX/GraphFrames;
  • Streaming data analytics.

A.Y. 2021/22