This repository has been archived by the owner on Jul 16, 2022. It is now read-only.

BloomFilter-MapReduce

Project developed for the Cloud Computing course of the Master's degree in Artificial Intelligence and Data Engineering at the University of Pisa.

This project consists of the design and implementation of a Bloom filter for IMDb datasets using MapReduce, with implementations for both the Hadoop and Spark frameworks.
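For readers unfamiliar with the data structure, here is a minimal, single-machine sketch of a Bloom filter in Python. It is not the code from this repository: the class name `BloomFilter`, the salted SHA-256 hashing, the `union` helper, and the sample keys are all assumptions made for illustration. The sizing uses the standard formulas m = -n ln(p) / (ln 2)^2 and k = (m / n) ln 2, where n is the expected number of items and p is the target false-positive rate.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative Bloom filter (hypothetical, not the repository's implementation)."""

    def __init__(self, n_items: int, fp_rate: float):
        # Standard sizing: m bits and k hash functions for n items at rate p.
        self.m = max(1, math.ceil(-n_items * math.log(fp_rate) / (math.log(2) ** 2)))
        self.k = max(1, round(self.m / n_items * math.log(2)))
        # One byte per bit position, chosen for readability rather than compactness.
        self.bits = bytearray(self.m)

    def _positions(self, key: str):
        # Derive k positions by salting SHA-256 with the hash index (an assumption;
        # the repository may use a different hash family).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def __contains__(self, key: str) -> bool:
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[pos] for pos in self._positions(key))

    def union(self, other: "BloomFilter") -> "BloomFilter":
        # Bitwise OR of two filters built with the same parameters.
        if (self.m, self.k) != (other.m, other.k):
            raise ValueError("filters must share m and k to be merged")
        merged = BloomFilter.__new__(BloomFilter)
        merged.m, merged.k = self.m, self.k
        merged.bits = bytearray(a | b for a, b in zip(self.bits, other.bits))
        return merged


# Hypothetical usage: two partial filters, as two map tasks might produce.
part_a = BloomFilter(n_items=1000, fp_rate=0.01)
part_b = BloomFilter(n_items=1000, fp_rate=0.01)
part_a.add("tt0000001")
part_b.add("tt0000002")
combined = part_a.union(part_b)
print("tt0000001" in combined, "tt0000002" in combined)
```

The `union` method hints at why the structure suits MapReduce: filters with identical parameters can be built independently on separate data partitions and merged with a bitwise OR. Whether the Hadoop and Spark implementations here follow exactly this scheme is described in the report under docs/.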

Repository

The repository is organized as follows:

dataset/ contains the IMDb dataset stored in film_ratings.txt
docs/ contains the report and the assignment
hadoop/ contains the Hadoop implementation and test
results/ contains testing results and analysis
spark/ contains the Spark implementation and test

Contributors

Francesca Pezzuti @frax1819
Francesco Hudema @MrFransis
Tommaso Baldi @balditommaso
Edoardo Ruffoli @edoardoruffoli