This is a research project developed by Moritz Meister and Philip Claesson at Politecnico di Milano.
The aim of the project is to investigate whether Apache Flink's ability to parallelize data streams can be used to anonymize streamed data according to the K-Anonymity and L-Diversity models.
Find the working document of the final report here.
The main novelty of this project is to use Apache Flink to key the incoming tuples by their Quasi Identifier. As a result, all incoming tuples with the same Quasi Identifier are processed by the same parallel operator instance. This approach can be advantageous when:
- minimizing data entropy loss in the anonymization step
- minimizing "late tuples" (rare tuples that are released with a large delay because k tuples with the same Quasi Identifier have not yet appeared)
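
The keying idea can be illustrated with a small, Flink-free simulation. This is only a sketch and not the project's implementation: the class `KeyedKAnonymizer`, its methods, and the sample records are hypothetical, and it assumes the Quasi Identifier values have already been generalized. It buffers tuples per Quasi Identifier and releases a group once k of them have arrived. L-Diversity is not checked here.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


class KeyedKAnonymizer:
    """Hypothetical helper: buffers tuples per Quasi Identifier until k have arrived."""

    def __init__(self, k: int) -> None:
        self.k = k
        # One buffer per Quasi Identifier, mimicking per-key state in a keyed stream.
        self.buffers: Dict[Hashable, List[str]] = defaultdict(list)

    def process(self, quasi_identifier: Hashable, value: str) -> List[Tuple[Hashable, str]]:
        """Add one tuple; return the released group, or an empty list if fewer than k are buffered."""
        buffer = self.buffers[quasi_identifier]
        buffer.append(value)
        if len(buffer) < self.k:
            return []
        # k tuples share this Quasi Identifier: release them together and clear the buffer.
        del self.buffers[quasi_identifier]
        return [(quasi_identifier, v) for v in buffer]

    def pending(self) -> Dict[Hashable, int]:
        """Number of buffered tuples per Quasi Identifier (candidates for "late tuples")."""
        return {qi: len(values) for qi, values in self.buffers.items()}


# Made-up sample stream: (Quasi Identifier, sensitive value).
stream = [
    (("30-39", "201**"), "flu"),
    (("40-49", "201**"), "asthma"),
    (("30-39", "201**"), "cold"),
]

anonymizer = KeyedKAnonymizer(k=2)
released = [out for qi, value in stream for out in anonymizer.process(qi, value)]
print(released)              # tuples released so far
print(anonymizer.pending())  # tuples still waiting for k - 1 companions
```

In this example the tuple with Quasi Identifier `("40-49", "201**")` stays buffered, which is the "late tuple" situation described above. In Flink, the per-key buffer would presumably live in keyed state behind a `keyBy` on the Quasi Identifier.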