moritzmeister/flinkanonymity

K-Anonymity and L-Diversity in Apache Flink

This is a research project developed by Moritz Meister and Philip Claesson at Politecnico di Milano.

About

The aim of this research project is to investigate whether Apache Flink's ability to parallelize data streams can be used to anonymize streamed data according to the K-Anonymity and L-Diversity models.

Report

Find the working document of the final report here.

Approach

The main novelty of this project is to use Apache Flink's keying functionality to partition the incoming tuples by their Quasi Identifier. As a result, all incoming tuples with the same Quasi Identifier are routed to the same parallel instance of the downstream operator. This approach can be advantageous for:

  • minimizing data entropy loss in the anonymization step
  • minimizing "late tuples" (rare tuples that are released with a large delay because k tuples with the same Quasi Identifier have not yet appeared)
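
The per-key buffering idea can be sketched in plain Java, without any Flink dependency. This is an illustrative sketch only, not the project's implementation: the class name `KeyedAnonymizer`, the method `process`, the string encoding of the Quasi Identifier, and the release rule (at least k buffered tuples with at least l distinct sensitive values) are all assumptions made for the example. The map stands in for the state that each parallel instance would hold for its keys once the stream is keyed by Quasi Identifier.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: buffers tuples per Quasi Identifier and releases a
// group only once it satisfies the assumed k and l thresholds.
class KeyedAnonymizer {
    private final int k; // minimum number of tuples per released group
    private final int l; // minimum number of distinct sensitive values per group

    // One buffer per Quasi Identifier; stands in for per-key state.
    private final Map<String, List<String>> buffers = new HashMap<>();

    KeyedAnonymizer(int k, int l) {
        this.k = k;
        this.l = l;
    }

    // Adds one tuple (already generalized Quasi Identifier plus sensitive value).
    // Returns the sensitive values of the released group, or an empty list
    // if the tuple has to be withheld for now.
    List<String> process(String quasiId, String sensitiveValue) {
        List<String> buffer = buffers.computeIfAbsent(quasiId, q -> new ArrayList<>());
        buffer.add(sensitiveValue);

        int distinctValues = new HashSet<>(buffer).size();
        if (buffer.size() >= k && distinctValues >= l) {
            buffers.remove(quasiId); // group is released, start over for this key
            return buffer;
        }
        return Collections.emptyList();
    }
}
```

For example, with `new KeyedAnonymizer(2, 2)`, two tuples with the same Quasi Identifier and the same sensitive value are both withheld, and a third tuple with a different sensitive value releases all three. In this sketch a rare Quasi Identifier stays buffered until enough matching tuples arrive, which is exactly the "late tuple" situation described above.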
