- Francesco Porto f.porto2@campus.unimib.it (816042)
- Francesco Stranieri f.stranieri1@campus.unimib.it (816551)
- Mattia Vincenzi m.vincenzi14@campus.unimib.it (860579)
Record Linkage is the process of finding records, within a single dataset or across different data sources, that refer to the same entity. Traditionally, it is done by applying comparison rules to pairs of attributes taken from the records being compared. In this project we investigate some possible Machine Learning applications to Record Linkage (and data deduplication) in order to assess their viability.
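
As a rough illustration of the traditional rule-based approach described above, here is a minimal sketch using only the Python standard library. It is not the code from our notebook: the field names (`name`, `birth_year`), the helper functions, and the `0.85` threshold are all hypothetical choices made for this example.

```python
from difflib import SequenceMatcher
from itertools import product


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def is_match(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
    """Hypothetical rule: names must be similar AND birth years must be equal."""
    name_ok = similarity(rec_a["name"], rec_b["name"]) >= threshold
    year_ok = rec_a["birth_year"] == rec_b["birth_year"]
    return name_ok and year_ok


def link(dataset_a: list, dataset_b: list, threshold: float = 0.85) -> list:
    """Compare every pair of records and return the index pairs that match."""
    return [
        (i, j)
        for (i, a), (j, b) in product(enumerate(dataset_a), enumerate(dataset_b))
        if is_match(a, b, threshold)
    ]


# Toy data, invented for this example only.
dataset_a = [
    {"name": "Jonathan Smith", "birth_year": 1980},
    {"name": "Maria Rossi", "birth_year": 1975},
]
dataset_b = [
    {"name": "Jonathon Smith", "birth_year": 1980},
    {"name": "Mario Bianchi", "birth_year": 1975},
]

matches = link(dataset_a, dataset_b)
```

Comparing every pair is quadratic in the number of records, and the rules and threshold have to be tuned by hand; limitations of this kind are part of the motivation for looking at Machine Learning alternatives.
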
We provide:
- A Jupyter Notebook containing our project (code + step-by-step comments and explanations);
- A PDF report exported from the notebook (we recommend using the notebook directly, since it might be easier to read);
- The slides to be shown during the project presentation;
- No dataset files: the datasets we use are integrated into the library and are therefore not provided separately. We give an in-depth description of each one in the notebook.