Welcome to the Data Information Quality Project!
This project explores how two data quality dimensions, Completeness and Distinctness, affect Linear Regression models. We run 10 experiments for each dimension and assess its effect on model performance.
Experiment Design
Completeness Experiments:
1. Create an artificial dataset.
2. Introduce completeness issues (missing values) via pollution functions.
3. Prepare the data using various imputation techniques.
4. Analyze the impact on Linear Regression models before and after imputation.
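The completeness steps above can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the project's actual code: the names make_dataset, pollute_completeness, impute_mean, fit_line, rmse and run_completeness_experiment are hypothetical, the missing values are assumed to be injected completely at random into a single feature, and "before imputation" is assumed to mean fitting on complete rows only.

```python
import random
import statistics


def make_dataset(n=200, slope=3.0, intercept=2.0, noise=1.0, seed=0):
    """Hypothetical generator: y = slope * x + intercept + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [slope * x + intercept + rng.gauss(0, noise) for x in xs]
    return xs, ys


def pollute_completeness(values, missing_rate, seed=0):
    """Hypothetical pollution function: blank out a random share of values."""
    rng = random.Random(seed)
    n_missing = round(missing_rate * len(values))
    missing = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing else v for i, v in enumerate(values)]


def impute_mean(values):
    """Replace every missing entry with the mean of the observed entries."""
    mean = statistics.fmean([v for v in values if v is not None])
    return [mean if v is None else v for v in values]


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(xs, ys, slope, intercept):
    """Root mean squared error of the fitted line on (xs, ys)."""
    errors = [(slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)]
    return (sum(errors) / len(errors)) ** 0.5


def run_completeness_experiment(missing_rate=0.3):
    xs, ys = make_dataset()
    polluted = pollute_completeness(xs, missing_rate, seed=1)

    # "Before imputation": assumed here to mean fitting on complete rows only.
    rows = [(x, y) for x, y in zip(polluted, ys) if x is not None]
    dropped = fit_line([x for x, _ in rows], [y for _, y in rows])

    # "After imputation": fill the gaps, then fit on all rows.
    imputed = fit_line(impute_mean(polluted), ys)

    # Score every model against the clean data so the results are comparable.
    return {
        "clean": rmse(xs, ys, *fit_line(xs, ys)),
        "dropped_rows": rmse(xs, ys, *dropped),
        "mean_imputed": rmse(xs, ys, *imputed),
    }


if __name__ == "__main__":
    for name, score in run_completeness_experiment().items():
        print(f"{name}: RMSE = {score:.3f}")
```

Swapping impute_mean for another strategy (for example a median or model-based fill) would let the same loop compare imputation techniques across several missing rates.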
Distinctness Experiments:
1. Create an artificial dataset with distinctness variations.
2. Apply pollution functions to introduce different levels of distinctness.
3. Analyze the impact on various Linear Regression models.
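A similar sketch for the distinctness steps, again using only the Python standard library. It assumes distinctness is measured as the share of distinct values in a column, and that pollution lowers it by snapping each value to one of a few representative values. The names distinctness, pollute_distinctness, make_dataset, fit_line and run_distinctness_experiment are hypothetical, and the project's own pollution functions may work differently.

```python
import random
import statistics


def make_dataset(n=200, slope=3.0, intercept=2.0, noise=1.0, seed=0):
    """Hypothetical generator: y = slope * x + intercept + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [slope * x + intercept + rng.gauss(0, noise) for x in xs]
    return xs, ys


def distinctness(values):
    """Share of distinct values in a column; 1.0 means every value is unique."""
    return len(set(values)) / len(values)


def pollute_distinctness(values, n_distinct, seed=0):
    """Hypothetical pollution function: keep n_distinct representative values
    and snap every entry to the closest representative."""
    rng = random.Random(seed)
    representatives = rng.sample(sorted(set(values)), n_distinct)
    return [min(representatives, key=lambda r: abs(r - v)) for v in values]


def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def run_distinctness_experiment(levels=(100, 20, 5, 2)):
    """Fit one model per distinctness level; returns (distinctness, slope) pairs."""
    xs, ys = make_dataset()
    results = []
    for n_distinct in levels:
        polluted = pollute_distinctness(xs, n_distinct, seed=1)
        slope, _ = fit_line(polluted, ys)
        results.append((distinctness(polluted), slope))
    return results


if __name__ == "__main__":
    for share, slope in run_distinctness_experiment():
        print(f"distinctness = {share:.3f}, fitted slope = {slope:.3f}")
```

Each level needs at least two distinct values, otherwise the feature has zero variance and the slope is undefined.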