Data Information Quality Project on Completeness and Distinctness in Linear Regression

Welcome to the Data Information Quality Project!

This project explores how two data quality dimensions, Completeness (missing values) and Distinctness (duplicate records), affect Linear Regression models. We run 10 experiments for each dimension and assess the effect on model performance.

Experiment Design

Completeness Experiments:
    1. Create an artificial dataset.
    2. Introduce completeness issues (missing values) via pollution functions.
    3. Prepare the data using various imputation techniques.
    4. Compare Linear Regression performance before and after imputation.
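The four completeness steps can be illustrated with a minimal, dependency-free sketch. It is not the project's actual code: the function names (`make_dataset`, `pollute_completeness`, `impute_mean`, `fit_line`), the 30% missing rate, the single-feature setup, and the choice of mean imputation are all assumptions made for illustration. "Before imputation" is interpreted here as fitting on the rows that remain after dropping missing values.

```python
import random
import statistics


def make_dataset(n=200, slope=3.0, intercept=2.0, noise=0.5, seed=0):
    """Step 1: artificial dataset, y = slope * x + intercept + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [slope * x + intercept + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def pollute_completeness(values, missing_rate, seed=1):
    """Step 2 (hypothetical pollution function): set a fraction of values to None."""
    rng = random.Random(seed)
    n_missing = int(round(missing_rate * len(values)))
    missing_idx = set(rng.sample(range(len(values)), n_missing))
    return [None if i in missing_idx else v for i, v in enumerate(values)]


def impute_mean(values):
    """Step 3 (one possible imputation): fill gaps with the mean of observed values."""
    fill = statistics.fmean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(xs, ys, slope, intercept):
    """Root mean squared error of the line on the given points."""
    return (sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5


xs, ys = make_dataset()
polluted = pollute_completeness(xs, missing_rate=0.3)

# Reference model fitted on the unpolluted data.
clean_fit = fit_line(xs, ys)

# Before imputation: only rows with an observed x can be used.
kept = [(x, y) for x, y in zip(polluted, ys) if x is not None]
dropped_fit = fit_line([x for x, _ in kept], [y for _, y in kept])

# After imputation: all rows are used, with gaps filled by the mean.
imputed_fit = fit_line(impute_mean(polluted), ys)

# Step 4: score every model against the clean data.
fits = {"clean": clean_fit, "rows_dropped": dropped_fit, "mean_imputed": imputed_fit}
results = {name: rmse(xs, ys, *fit) for name, fit in fits.items()}
for name, score in results.items():
    print(f"{name:>13}: RMSE on clean data = {score:.4f}")
```

Scoring every model on the clean data is a design choice of this sketch. It keeps the comparison fair across variants, since each model is judged against the same ground truth rather than against its own polluted training set.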

Distinctness Experiments:
    1. Create artificial datasets with varying levels of distinctness.
    2. Use pollution functions to introduce different levels of distinctness.
    3. Analyze the impact on various Linear Regression models.
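A distinctness experiment could look like the following sketch, which is an illustration under stated assumptions rather than the project's implementation. Here distinctness is measured as the ratio of distinct rows to total rows, pollution means appending exact copies of randomly chosen rows, and only plain ordinary least squares is fitted. The names `pollute_distinctness` and `distinctness` and the chosen duplicate rates are hypothetical.

```python
import random
import statistics


def make_dataset(n=200, slope=3.0, intercept=2.0, noise=0.5, seed=0):
    """Artificial dataset as a list of (x, y) rows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        rows.append((x, slope * x + intercept + rng.gauss(0.0, noise)))
    return rows


def pollute_distinctness(rows, duplicate_rate, seed=1):
    """Hypothetical pollution function: append exact copies of random rows.

    duplicate_rate is relative to the original number of rows.
    """
    rng = random.Random(seed)
    n_dup = int(round(duplicate_rate * len(rows)))
    return rows + [rng.choice(rows) for _ in range(n_dup)]


def distinctness(rows):
    """Share of distinct rows: 1.0 means no duplicates at all."""
    return len(set(rows)) / len(rows)


def fit_line(rows):
    """Ordinary least squares for a single feature; returns (slope, intercept)."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


rows = make_dataset()

# Fit one model per pollution level and record its distinctness and coefficients.
report = {}
for rate in (0.0, 0.25, 0.5, 1.0):
    polluted = pollute_distinctness(rows, rate)
    slope, intercept = fit_line(polluted)
    report[rate] = (distinctness(polluted), slope, intercept)
    print(f"rate={rate:.2f} distinctness={report[rate][0]:.3f} "
          f"slope={slope:.4f} intercept={intercept:.4f}")

# Removing exact duplicates (order-preserving) restores the original rows.
deduplicated = list(dict.fromkeys(pollute_distinctness(rows, 1.0)))
```

Duplicated rows act as extra weight on some observations, so the fitted coefficients can drift as distinctness drops. Deduplicating before fitting removes that effect when the duplicates are exact copies.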