IrfEazy/diq-project

 
 

Data Information Quality Project on Completeness and Distinctness in Linear Regression

Welcome to the Data Information Quality Project!

This project explores how two data quality dimensions, Completeness and Distinctness, affect Linear Regression models. We run 10 experiments for each dimension and assess the effect on model performance.

Experiments Design

Completeness Experiments:
    1. Create an artificial dataset.
    2. Introduce completeness issues via pollution functions.
    3. Prepare the data using several imputation techniques.
    4. Analyze the impact on Linear Regression models before and after imputation.
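
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names (`make_dataset`, `pollute_completeness`, `impute_mean`, `fit_ols`, `rmse`), the single-feature setup, the 30% missing rate, and the choice of mean imputation are all assumptions made for the example. It uses only the Python standard library so that it runs without extra dependencies.

```python
import random
import statistics


def make_dataset(n, slope=2.0, intercept=1.0, noise=0.5, seed=0):
    """Step 1: generate a synthetic one-feature dataset (hypothetical parameters)."""
    rng = random.Random(seed)
    xs = [rng.uniform(0.0, 10.0) for _ in range(n)]
    ys = [slope * x + intercept + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def pollute_completeness(xs, fraction, seed=0):
    """Step 2: replace a random fraction of feature values with None."""
    rng = random.Random(seed)
    n_missing = round(fraction * len(xs))
    missing = set(rng.sample(range(len(xs)), n_missing))
    return [None if i in missing else x for i, x in enumerate(xs)]


def impute_mean(xs):
    """Step 3: fill missing values with the mean of the observed values."""
    mean = statistics.fmean(x for x in xs if x is not None)
    return [mean if x is None else x for x in xs]


def fit_ols(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def rmse(xs, ys, slope, intercept):
    """Root mean squared error of the fitted line on (xs, ys)."""
    return statistics.fmean(
        (slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)
    ) ** 0.5


if __name__ == "__main__":
    # Step 4: compare fits on clean data, on rows with missing values dropped,
    # and on mean-imputed data. All fits are scored against the clean data.
    xs, ys = make_dataset(200)
    polluted = pollute_completeness(xs, fraction=0.3)
    complete = [(x, y) for x, y in zip(polluted, ys) if x is not None]
    fits = {
        "clean": fit_ols(xs, ys),
        "rows dropped": fit_ols(*zip(*complete)),
        "mean-imputed": fit_ols(impute_mean(polluted), ys),
    }
    for name, (slope, intercept) in fits.items():
        error = rmse(xs, ys, slope, intercept)
        print(f"{name}: slope={slope:.3f} intercept={intercept:.3f} rmse={error:.3f}")
```

Dropping incomplete rows stands in here for the "before imputation" case, since a least-squares fit cannot use missing values directly. Other imputation strategies would slot in by replacing `impute_mean`.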

Distinctness Experiments:
    1. Create artificial datasets with varying distinctness.
    2. Apply pollution functions to introduce different levels of distinctness.
    3. Analyze the impact on various Linear Regression models.
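
As a rough sketch of the distinctness steps above, the example below lowers distinctness by appending duplicated rows and then refits a simple linear model at each level. It is an illustration under stated assumptions, not the project's implementation: the names `distinctness`, `pollute_distinctness`, and `fit_ols` are hypothetical, distinctness is taken here to mean the ratio of distinct rows to total rows, and duplication of whole rows is only one possible pollution strategy.

```python
import random


def distinctness(rows):
    """Ratio of distinct rows to total rows (1.0 means no duplicates)."""
    return len(set(rows)) / len(rows)


def pollute_distinctness(rows, duplicate_fraction, seed=0):
    """Append duplicates of randomly chosen rows, then shuffle.

    duplicate_fraction is relative to the original row count, so 0.5 adds
    half as many duplicated rows as there are original rows.
    """
    rng = random.Random(seed)
    n_dupes = round(duplicate_fraction * len(rows))
    polluted = list(rows) + rng.choices(rows, k=n_dupes)
    rng.shuffle(polluted)
    return polluted


def fit_ols(rows):
    """Fit y = slope * x + intercept by ordinary least squares on (x, y) rows."""
    n = len(rows)
    mx = sum(x for x, _ in rows) / n
    my = sum(y for _, y in rows) / n
    sxx = sum((x - mx) ** 2 for x, _ in rows)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    slope = sxy / sxx
    return slope, my - slope * mx


if __name__ == "__main__":
    # Synthetic data with hypothetical true coefficients (slope 2, intercept 1).
    rng = random.Random(0)
    clean = []
    for _ in range(100):
        x = rng.uniform(0.0, 10.0)
        clean.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.5)))

    # Refit at several pollution levels and report the fitted coefficients.
    for level in (0.0, 0.25, 0.5, 1.0):
        polluted = pollute_distinctness(clean, level)
        slope, intercept = fit_ols(polluted)
        print(
            f"duplicates={level:.2f} distinctness={distinctness(polluted):.2f} "
            f"slope={slope:.3f} intercept={intercept:.3f}"
        )
```

Because duplicated rows effectively reweight some observations, the fitted coefficients can drift as distinctness drops. How far they drift depends on which rows are duplicated.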
