Data Information Quality Project on Completeness and Distinctness in Linear Regression

Welcome to the Data Information Quality Project!

This project explores how two data quality dimensions, Completeness and Distinctness, affect Linear Regression models. We run 10 experiments for each dimension and assess the effect on model performance.

Experiment Design

Completeness Experiments:
    1. Create an artificial dataset.
    2. Introduce completeness issues (missing values) via pollution functions.
    3. Prepare the data using various imputation techniques.
    4. Analyze the impact on Linear Regression models before and after imputation.
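
The four completeness steps above can be sketched roughly as follows. This is a minimal illustration, not the code from completeness.ipynb: the function names (make_dataset, pollute_completeness, impute_mean, fit_ols, rmse), the 20% missing rate, mean imputation, and the plain NumPy least-squares fit are all assumptions chosen for brevity. Since ordinary least squares cannot handle missing values, the "before imputation" model is assumed here to be trained on complete rows only.

```python
import numpy as np


def make_dataset(n_samples=500, n_features=3, noise=0.5, seed=0):
    """Step 1: artificial linear dataset (hypothetical generator)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_features))
    coef = rng.uniform(1.0, 3.0, size=n_features)
    y = X @ coef + rng.normal(scale=noise, size=n_samples)
    return X, y


def pollute_completeness(X, missing_rate, seed=0):
    """Step 2: set a random fraction of cells to NaN (assumed MCAR pollution)."""
    rng = np.random.default_rng(seed)
    X_polluted = X.copy()
    X_polluted[rng.random(X.shape) < missing_rate] = np.nan
    return X_polluted


def impute_mean(X):
    """Step 3: replace each NaN with the mean of its column."""
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)


def fit_ols(X, y):
    """Ordinary least squares with an intercept term."""
    A = np.column_stack([np.ones(len(X)), X])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w


def rmse(w, X, y):
    """Root mean squared error of the fitted weights on (X, y)."""
    A = np.column_stack([np.ones(len(X)), X])
    return float(np.sqrt(np.mean((A @ w - y) ** 2)))


X, y = make_dataset()
X_train, y_train, X_test, y_test = X[:400], y[:400], X[400:], y[400:]

X_dirty = pollute_completeness(X_train, missing_rate=0.2)

# Step 4a: "before imputation" baseline, trained on complete rows only.
complete_rows = ~np.isnan(X_dirty).any(axis=1)
w_drop = fit_ols(X_dirty[complete_rows], y_train[complete_rows])

# Step 4b: after mean imputation, every training row is usable again.
w_imp = fit_ols(impute_mean(X_dirty), y_train)

print("RMSE, complete rows only:", rmse(w_drop, X_test, y_test))
print("RMSE, mean imputation:  ", rmse(w_imp, X_test, y_test))
```

Both models are scored on the same clean test split, so any difference in error can be attributed to how the missing values were handled rather than to the evaluation data.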

Distinctness Experiments:
    1. Create artificial datasets with distinctness variations.
    2. Apply pollution functions to introduce different levels of distinctness.
    3. Analyze the impact on various Linear Regression models.
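
A rough sketch of the distinctness steps above is shown below. It assumes one particular reading of distinctness, namely the ratio of unique rows to total rows, lowered by overwriting rows with duplicates of other rows; the project's notebooks may define and pollute distinctness differently. All names here (make_dataset, distinctness, pollute_distinctness, fit_linear) are hypothetical, and "various Linear Regression models" is represented only by ordinary least squares and a ridge-penalised variant.

```python
import numpy as np


def make_dataset(n_samples=500, n_features=3, noise=0.5, seed=0):
    """Artificial linear dataset (hypothetical generator)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_samples, n_features))
    coef = rng.uniform(1.0, 3.0, size=n_features)
    y = X @ coef + rng.normal(scale=noise, size=n_samples)
    return X, y


def distinctness(X):
    """Assumed metric: unique rows / total rows (1.0 means no duplicates)."""
    return len(np.unique(X, axis=0)) / len(X)


def pollute_distinctness(X, y, duplicate_rate, seed=0):
    """Overwrite a fraction of rows with copies of untouched rows.

    duplicate_rate must be below 1.0 so that some source rows remain.
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    n_dup = int(round(duplicate_rate * n))
    targets = rng.choice(n, size=n_dup, replace=False)
    keep = np.setdiff1d(np.arange(n), targets)
    sources = rng.choice(keep, size=n_dup, replace=True)
    X_p, y_p = X.copy(), y.copy()
    X_p[targets] = X[sources]
    y_p[targets] = y[sources]
    return X_p, y_p


def fit_linear(X, y, ridge=0.0):
    """Closed-form linear regression; ridge > 0 adds an L2 penalty
    on the coefficients (the intercept is left unpenalised)."""
    A = np.column_stack([np.ones(len(X)), X])
    penalty = ridge * np.eye(A.shape[1])
    penalty[0, 0] = 0.0
    return np.linalg.solve(A.T @ A + penalty, A.T @ y)


def rmse(w, X, y):
    """Root mean squared error of the fitted weights on (X, y)."""
    A = np.column_stack([np.ones(len(X)), X])
    return float(np.sqrt(np.mean((A @ w - y) ** 2)))


X, y = make_dataset()
X_train, y_train, X_test, y_test = X[:400], y[:400], X[400:], y[400:]

for rate in (0.0, 0.25, 0.5):
    X_p, y_p = pollute_distinctness(X_train, y_train, rate)
    for name, lam in (("OLS", 0.0), ("ridge", 1.0)):
        w = fit_linear(X_p, y_p, ridge=lam)
        print(f"distinctness={distinctness(X_p):.2f} {name}: "
              f"RMSE={rmse(w, X_test, y_test):.3f}")
```

Keeping the training set size fixed while replacing rows with duplicates isolates the effect of lost distinct information from the effect of simply having fewer rows.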

Repository Contents:
    Report
    library
    .gitattributes
    .gitignore
    LICENSE
    README.md
    completeness.ipynb
    distinctness.ipynb

Repository: IrfEazy/diq-project