Data Cleaning 101

In this repo, I use a dataset from Kaggle to explore data preparation techniques, with a focus on handling missing data.

Code

The missing_data_practice.ipynb notebook contains the code for the data preparation techniques.

Concepts

  • Missingness Types: Missing Completely at Random (MCAR), Missing at Random (MAR), Missing Not at Random (MNAR)
  • Univariate Imputation Techniques: Mean/Median/Mode Imputation, Random Sample Imputation
  • Multivariate Imputation Techniques: KNN Imputation, MICE Imputation
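The techniques listed above can be sketched end to end as follows. This is a minimal illustration, not the notebook's code. It uses a small synthetic DataFrame with hypothetical columns (age, income, city) in place of the Kaggle dataset. It injects MAR and MCAR missingness, then fills the gaps with each univariate and multivariate method. KNNImputer and IterativeImputer are real scikit-learn classes. IterativeImputer is a MICE-style imputer that returns a single imputation by default, so it approximates full multiple imputation rather than reproducing it. The helpers random_sample_impute and rmse are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (needed to expose IterativeImputer)
from sklearn.impute import IterativeImputer, KNNImputer

rng = np.random.default_rng(0)

# Hypothetical toy data standing in for the Kaggle dataset.
n = 200
age = rng.normal(40, 10, n)
income = 1000 * age + rng.normal(0, 5000, n)  # correlated with age, so multivariate methods can use it
true_income = income.copy()                   # kept only to measure imputation error later
city = rng.choice(["A", "B", "C"], size=n, p=[0.5, 0.3, 0.2])
df = pd.DataFrame({"age": age, "income": income, "city": city})

# MAR: income is more often missing for older people (depends on the observed 'age').
p_missing = np.where(df["age"] > 45, 0.30, 0.05)
df.loc[rng.random(n) < p_missing, "income"] = np.nan
# MCAR: city is dropped with a flat 10% probability, independent of every value.
# (MNAR would make income's missingness depend on income itself, which cannot be detected from the observed data.)
df.loc[rng.random(n) < 0.10, "city"] = np.nan
was_missing = df["income"].isna()

# --- Univariate imputation: each column is filled using only its own values ---
mean_filled = df["income"].fillna(df["income"].mean())
median_filled = df["income"].fillna(df["income"].median())
mode_filled = df["city"].fillna(df["city"].mode()[0])  # mode suits categorical columns

def random_sample_impute(s: pd.Series, seed: int = 0) -> pd.Series:
    """Replace each NaN with a value drawn (with replacement) from the observed values."""
    s = s.copy()
    missing = s.isna()
    draws = s.dropna().sample(int(missing.sum()), replace=True, random_state=seed)
    s.loc[missing] = draws.to_numpy()
    return s

random_filled = random_sample_impute(df["income"])

# --- Multivariate imputation: missing values are estimated from other numeric columns ---
numeric = df[["age", "income"]]
knn_filled = pd.DataFrame(
    KNNImputer(n_neighbors=5).fit_transform(numeric),
    columns=numeric.columns, index=numeric.index,
)
# IterativeImputer models each incomplete column from the others, round-robin (MICE-style).
mice_filled = pd.DataFrame(
    IterativeImputer(max_iter=10, random_state=0).fit_transform(numeric),
    columns=numeric.columns, index=numeric.index,
)

def rmse(filled: pd.Series) -> float:
    """Error of the imputed values against the known true income (only possible on synthetic data)."""
    diff = filled[was_missing].to_numpy() - true_income[was_missing.to_numpy()]
    return float(np.sqrt(np.mean(diff ** 2)))

print(df.isna().sum())
print("income std  observed: %.0f  mean-filled: %.0f  random-sample: %.0f"
      % (df["income"].std(), mean_filled.std(), random_filled.std()))
for name, filled in [("mean", mean_filled), ("median", median_filled), ("random", random_filled),
                     ("knn", knn_filled["income"]), ("mice", mice_filled["income"])]:
    print(f"{name:>6} RMSE: {rmse(filled):.0f}")
```

Mean and median imputation shrink the column's spread because every gap gets the same value. Random sample imputation roughly preserves the spread. Because income depends on age here, KNN and the MICE-style imputer should recover the missing incomes more accurately than any single-column method.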

Python libraries used:

  • pandas
  • numpy
  • matplotlib
  • missingno
  • fastimpute
  • scikit-learn (sklearn)