data_analysis for imbalanced data

In this notebook, I apply statistical methods to analyze an imbalanced data set. It starts with the basics: null checks, descriptive statistics, and handling of missing values. The numerical columns are right-skewed, so the Shapiro-Wilk and Anderson-Darling tests are applied to confirm that the data are not normally distributed. Outliers in the numerical columns are detected with the interquartile range (IQR). A chi-square test is applied to the categorical columns to test whether their distributions differ across the values of the target column. Because the data set is imbalanced, correlation analysis is performed after undersampling.
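The normality checks and the IQR rule described above can be sketched roughly as follows. This is a minimal illustration and not the notebook's actual code: the function names `normality_report` and `iqr_outlier_mask`, the 5% significance level, and the conventional 1.5 × IQR fence are all assumptions.

```python
import numpy as np
from scipy import stats


def normality_report(values, alpha=0.05):
    """Run Shapiro-Wilk and Anderson-Darling on a 1-D numeric sample.

    Hypothetical helper; alpha=0.05 is an assumed significance level.
    """
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]  # both tests require complete data

    # Shapiro-Wilk: a small p-value is evidence against normality.
    shapiro_stat, shapiro_p = stats.shapiro(x)

    # Anderson-Darling: compare the statistic with the tabulated critical
    # value whose significance level (in percent) is closest to alpha.
    ad = stats.anderson(x, dist="norm")
    levels = np.asarray(ad.significance_level, dtype=float)
    idx = int(np.argmin(np.abs(levels - alpha * 100)))
    ad_critical = float(ad.critical_values[idx])

    return {
        "shapiro_p": float(shapiro_p),
        "shapiro_rejects": bool(shapiro_p < alpha),
        "anderson_stat": float(ad.statistic),
        "anderson_critical": ad_critical,
        "anderson_rejects": bool(ad.statistic > ad_critical),
    }


def iqr_outlier_mask(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR]; k=1.5 is an assumption."""
    x = np.asarray(values, dtype=float)
    q1, q3 = np.nanpercentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)
```

On a right-skewed column both tests would be expected to reject normality, and the IQR mask would typically flag points in the long right tail.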

- Application of Shapiro-Wilk, Anderson-Darling, and chi-square tests
- Correlation analysis for imbalanced data
- Outlier detection
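As a rough sketch of the chi-square and correlation steps, the snippet below tests a categorical column against the target and computes a correlation matrix on a class-balanced subsample. The README does not say which undersampling method is used, so plain random undersampling is assumed here; the helper names and the column names passed to them are hypothetical.

```python
import pandas as pd
from scipy.stats import chi2_contingency


def chi_square_vs_target(df, column, target):
    """Chi-square test of independence between a categorical column and the target."""
    table = pd.crosstab(df[column], df[target])
    chi2, p_value, dof, _expected = chi2_contingency(table)
    return chi2, p_value, dof


def undersample(df, target, seed=0):
    """Random undersampling (an assumption): shrink every class to the minority size."""
    n_min = df[target].value_counts().min()
    parts = [
        group.sample(n=n_min, random_state=seed)
        for _, group in df.groupby(target)
    ]
    return pd.concat(parts)


def balanced_correlation(df, target, seed=0):
    """Pearson correlation of the numeric columns, computed on the balanced subsample."""
    balanced = undersample(df, target, seed=seed)
    return balanced.select_dtypes("number").corr()
```

Computing correlations on the balanced subsample keeps the majority class from dominating the estimate, at the cost of discarding rows; repeating the procedure with several seeds gives a sense of how stable the result is.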