This is my submission for Lab Assignment 1 in the Data Mining class. The implementation consists of two sections: "Preprocessing" and "Introduction to numpy and pandas".
Main features:
- List attributes with missing data
- Fill in missing data (with a constant, mean, median, or mode)
- Filter out rows and columns with a certain amount of missing data
- Remove duplicated rows
- Remove outliers
- Calculate point combinations
- Normalize data (min-max and Z-score)
- Apply one-hot encoding
- Calculate the correlation between any given pair of numeric attributes, and create a heat map for visualization
- Use histograms to visualize the score distribution of each subject
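
To make a few of the features above concrete, here is a minimal sketch of missing-value filling, min-max and Z-score normalization, and one-hot encoding. It is not the code from this submission: the function names (`fill_missing`, `min_max`, `z_score`, `one_hot`), the use of `None` to mark missing values, and the choice of the population standard deviation are all assumptions made for illustration, and only the Python standard library is used.

```python
from statistics import mean, median, mode, pstdev


def fill_missing(values, method="mean", constant=None):
    """Replace None entries in a column using the chosen strategy (hypothetical helper)."""
    present = [v for v in values if v is not None]
    if method == "constant":
        fill = constant
    elif method == "mean":
        fill = mean(present)
    elif method == "median":
        fill = median(present)
    elif method == "mode":
        fill = mode(present)
    else:
        raise ValueError(f"unknown method: {method}")
    return [fill if v is None else v for v in values]


def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population rather than sample deviation
    if sigma == 0:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(values):
    """Encode each categorical value as a 0/1 vector over the sorted categories."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


scores = [2.0, None, 4.0, 6.0]
filled = fill_missing(scores, method="mean")  # the None is replaced by the mean, 4.0
scaled = min_max(filled)
standardized = z_score(filled)
encoded = one_hot(["a", "b", "a"])
```

With pandas, the same steps would usually be written with `DataFrame.fillna` and `pandas.get_dummies`; the plain-Python version is shown only to spell out the arithmetic behind each feature.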